Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations bkrike on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Is this a big deal? 1

Status
Not open for further replies.

joelwenzel

Programmer
Jun 28, 2002
448
I have a data base that has many columns. Basically, there are 36 data columns. However, in some rows, as few as only 10 columns will contain data (the others will be null). Most of the time, only 15 to 20 of the rows will contain data. The columns are of datatype tinyint(3) so i don't think they take up a whole lot of memory. Also, the table will eventually take up millions and millions of rows.

So, should I keep the table designed with many many blank columns or should I be looking at something else?

I was thinking on other possiblity was to hve a table that contains 10 columns (for data) and then join rows in that table to ones in my main table (via a view)...but all those joins seem like they could be expensive.
 
Are (some of) these columns repeatable (a la blah1, blah2... blah15) or not?

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
I don't need to display all columns all the time because some of the columns will never show data. Actually, here, I will give you an example of what I mean.


Suppose, I am creating a system to track student scores in, say, 20 different classes. Most students take 10 classes, however, some take 5 and some take 20 (and some take numbers inbetween). My table is

Student Scores
(id's dates etc) (Class1,class2,class3,class4,...class19,class20)

Now I want to calculate some stats...eg, the average for class1, graphs, etc. Any class that is null (contains no data for a particular student) are not included in the calculation. So, I guess what I'm rally wondering is if I should somehow split this table and just use joins to put blocks of class scores together (eg 2 tables consisting of Block ID, ClassA,classB,classC,classD,classE AND (dates,etc)(class1to5ID,Class6to10ID,class11to15ID,class16to20ID) where class1to5ID references a row in the first table,etc.

Here are my questions:
Is it terribly efficient for me to keep one massive table containing columns for class1 to class20 even when most of those columns won't ever contain data?
If I am doing searches through millions of records, would I be faster with one massive table, or by doing a bunch of joins as described in the second method?
Is my database going to be that much bigger if I store 'empty' data? It seems like hard drives are s big now that storing 'empty' data is not that much of an issue.


Thank you very much for your help with this.
 
note that id's, dates, class1,class2...,class20,classA...classE,class1to5ID,etc are all columns in my tables
 
> Is it terribly efficient for me to keep one massive table containing columns for class1 to class20 even when most of those columns won't ever contain data?

It is likely efficient for trivial SELECTs and display - and bad for everything else. Typical by-the-book example of violated 1NF.

> If I am doing searches through millions of records, would I be faster with one massive table, or by doing a bunch of joins as described in the second method?

That depends on queries. Say, "gimme all classes Joe Smith attended last year". How would ya do that query with existing table(s)?

> Is my database going to be that much bigger if I store 'empty' data? It seems like hard drives are s big now that storing 'empty' data is not that much of an issue.

I'd say that's not the primary issue here. While disk space is cheap, CPU licences aren't :p

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
The most complicated queries I need to do are just to select some scores for student joe schmoe, maybe use some functions (count,avg,etc), and maybe some simple calculated queries (class1-class2). I will also be calculating for joe schmoe over a specified time frame.

Also, if I use 1NF, then I am going to have something like

TableStudent
ID,Some other stuff

TableClasses
FK_StudentID,ClassNumber,Score,Date

The problem I see with this is that TableClass is now has 20 times more rows. So, my originl long table has about 20 million rows, the 1NF tablbe (tableclasses) has 400 million rows. This seems really expensive join wise.

Just so you know, I really don't know a lot quantatively about how expensive different operations are. So what I'm saying could be complete off.
 
Are these classes (1-20) hard-coded? class1 always means this, class2 that etc?

> So, my originl long table has about 20 million rows, the 1NF tablbe (tableclasses) has 400 million rows.

Not exactly... "empty cells" are not stored in 1NF table.

Btw. pros and cons of going 1NF are well known. With flat tables common demands (browsing, edit, simple reports) are easy to implement; anything above that soon turns into dynamic SQL/data integrity/security nightmare.

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Ok, thanks. I've redesigned my database a bit. It is not exactly 1NF but it is much closer
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top