Is this a big deal? 1

joelwenzel · Mar 24, 2006

I have a data base that has many columns. Basically, there are 36 data columns. However, in some rows, as few as only 10 columns will contain data (the others will be null). Most of the time, only 15 to 20 of the rows will contain data. The columns are of datatype tinyint(3) so i don't think they take up a whole lot of memory. Also, the table will eventually take up millions and millions of rows.

So, should I keep the table designed with many many blank columns or should I be looking at something else?

I was thinking on other possiblity was to hve a table that contains 10 columns (for data) and then join rows in that table to ones in my main table (via a view)...but all those joins seem like they could be expensive.

vongrunt · Mar 24, 2006

Are (some of) these columns repeatable (a la blah1, blah2... blah15) or not?

------
[small]select stuff(stuff(replicate(' ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

SQLDenis · Mar 24, 2006

Depends, what do you need to display? All columns all the time or not? if not split up the table into 2

Denis The SQL Menace
SQL blog:

http://sqlservercode.blogspot.com/

Personal Blog:

http://otherthingsnow.blogspot.com/

joelwenzel · Mar 24, 2006

I don't need to display all columns all the time because some of the columns will never show data. Actually, here, I will give you an example of what I mean.

Suppose, I am creating a system to track student scores in, say, 20 different classes. Most students take 10 classes, however, some take 5 and some take 20 (and some take numbers inbetween). My table is

Student Scores
(id's dates etc) (Class1,class2,class3,class4,...class19,class20)

Now I want to calculate some stats...eg, the average for class1, graphs, etc. Any class that is null (contains no data for a particular student) are not included in the calculation. So, I guess what I'm rally wondering is if I should somehow split this table and just use joins to put blocks of class scores together (eg 2 tables consisting of Block ID, ClassA,classB,classC,classD,classE AND (dates,etc)(class1to5ID,Class6to10ID,class11to15ID,class16to20ID) where class1to5ID references a row in the first table,etc.

Here are my questions:
Is it terribly efficient for me to keep one massive table containing columns for class1 to class20 even when most of those columns won't ever contain data?
If I am doing searches through millions of records, would I be faster with one massive table, or by doing a bunch of joins as described in the second method?
Is my database going to be that much bigger if I store 'empty' data? It seems like hard drives are s big now that storing 'empty' data is not that much of an issue.

Thank you very much for your help with this.

joelwenzel · Mar 24, 2006

note that id's, dates, class1,class2...,class20,classA...classE,class1to5ID,etc are all columns in my tables

vongrunt · Mar 24, 2006

> Is it terribly efficient for me to keep one massive table containing columns for class1 to class20 even when most of those columns won't ever contain data?

It is likely efficient for trivial SELECTs and display - and bad for everything else. Typical by-the-book example of violated 1NF.

> If I am doing searches through millions of records, would I be faster with one massive table, or by doing a bunch of joins as described in the second method?

That depends on queries. Say, "gimme all classes Joe Smith attended last year". How would ya do that query with existing table(s)?

> Is my database going to be that much bigger if I store 'empty' data? It seems like hard drives are s big now that storing 'empty' data is not that much of an issue.

I'd say that's not the primary issue here. While disk space is cheap, CPU licences aren't

------
[small]select stuff(stuff(replicate(' ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

joelwenzel · Mar 24, 2006

The most complicated queries I need to do are just to select some scores for student joe schmoe, maybe use some functions (count,avg,etc), and maybe some simple calculated queries (class1-class2). I will also be calculating for joe schmoe over a specified time frame.

Also, if I use 1NF, then I am going to have something like

TableStudent
ID,Some other stuff

TableClasses
FK_StudentID,ClassNumber,Score,Date

The problem I see with this is that TableClass is now has 20 times more rows. So, my originl long table has about 20 million rows, the 1NF tablbe (tableclasses) has 400 million rows. This seems really expensive join wise.

Just so you know, I really don't know a lot quantatively about how expensive different operations are. So what I'm saying could be complete off.

vongrunt · Mar 24, 2006

Are these classes (1-20) hard-coded? class1 always means this, class2 that etc?

> So, my originl long table has about 20 million rows, the 1NF tablbe (tableclasses) has 400 million rows.

Not exactly... "empty cells" are not stored in 1NF table.

Btw. pros and cons of going 1NF are well known. With flat tables common demands (browsing, edit, simple reports) are easy to implement; anything above that soon turns into dynamic SQL/data integrity/security nightmare.

------
[small]select stuff(stuff(replicate(' ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

joelwenzel · Mar 28, 2006

Ok, thanks. I've redesigned my database a bit. It is not exactly 1NF but it is much closer

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Is this a big deal? 1

joelwenzel

Programmer

vongrunt

Programmer

SQLDenis

Programmer

joelwenzel

Programmer

joelwenzel

Programmer

vongrunt

Programmer

joelwenzel

Programmer

vongrunt

Programmer

joelwenzel

Programmer

Similar threads

Part and Inventory Search

Sponsor