Database Good Practice

Status
Not open for further replies.

AndyLee100

Technical User
Jun 26, 2003
174
GB
I am having a discussion with a colleague in the office here regarding text field lengths in databases.

The main discussion is regarding where to control the length of the string that you are entering into the database.

To explain further,

You have a form with a text box and you want to limit the number of characters the user can enter to 30.

So you set the max length property to 30.

But what about the field in the database? Do you, as a rule, set this to 30 as well, or allow 255 or even the (MAX) field length?

I argue that if you are using VarChar as a data type in your SQL database then SQL will only store the actual length of the string and not pad out the remaining spaces.

If you are using nvarchar then I agree this is a different issue.

The basis of my argument is that you control the length of the data at only one point (i.e. the application) and don't have to worry about changing the database. My colleague, however, argues that you should also change the length in the database.

Points of view greatly appreciated

Cheers

Andy
 
nvarchar fields take twice the storage space of varchar fields because they store Unicode characters. char and nchar are padded with spaces.
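As a rough sketch of those storage rules (the formulas here follow SQL Server's documented per-value costs; the two-byte variable-length overhead is an assumption worth verifying against your version's documentation):

```python
# Rough storage cost per value, per SQL Server's documented rules:
# varchar(n):  actual data length in bytes + 2 bytes variable-length overhead
# nvarchar(n): 2 bytes per character + 2 bytes overhead
# char(n)/nchar(n): always the full declared width (space-padded)

def varchar_bytes(value: str) -> int:
    return len(value) + 2

def nvarchar_bytes(value: str) -> int:
    return 2 * len(value) + 2

def char_bytes(declared_width: int) -> int:
    # Padded with spaces regardless of content, so cost is fixed.
    return declared_width

print(varchar_bytes("BobRodes"))   # 10
print(nvarchar_bytes("BobRodes"))  # 18
print(char_bytes(255))             # 255
```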

File I/O on any system is usually the slowest part, so it makes sense that you would want to minimize it (to improve the performance of your app).

With SQL Server, data is stored in 8K pages. If you have a really wide table (many columns with lots of data), then you may only be able to store a couple of records per 8K page. With narrow tables (fewer and smaller fields) you can fit more data into an 8K page.

So, when you are selecting data, the fewer page reads you have, the faster your data will be returned. Of course, indexes affect performance a lot. This is more crucial when dealing with table scans in your query. (If you don't know what a table scan is... take your worst performing query, load it in query analyzer, press CTRL-K and then run the query... there will be an execution plan tab at the bottom of the window).

So, what you are suggesting is to have varchar fields be 255 bytes long. If you had a dozen fields, that would take 3k to store the record and you would only be able to get 2 records per page. If you made those fields shorter, you would be able to get MANY times more records per page, and your queries would be faster.
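To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes roughly 8060 usable bytes of row data per 8K page and that every field is filled to its declared maximum:

```python
# Back-of-the-envelope: how many rows fit on one 8 KB data page?
# SQL Server pages hold roughly 8060 bytes of row data (assumption; the
# exact figure varies slightly by version and row overhead).

USABLE_PAGE_BYTES = 8060

def rows_per_page(row_bytes: int) -> int:
    return USABLE_PAGE_BYTES // row_bytes

wide_row = 12 * 255    # a dozen fully-filled varchar(255) fields ~ 3060 bytes
narrow_row = 12 * 30   # the same dozen fields capped at 30 characters

print(rows_per_page(wide_row))    # 2
print(rows_per_page(narrow_row))  # 22
```

Fewer bytes per row means more rows per page, which means fewer page reads per query.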

This point is so important for performance that many developers will make multiple tables to store information if they need to store large varchar data. The secondary table would have an ID and a large varchar field (or two). This concept is called vertical partitioning (splitting a table by columns; splitting by rows is horizontal partitioning).

Bottom line... In my opinion, field sizes should not be made arbitrarily large because it will hurt your performance.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
I agree with George re the impacts on performance associated with selecting which text data types to use.

As to the "application" or "database" issue ...

My view is that the allowed length of a text field (regardless of the specific sub-type) is a data constraint and therefore it is more properly set and maintained at the database level.

Most RDBMSs have mechanisms for retrieving the maximum defined lengths of fields, and you can use those to set (for example) the maximum length of data in a text box. It's a bit more work, but at least you are not maintaining the same information in two different places (i.e. DB and application).
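A minimal sketch of that idea in Python. The column names and lengths here are hypothetical stand-ins for what you would read back from the database's catalog (e.g. INFORMATION_SCHEMA.COLUMNS in SQL Server):

```python
# Hypothetical column metadata, as it might be read from the database's
# catalog views (column name -> character_maximum_length).
column_max_lengths = {
    "CustomerName": 30,
    "Notes": 255,
}

def textbox_max_length(column_name: str, fallback: int = 255) -> int:
    """Drive the UI length limit from the database definition, so the
    length constraint is defined and maintained in one place only."""
    return column_max_lengths.get(column_name, fallback)

print(textbox_max_length("CustomerName"))  # 30
print(textbox_max_length("Comments"))      # 255 (unknown column: use fallback)
```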
 
gmmastros

You said,
So, what you are suggesting is to have varchar fields be 255 bytes long. If you had a dozen fields, that would take 3k to store the record.

This would only be true if the data stored in each field was at the maximum of 255 characters.

From BOL:
The varchar data type is a variable-length data type. Values shorter than the size of the column are not right-padded to the size of the column.

Cheers

Andy
 
You said,
This would only be true if the data stored in each field was at the maximum of 255 characters.

I disagree entirely, Andy. First, "padding" is not the same thing as "memory allocation," and it looks like you're confusing the two. You appear to believe that memory is only allocated for characters that are actually there, and that different amounts of memory are allocated for each record depending on the record's contents. It doesn't work that way.

If a varchar(255) field had the value "BobRodes" in it, it would contain those characters and 247 null characters. A char(255) field would contain those characters and 247 spaces. Conversely, a varchar(20) with the same value would contain those characters and 12 null characters. As George says, you can get a lot more of those on an 8k page. All your quote is saying is that varchar fields do not substitute spaces into the memory locations containing null characters. It does NOT say that the field occupies less memory, and moreover such is not the case.

The reason that it's better to use varchar instead of char when the length of the actual data varies significantly has nothing to do with saving storage space. Rather, it has to do with improving I/O performance: if you are using char, you have to package up each space and send it along to wherever with the rest of the real data. You don't have to do that with varchar.

I say you might want to research George's answer a bit more carefully before dismissing it as you appear to have done.

HTH

Bob
 
Bob,

Have a star. You are correct in your assumptions. I guess I should have had all the facts before dismissing someone. Sorry, George.

At least I am a little wiser now and will take more care in the future to properly define my column widths.

Thanks

Andy
 
Bob,

I am a little confused after doing a bit of research.

From MSDN:
'For varchar, the storage size is the actual length in bytes of the data entered, again not to exceed 8,000 bytes'.

I understand what you are saying but can you possibly clarify with the quote above in mind?

Many thanks

Andy
 
I'm afraid that, as far as I am aware, Bob and George are incorrect. In SQL Server (but not necessarily in other database systems) VARCHARs are indeed of variable length, and the storage size is indeed the actual length of the data entered.

This can have some performance benefits where I/O is concerned, and often when sorting, but it can suffer some performance hits in, say, searching.

I vaguely recall MS advising that, as a rule of thumb, fields of less than 20 characters should be stored as CHAR, and greater than 20 characters as VARCHAR.

That said, most advice on SQL Server performance says that fields should be kept as short as possible.
 
I'm afraid that I have to agree with strongm, and, uh, disagree with myself.

I honestly believed everything I previously said. I was not trying to mislead you. I did run a test like this...

Code:
Create Table Wide(Id Integer, Data VarChar(8000))
Create Table Narrow(Id Integer, Data VarChar(10))

Declare @i Integer

Set @i = 1

While @i <= 255
  Begin
    Insert Into Wide(Id, Data) Values(@i, Char(@i))
    Insert Into Narrow(Id, Data) Values(@i, Char(@i))

    Set @i = @i + 1
  End

Then I ran...

dbcc showcontig (wide)
dbcc showcontig (narrow)

Both showed identical information.
[tt][blue]TABLE level scan performed.
- Pages Scanned................................: 1
- Extents Scanned..............................: 1
- Extent Switches..............................: 0
- Avg. Pages per Extent........................: 1.0
- Scan Density [Best Count:Actual Count].......: 100.00% [1:1]
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 3506.0
- Avg. Page Density (full).....................: 56.68%[/blue][/tt]

I still recommend you make your fields a reasonable size. Then, do as golom suggests: before loading your form, check the size of the database field and set your maxlengths accordingly.

Code:
Select Column_Name, data_type, character_maximum_length
From   Information_Schema.Columns
Where  Table_Name = 'blah'

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
George,

No probs.

Thanks for the clarification.

Thanks also to strongm

Andy
 
Well. First, mea culpa. I feel like Ann Landers after giving bad advice.

Strongm gets a star for being right, as always.

George gets a star for going to the trouble of verifying that strongm (and Andy) are right through testing.

Andy gets a star for putting up with some bad analysis, and hanging tough to get to the bottom of it.

Funny how you can think you know something for years, and find out you're all wet. Sorry, Andy, and thank you all for setting me straight on this one.

Bob
 
Bob,

No problems at all. I am glad that we have got to the bottom of this one.

After reading your post I did start to panic a little bit as I have read many of your previous posts and found them informative and generally right on the money.

I did doubt what I thought I knew as I have only been developing professionally for 2 years now and have found previously that I was programming in ignorance.

I have taken on board all the comments regarding field sizes and will endeavor to keep them to a reasonable size in future.

Thanks to all who posted in this thread.

Andy
 
Yes, well, I am right 92.7% of the time.

Wish they'd let me back into the NASA program, but hey, Microsoft and I have been talking.....
 