I've been playing around with this for a little bit. Here's what I've come up with...
1. Best is a relative term, and it can be hard to quantify. To qualify for 'best' status, something can be a lot better, or only a little better, than the number 2 thing. Example: two Olympic runners and I run a race. The times are 2:10, 2:11, and 8:40. (I'm not a good runner.) In my opinion, the difference between the number 1 guy and the number 2 guy is not enough to be definitive. The same applies here. The difference in time between joins on integers and joins on varchars is insignificantly small.
2. When joining tables, it is more important to make sure indexes exist on the join columns. Without proper indexing in the database, the difference in time would be like comparing my race time with an Olympic gold medalist's time. The difference in time is huge.
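To illustrate, here's a minimal sketch (the Customers/Orders tables and all names are just examples I made up, not anything from this thread):

```sql
-- Two hypothetical tables joined on CustomerID.
CREATE TABLE Customers (
    CustomerID int IDENTITY(1,1) PRIMARY KEY,   -- clustered index created by default
    LastName   varchar(50) NOT NULL
);

CREATE TABLE Orders (
    OrderID    int IDENTITY(1,1) PRIMARY KEY,
    CustomerID int NOT NULL REFERENCES Customers(CustomerID),
    OrderDate  datetime NOT NULL
);

-- The foreign key does NOT create an index by itself; without this one,
-- SQL Server has to scan Orders to resolve the join.
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);

SELECT c.LastName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON o.CustomerID = c.CustomerID;
```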
There are many database systems out there that use GUIDs for primary keys instead of integers. The reason these systems don't fail is that joining on GUIDs is almost as fast as joining on integers; in fact, the time difference is negligible.
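A GUID-keyed table might look something like this (again, just a sketch with an invented table):

```sql
-- uniqueidentifier is 16 bytes versus 4 for int, yet joins on it
-- still use ordinary index seeks and perform nearly as well.
CREATE TABLE Products (
    ProductID   uniqueidentifier NOT NULL
                    DEFAULT NEWSEQUENTIALID()   -- or NEWID() for random GUIDs
                    PRIMARY KEY,
    ProductName varchar(100) NOT NULL
);
```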
That being said, I would still advise using an arbitrary value as the primary key (which I presume is what you join to other tables on). For example, suppose you decide to use a combination of first name and last name for your joins. Everything works fine until the day you add another person with the exact same name as someone else. If that combination is the primary key, you won't be able to add the person (problem #1), and even if you could, your joins would return rows for the wrong person (problem #2). And then, someone will get married and want to change their name (problem #3).
By using arbitrary values as your primary key, you avoid all of these issues. In fact, most systems don't show the key value to end users because it doesn't mean anything; it exists solely to maintain data integrity.
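Here's a sketch of both approaches side by side (table and data invented for illustration):

```sql
-- Natural key: breaks the moment two people share a name.
CREATE TABLE PeopleNatural (
    FirstName varchar(50) NOT NULL,
    LastName  varchar(50) NOT NULL,
    PRIMARY KEY (FirstName, LastName)
);
INSERT INTO PeopleNatural (FirstName, LastName) VALUES ('John', 'Smith');
INSERT INTO PeopleNatural (FirstName, LastName) VALUES ('John', 'Smith');  -- fails: PK violation (problem #1)

-- Surrogate key: duplicate names and name changes are harmless.
CREATE TABLE PeopleSurrogate (
    PersonID  int IDENTITY(1,1) PRIMARY KEY,   -- arbitrary value, never shown to users
    FirstName varchar(50) NOT NULL,
    LastName  varchar(50) NOT NULL
);
INSERT INTO PeopleSurrogate (FirstName, LastName) VALUES ('John', 'Smith');
INSERT INTO PeopleSurrogate (FirstName, LastName) VALUES ('John', 'Smith');  -- fine: two different people
UPDATE PeopleSurrogate SET LastName = 'Jones'
WHERE PersonID = 2;  -- marriage (problem #3): joins on PersonID are unaffected
```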
If you agree with everything I said, then consider this... since you should be using arbitrary values for your keys (which you usually join other tables on), why not pick the data type that gives the best performance? Even if integers are only slightly better than varchars, I would prefer to use them. You see, an integer takes 4 bytes to store, varchar takes 1 byte per character, and nvarchar takes 2 bytes per character. By using integers, you use less storage for the data, and you also use less storage for the indexes.
This has other effects on the database. SQL Server stores data in 8K pages. The more records you can fit into an 8K page, the less physical I/O (hard drive access) you'll have. The less I/O you have, the faster your overall performance.
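Rough back-of-the-envelope math (ignoring per-row overhead, which lowers the real numbers): SQL Server leaves roughly 8,060 bytes of an 8K page for row data, so an index on a 4-byte int key can pack about 2,000 entries per page, while the same index on a 20-character varchar key packs only about 400.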
And, finally... SQL Server caches data in memory. The more data that is cached in memory, the faster your performance will be. By reducing the amount of memory overhead, you will effectively be able to store more meaningful information in memory and your performance will be better.
So... while it is true that the difference in join performance between integers and varchars is minimal, you should consider overall system performance. Integers are better, and should therefore be used.
-George
Strong and bitter words indicate a weak cause. - Fortune cookie wisdom