When setting up a new SQL Server many DBAs simply let the system administrators set up the disk arrays. This is a mistake. Setting up the disk arrays for a SQL Server is much different from setting up the disk arrays for a file server or print server. On large databases the disk array configuration can greatly affect system performance. From what I have seen on tek-tips, many DBAs don't know how things need to be set up. Some don't know the RAID levels and what they mean. I'll try to cover all that here.
RAID Levels
RAID 0 (also called a stripe set) - Very fast, no redundancy.
RAID 1 (also called a mirror) - not very fast, maximum redundancy.
RAID 6 - fast, very good redundancy.
RAID 5 - fast, good redundancy.
RAID 0+1 (a mirrored stripe set) - very fast, maximum redundancy.
RAID 5+1 - fast, maximum redundancy.
There are other RAID levels available depending on which RAID card vendor you use. These are the most commonly used RAID levels.
Terms you'll need to know
These are some terms that you'll need to know to be able to understand this FAQ.
Rebuilding the Array - When you replace a disk in a RAID array the array has to reconstruct that disk's data onto the new disk. This is called rebuilding the array.
RAID Cache - This is memory built onto the RAID controller and is used to speed up operations to and from the disks. Having cache on the controller makes a huge difference.
Hot Spare - A hot spare is a spare hard drive installed in your server. In the event of a disk failure the array will automatically rebuild onto the hot spare. This means that your data is redundant again in the fastest possible time. When you replace the failed drive, the new drive becomes the hot spare. (RAID 0 cannot make use of a hot spare as it is not redundant. See below.)
Parity - Parity is used to create the redundancy in RAID 5 and RAID 6 arrays. Here is a basic example of how parity works. If we write the word "cat" to a 3 disk RAID 5 array, "c" might be written to disk 1 and "a" to disk 2, while disk 3 gets a parity byte calculated from those two (an XOR of their bits); "t" then starts the next stripe. If any one of the three disks fails, its data can be recalculated from the two disks that remain.
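The XOR mechanism behind parity can be sketched in a few lines of code. This is a toy illustration of the idea only (real controllers work on whole disk blocks, not single bytes):

```python
# Toy illustration of RAID 5 style parity (not real controller code).
# One stripe of the word "cat": "c" on disk 1, "a" on disk 2,
# and a parity byte (the XOR of the two) on disk 3.
disk1 = b"c"
disk2 = b"a"
disk3 = bytes(x ^ y for x, y in zip(disk1, disk2))  # parity

# If disk 1 fails, XORing the survivors rebuilds its contents,
# because a XOR (c XOR a) == c.
rebuilt = bytes(x ^ y for x, y in zip(disk2, disk3))
print(rebuilt)  # b'c'
```

The same XOR trick works no matter which single disk is lost, which is why a RAID 5 array survives exactly one failure.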
TPS (Transactions Per Second) - This is the number of transactions being processed per second. Small systems will be measured in TPM (Transactions Per Minute).
Let's take a deeper look at these RAID levels so we can make a more informed decision about which ones to select.
RAID 0
A RAID 0 array must have 2 or more disks. A RAID 0 array has no redundancy. As data is written to the array it is spread across the disks for maximum speed. This RAID level should never be used for a SQL Server. In the event of a failure of a single disk in the array all data is lost. To calculate the space created by a RAID 0 array, multiply the number of drives by the size of the drives. (Example: 3 drives x 36GB = 108GB)
RAID 1
A RAID 1 array can have only 2 disks in the array. A RAID 1 array has maximum redundancy as every byte of data is written to both disks at the same time. From a speed point of view a RAID 1 array is no faster or slower than a single disk. In the event of a single disk failure the second disk continues to serve the data to the server; the software does not even know that a disk has failed. The space created by a RAID 1 array is simply the size of one drive. (Example: 2 drives x 36GB = 36GB)
RAID 6
A RAID 6 array must have at least 4 disks in the array. A RAID 6 array also has a nice balance of speed and redundancy. A RAID 6 array can survive two disk failures without affecting the data. RAID 6 arrays are best used for arrays that are read intensive. The reason for this is that for every byte written to the disk parity has to be calculated; this does not have to be done when reading data from the disk. To calculate the space created by a RAID 6 array, take the number of drives, subtract 2, and multiply by the size of the drives. (Example: 4 drives x 36GB = 72GB)
RAID 5
A RAID 5 array must have at least 3 disks in the array. A RAID 5 array has a nice balance of speed and redundancy. A RAID 5 array is the most cost effective RAID level (the least cost effective are RAID 1 and RAID 0+1). A RAID 5 array can survive a single disk failure without affecting the data. RAID 5 arrays are best used for arrays that are read intensive. The reason for this is that for every byte written to the disk parity has to be calculated; this does not have to be done when reading data from the disk. To calculate the space created by a RAID 5 array, take the number of drives, subtract 1, and multiply by the size of the drives. (Example: 4 drives x 36GB = 108GB)
RAID 0+1
A RAID 0+1 array must have at least 4 disks, and it must have an even number of disks in the array (a 2 disk RAID 0+1 array is simply a RAID 1 array). The basic idea of a RAID 0+1 array is that the RAID controller makes two RAID 0 arrays and mirrors them together. This gives you maximum speed and maximum redundancy. However this takes a lot of disks, and therefore costs a lot of money. To calculate the space created by a RAID 0+1 array, take the number of drives, divide by 2, and multiply by the size of the drives. (Example: 4 drives x 36GB = 72GB)
RAID 5+1
A RAID 5+1 array must have at least 6 disks, and it must have an even number of disks in the array. The basic idea of a RAID 5+1 array is that the RAID controller makes two RAID 5 arrays and mirrors them together. This configuration gives you better redundancy than a RAID 0+1 without giving up too much performance. To calculate the space created by a RAID 5+1 array, take the number of drives, divide by 2, subtract 1, and multiply by the size of the drives. (Example: 6 drives x 36GB = 72GB) A RAID 5+1 array will give you the same performance as a RAID 5 array but with increased redundancy. Only a few select RAID cards can handle this RAID level. For the most part this RAID level is used within a SAN environment.
In my size examples you may have noticed that the RAID 6 and RAID 0+1 arrays came out the same. The reason for this is that we only used 4 drives. If we increase the number of drives, the RAID 6 array will be larger than the RAID 0+1 array.
RAID 6 - 8 drives x 36GB = 216GB
RAID 0+1 - 8 drives x 36GB = 144GB
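The capacity formulas above can be collected into one small helper. This is an illustrative sketch (the function name is my own) that just encodes the arithmetic from this FAQ:

```python
def usable_gb(level: str, drives: int, drive_gb: int) -> int:
    """Usable space for an array, using the formulas given in this FAQ."""
    if level == "0":
        return drives * drive_gb              # all disks hold data
    if level == "1":
        return drive_gb                       # 2 mirrored disks, one disk's worth
    if level == "5":
        return (drives - 1) * drive_gb        # one disk's worth of parity
    if level == "6":
        return (drives - 2) * drive_gb        # two disks' worth of parity
    if level == "0+1":
        return (drives // 2) * drive_gb       # half the disks mirror the other half
    if level == "5+1":
        return (drives // 2 - 1) * drive_gb   # a mirrored RAID 5
    raise ValueError(f"unknown RAID level: {level}")

print(usable_gb("5", 4, 36))    # 108
print(usable_gb("6", 8, 36))    # 216
print(usable_gb("0+1", 8, 36))  # 144
```

Running a few sizes through a helper like this makes the cost difference between the redundant levels obvious before you order the drives.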
Now that we know what we are talking about with the hardware, let's look at which array should be used for which application.
The first place to start is the operating system. Many people put Windows on a RAID 5 array. Windows doesn't need to be on a RAID 5 array. In fact putting it on a RAID 5 array will slow down the overall operation of the server compared to a RAID 1 array. The reason for this is the page file. After Windows Server starts it doesn't use the disk for much except the page file. The page file is fairly write intensive (roughly 50/50 read and write). You don't want to spend the extra time calculating parity for data that is only going to be on the disk for a few seconds to a minute or two. And a RAID 1 array will provide just as much redundancy as a 3 drive RAID 5 array. So why not take that RAID 5 array and make it a RAID 1 array with a hot spare?
Now that we have the operating system on the correct RAID level, let's look at the various parts of the data. The parts that we will be addressing are the binaries (the actual exe and dll files of the server), the system databases (master, msdb, model), the user database data files (all the databases you put on the server), the user database transaction logs, and tempdb. (Whenever I say RAID 5, a RAID 6 or RAID 5+1 can be used as well. I am simply choosing the less expensive option for my examples.)
Binaries - I usually place the binaries on the OS drive. For the most part they are read once (when the SQL services are started) and not again. The only things that will be written here are the error log and SQL Agent log files.
System databases - Each system database has different requirements. Overall a RAID 5 is best.
master - The master database is mostly read, seldom write.
msdb - The msdb database is about 50/50 read/write.
model - The model database is read only when a new database is created or the SQL service is started. It should almost never be written to.
User database data files - Most databases are mostly read, seldom written. Because of this a RAID 5 array is the best option. If your database is rarely read from but written to often (for logging, for example), then a RAID 0+1 would be your best option.
User database log files - Transaction log files are very write intensive. They should be placed on a RAID 1 or RAID 0+1 array.
tempdb - Tempdb is very write intensive. It should be placed on a RAID 1 or RAID 0+1 array. (On almost any SQL Server the data file and log file for the tempdb database can be placed on the same array. On SQL Servers with truly massive tempdb usage you may want to split the tempdb data file and log file onto their own RAID 1 or RAID 0+1 arrays.)
Now that we've got all this information, let's put it all together and design some servers. I'll start small and work up to a huge system. (These are just guidelines based on my previous experience. There are no hard and fast rules.)
The one thing that all these configurations will have in common is that the OS should be on a RAID 1 array, and the SQL binaries should be on the same drive as the OS. (As before, any time I say RAID 5, a RAID 6 or RAID 5+1 will work just as well; they simply cost more.)
When deciding what size your server qualifies as, be sure to include database size as well. For example a server with 5 users and 5 TPS, but 800 GB of data, would be a large database server.
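As a rough sketch, that sizing rule might look like the helper below. The thresholds and function name are my own illustrative guesses drawn loosely from the examples in this FAQ, not hard rules:

```python
def server_size(users: int, tps: int, data_gb: int) -> str:
    """Rough sizing guess; whichever dimension is biggest wins.
    Thresholds are illustrative only, not hard rules."""
    if users >= 1000 or tps >= 100 or data_gb >= 500:
        return "large"
    if users >= 100 or tps >= 10 or data_gb >= 50:
        return "medium"
    return "small"

print(server_size(5, 5, 800))  # large - database size alone pushes it up
```

The point of the "biggest dimension wins" logic is exactly the 800 GB example: a server that looks tiny by user count can still need a large-server disk layout.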
First, the small database server. This would be for a smaller system, with only a few users and a low number of transactions per second (say 1-10 TPS). For a small server like this, putting the databases, transaction logs, and tempdb all on the same RAID 5 array is perfectly fine. The load on the system is low enough that system speed won't suffer by having them all on the same drives.
Next is the medium size database server. This would be for an average sized database server: up to a few hundred users and up to 100 TPS. This is when you will want to start breaking things up. You'll want the databases and logs on separate drives: RAID 5 for the databases and RAID 1 for the logs. You'll have to gauge where tempdb should go based on its usage. If it gets used a lot you'll want to break it off onto its own RAID 1 array.
Now to the large database server. Here we are looking at systems with thousands of users and/or hundreds of TPS. Servers like this need to be spec'd out VERY carefully. Any small mistake can easily degrade system performance. This is when you not only want to separate the database files and the logs, but also want to start breaking up parts of the database onto separate drive arrays. This is handled by creating additional file groups and moving objects into those file groups.
As with everything involving databases, the system will dictate the number of file groups and the file placement on the disks. You may need to improve the performance of your indexes, in which case you can place the non-clustered indexes into a separate database file on a separate drive. Or you may place the large object data (the data within text, ntext, or image data type fields) on a separate drive.
When placing indexes on a separate drive, consider how often the index is updated. If your index is updated often (lots of inserts, or updates to the columns within the index) then a RAID 1 will be the better option. If the index is fairly static then a RAID 5 will be the better option. The same goes for the text, ntext, and image data.
You can also place different tables on different drives based on how each table is used. Tables that are written to often but not read often can be placed on a RAID 1 array, and likewise tables that are mostly read can be placed on a RAID 5 array.
I hope that you found this FAQ useful. If you have any questions for me feel free to post them in this forum, or use the "Send a Comment to mrdenny About this FAQ" link below to email me directly.