Hi George,
There is no network involved here, sorry perhaps I didn't make myself clear, obviously I understand when transmitting the data the amount of data selected is a factor, but yes the query returns most of the records (874,954 out of 1,010,786)
I'm running this directly on the physical SQL server box, it's outputting direct to the server on the server, hence I didn't think the select made much of a difference.
Though it seems there are further gaps in this course content as they didn't mention 'covering index', well to be fair, it didn't cover indexes at all. We were simply provided with a waterfall type schema diagram showing the order in which queries are executed (including correlated sub queries) and it shows the SELECT as the last thing performed against the final intermediate table produced from the other clauses, I guess they oversimplified the process for the course.
Results for each test type are as follows...
1. 5592 - Natural PK
2. 6105 - Surrogate PK
3. 5783 - No PK , Clustered
4. 5846 - No PK, Non-Clustered
But I should point out that each time I ran the query I got different results, so I ran the test 10 times in a loop for each configuration and used the mean average for the result.
I used this code to produce the above result averages...
Code:
DECLARE @Trash Int,
@Start DateTime,
@sum Int,
@cnt Int
SET @Start = GetDate()
SET @sum = 0
SET @cnt = 0
WHILE @cnt < 10
BEGIN
SELECT @Trash = 1
FROM Tracking_Log_Archive t JOIN Users u ON t.UserID = u.UserID
WHERE t.UserID LIKE 'U00%-%'
ORDER BY Used ASC
SET @sum = @sum + (SELECT DateDiff(Millisecond, @Start, GetDate()))
SET @cnt = @cnt + 1
END
SELECT @sum / 10 AS average, @cnt AS cnt
But again the results varied. They were consistent enough that on average the surrogate PK performed the slowest, albeit at times only by 100's of a millisecond.
So the results shown were a decision to run each test again for a final time and post the result regardless of winner, so the surrogate PK result isn't the best I've seen, but the same goes for the other results I saw better and worse as I repeatedly ran the test.
I guess this also just goes to show that even outputting direct to the console on the server takes time regardless of table configuration and the affect a simple superfluous integer column can have.
So what's the conclusion regarding what I shall implement, well interestingly enough, the answer is different depending on whether the table is the 'production' gathering table, or the archive table.
As the production table is written to constantly, insert speed is my priority, so a heap type table would be preferred as it isn't queried that often, however, for the archive where more analysis, data mining and reporting is performed, speed of query is priority.
So the question is do I simply have no PK in either, but on the production table have a non-clustered unique index constraint and the archive table a clustered one.
Or do I on the production table include a surrogate PK which acts as a heap?
I have no requirement for this additional PK information, it's not used as an FK anywhere and will be lost when the data is archived, so what's the difference between a surrogate PK auto identity type heap table vs a non-clustered unique index heap table with regard to inserts?
Regards,
1DMF
"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."
"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
Free Electronic Dance Music