Here comes long post. Actually it will take no more than 15 minutes of your time.
Overview
Suppose we have hierarchical data of some kind - structure of employees, genealogy tree, CMS structure, whatever:
Such structures are usually stored in SQL database in adjacency-list form:
In other words: parent-child stuff. That representation makes changes efficient, though common SELECTs like:
- gimme path to specified node (J => { A, H, I })
- gimme subtree for speficied node (D => {E, F})
... require row-based processing - with stack or recursion. This is of course slow. Some RDBMSes can handle that transparently with hierarchical SELECT extensions (Oracle - START WITH/CONNECT/PRIOR, SQL2005 - recursive CTE), some can not.
Suppose #2 we have (R)DBMS that can not, like SQL2000 or mySQL. In order to speed up SELECTs that give features from above, we must add some redundant "indexed" columns into table. There are several methods available, and nested set model (NSM) is one of them. IMHO idea behind NSM is brilliantly simple. Add two columns:
Take one global variable. Set it to 0. Then imagine you are traversing tree the same way drive folders are typically scanned. When entering folder, increment variable by 1 and assign value to left index (nsmLeft). When leaving folder, do the same but on right index (nsmRight). Final result will be:
Once "indexed", common hierarchical SELECTs become trivial and very fast:
There are some other interesting featured related to NSM... for anyone interested here is useful link by author himself (no video gamers jokes please ).
Problem/challenge
Write code that calculates NSM indexes (nsmLeft, nsmRight) - all at once from scratch. To make it more complicated, make it as fast as possible. FYI there are ways to do that without cursor or stack. Author of best answer will get that purple thingy.
Later I will post code that makes sample tree from hard drive folder/file structure... insta 20,000+ nodes for performance testing.
Rules & restrictions
[ul]
[li]Take sample data from above. Your code must fill in nsmLeft/nsmRight columns. Final result must match.[/li]
[li]DON'T assume siblingOrder never has holes (1, 2, 3) - better imagine it is some kind of tabIndex property in GUI forms.[/li]
[li]DON'T assume maximum tree depth is finite. Or that NodeID is nicely sorted as in sample data.[/li]
[li]All tools & languages are allowed. I'm primarily interested into fastest possible way - maybe this eliminates client-side code (VB, C#, Python, whatever), maybe not - we shall see.[/li]
[li]One exception: don't use ANY hierarchical feature, if RDBMS you are using supports 'em.[/li]
[/ul]
------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
Overview
Suppose we have hierarchical data of some kind - structure of employees, genealogy tree, CMS structure, whatever:
Code:
A
--- B
--- C
--- D
--- E
--- F
--- G
--- H
--- I
--- J
Code:
create table sampleNodes
( nodeID char(1) primary key,
parentID char(1) null,
siblingOrder smallint
)
go
insert into sampleNodes values ('A', null, 1 )
insert into sampleNodes values ('B', 'A', 1 )
insert into sampleNodes values ('C', 'A', 2 )
insert into sampleNodes values ('D', 'C', 1 )
insert into sampleNodes values ('E', 'D', 1 )
insert into sampleNodes values ('F', 'D', 2 )
insert into sampleNodes values ('G', 'C', 2 )
insert into sampleNodes values ('H', 'C', 3 )
insert into sampleNodes values ('I', 'H', 1 )
insert into sampleNodes values ('J', 'I', 1 )
- gimme path to specified node (J => { A, H, I })
- gimme subtree for speficied node (D => {E, F})
... require row-based processing - with stack or recursion. This is of course slow. Some RDBMSes can handle that transparently with hierarchical SELECT extensions (Oracle - START WITH/CONNECT/PRIOR, SQL2005 - recursive CTE), some can not.
Suppose #2 we have (R)DBMS that can not, like SQL2000 or mySQL. In order to speed up SELECTs that give features from above, we must add some redundant "indexed" columns into table. There are several methods available, and nested set model (NSM) is one of them. IMHO idea behind NSM is brilliantly simple. Add two columns:
Code:
alter table sampleNodes add nsmLeft int
alter table sampleNodes add nsmRight int
Code:
nodeID parentID siblingOrder nsmLeft nsmRight
--------.--------.------------.---------.---------
A NULL 1 1 20
B A 1 2 3
C A 2 4 19
D C 1 5 10
E D 1 6 7
F D 2 8 9
G C 2 11 12
H C 3 13 18
I H 1 14 17
J I 1 15 16
Code:
-- gimme path to specified node
select A.*
from sampleNodes A
inner join sampleNodes B on A.nsmLeft < B.nsmLeft and A.nsmRight > B.nsmRight
where B.nodeID = 'J'
order by A.nodeID
-- gimme subtree for specified node
select A.*
from sampleNodes A
inner join sampleNodes B on A.nsmLeft > B.nsmLeft and A.nsmRight < B.nsmRight
where B.nodeID = 'D'
order by A.nodeID
Problem/challenge
Write code that calculates NSM indexes (nsmLeft, nsmRight) - all at once from scratch. To make it more complicated, make it as fast as possible. FYI there are ways to do that without cursor or stack. Author of best answer will get that purple thingy.
Later I will post code that makes sample tree from hard drive folder/file structure... insta 20,000+ nodes for performance testing.
Rules & restrictions
[ul]
[li]Take sample data from above. Your code must fill in nsmLeft/nsmRight columns. Final result must match.[/li]
[li]DON'T assume siblingOrder never has holes (1, 2, 3) - better imagine it is some kind of tabIndex property in GUI forms.[/li]
[li]DON'T assume maximum tree depth is finite. Or that NodeID is nicely sorted as in sample data.[/li]
[li]All tools & languages are allowed. I'm primarily interested into fastest possible way - maybe this eliminates client-side code (VB, C#, Python, whatever), maybe not - we shall see.[/li]
[li]One exception: don't use ANY hierarchical feature, if RDBMS you are using supports 'em.[/li]
[/ul]
------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]