SQL Puzzle #16: nested set model

vongrunt · Apr 12, 2006

Here comes long post. Actually it will take no more than 15 minutes of your time.

Overview

Suppose we have hierarchical data of some kind - structure of employees, genealogy tree, CMS structure, whatever:

Code:

A
--- B
--- C
    --- D
        --- E
        --- F
    --- G
    --- H
         --- I
             --- J

Such structures are usually stored in SQL database in adjacency-list form:

Code:

create table sampleNodes 
(  nodeID char(1) primary key,
   parentID char(1) null,
   siblingOrder smallint
)
go        

insert into sampleNodes values ('A', null, 1 )
insert into sampleNodes values ('B',  'A', 1 )
insert into sampleNodes values ('C',  'A', 2 )
insert into sampleNodes values ('D',  'C', 1 )
insert into sampleNodes values ('E',  'D', 1 )
insert into sampleNodes values ('F',  'D', 2 )
insert into sampleNodes values ('G',  'C', 2 )
insert into sampleNodes values ('H',  'C', 3 )
insert into sampleNodes values ('I',  'H', 1 )
insert into sampleNodes values ('J',  'I', 1 )

In other words: parent-child stuff. That representation makes changes efficient, though common SELECTs like:

- gimme path to specified node (J => { A, H, I })
- gimme subtree for speficied node (D => {E, F})

... require row-based processing - with stack or recursion. This is of course slow. Some RDBMSes can handle that transparently with hierarchical SELECT extensions (Oracle - START WITH/CONNECT/PRIOR, SQL2005 - recursive CTE), some can not.

Suppose #2 we have (R)DBMS that can not, like SQL2000 or mySQL. In order to speed up SELECTs that give features from above, we must add some redundant "indexed" columns into table. There are several methods available, and nested set model (NSM) is one of them. IMHO idea behind NSM is brilliantly simple. Add two columns:

Code:

alter table sampleNodes add nsmLeft int
alter table sampleNodes add nsmRight int

Take one global variable. Set it to 0. Then imagine you are traversing tree the same way drive folders are typically scanned. When entering folder, increment variable by 1 and assign value to left index (nsmLeft). When leaving folder, do the same but on right index (nsmRight). Final result will be:

Code:

nodeID   parentID siblingOrder   nsmLeft  nsmRight
--------.--------.------------.---------.---------
A            NULL            1         1        20
B               A            1         2         3
C               A            2         4        19
D               C            1         5        10
E               D            1         6         7
F               D            2         8         9
G               C            2        11        12
H               C            3        13        18
I               H            1        14        17
J               I            1        15        16

Once "indexed", common hierarchical SELECTs become trivial and very fast:

Code:

-- gimme path to specified node
select A.*
from sampleNodes A
inner join sampleNodes B on A.nsmLeft < B.nsmLeft and A.nsmRight > B.nsmRight
where B.nodeID = 'J'
order by A.nodeID

-- gimme subtree for specified node
select A.*
from sampleNodes A
inner join sampleNodes B on A.nsmLeft > B.nsmLeft and A.nsmRight < B.nsmRight
where B.nodeID = 'D'
order by A.nodeID

There are some other interesting featured related to NSM... for anyone interested here is useful link by author himself (no video gamers jokes please [wink]

).

Problem/challenge

Write code that calculates NSM indexes (nsmLeft, nsmRight) - all at once from scratch. To make it more complicated, make it as fast as possible. FYI there are ways to do that without cursor or stack. Author of best answer will get that purple thingy.

Later I will post code that makes sample tree from hard drive folder/file structure... insta 20,000+ nodes for performance testing.

Rules & restrictions

[ul]
[li]Take sample data from above. Your code must fill in nsmLeft/nsmRight columns. Final result must match.[/li]
[li]DON'T assume siblingOrder never has holes (1, 2, 3) - better imagine it is some kind of tabIndex property in GUI forms.[/li]
[li]DON'T assume maximum tree depth is finite. Or that NodeID is nicely sorted as in sample data.[/li]
[li]All tools & languages are allowed. I'm primarily interested into fastest possible way - maybe this eliminates client-side code (VB, C#, Python, whatever), maybe not - we shall see.[/li]
[li]One exception: don't use ANY hierarchical feature, if RDBMS you are using supports 'em.[/li]
[/ul]

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

vongrunt · Apr 19, 2006

Here is table for performance testing:

Code:

create table sampleNodes
( 	nodeID int primary key,
	fileName varchar(255) not null,
	fileSize bigint,
	isFolder bit,
	parentID int,
	siblingOrder smallint,
	absoluteDepth tinyint,
	nsmLeft int,
	nsmRight int,
)
go

I added redundant column absoluteDepth; it may speed up indexing (hint!) and since NSM crawls entire tree anyway, cost of it's calculation is marginal. Here is VB(Script) code to fill sample nodes:

Code:

-- sample conn. string for M$SQL, SQL authentication
Const DB_CONN_STR = "Provider=SQLOLEDB.1;Server=<servername>;User ID=<username>;Password=<******>;Initial Catalog=<yourDB>;"

Dim oConn: Set oConn = createObject("ADODB.Connection")
Dim oFSO: Set oFSO = createObject("Scripting.FilesystemObject"): oConn.Open DB_CONN_STR

scanFolder "c:\Program Files"

Set oFSO = nothing
oConn.close: Set oConn = nothing

'-----
Sub scanFolder( sPath )
	oConn.Execute "delete from sampleNodes"
	
	Dim iCnt: iCnt = 0
	scanFolderRecursive iCnt, NULL, sPath, 1
End Sub	

'-----
Private Sub scanFolderRecursive( byRef iCnt, byVal iParentNode, sPath, iSiblingOrder )
	Dim oFolder: set oFolder = oFSO.getFolder( sPath )
	Dim oFile
	
	insertSampleNode iCnt, iParentNode, IIf(iCnt=0, sPath, oFolder.name), true, iSiblingOrder, NULL
	iParentNode = iCnt
	iSiblingOrder = 0
	
	For Each oFile in oFolder.SubFolders
		iSiblingOrder = iSiblingOrder + 1
		scanFolderRecursive iCnt, iParentNode, sPath & "\" & oFile.name, iSiblingOrder
	Next
	
	For Each oFile in oFolder.Files
		iSiblingOrder = iSiblingOrder + 1
		insertSampleNode iCnt, iParentNode, oFile.name, false, iSiblingOrder, oFile.size
	Next
End Sub

'-----
Private Sub insertSampleNode( byRef iCnt, iParentNode, sFilename, bIsFolder, iSiblingOrder, iFileSize )
	iCnt = iCnt + 1
	
	oConn.Execute "INSERT INTO sampleNodes (nodeID, fileName, fileSize, isFolder, parentID, siblingOrder ) " & _
		"VALUES (" & iCnt & ", " & _
			 "'" & replace(sFilename, "'", "''") & "', " & _
			 IIf(isNull(iFileSize), "NULL", iFileSize) & ", " & _
			 IIf(bIsFolder, 1, 0) & ", " & _
			 Iif(isNUll(iParentNode), "NULL", iParentNode) & ", " & _
			 iSiblingOrder & ")"
End Sub

It is not bullet-proof (folder permissions etc), so fix it if necessary. Please note scanning may take a while (cca 150 nodes/s on my machine).

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

SQL Puzzle #16: nested set model

vongrunt

Programmer

vongrunt

Programmer

Similar threads

Part and Inventory Search

Sponsor