Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Ideas for DB structure for a search engine needed

Status
Not open for further replies.

LucieLastic

Programmer
May 9, 2001
1,694
GB
hi

I currently have a simple database structure but it's not quick.

I have stories which come in and the main text and title is stripped from them. I have a list of stop words (eg 'and', 'the' etc) which I remove from the text. I then strip out the 'distinct' words and look up their wordID from my WordIndex table. I have another table called WordStoryLinks which stores the storyID and the wordIDs for all the words in that story.

My storyID and wordID columns are both indexed in the WordStoryLinks table as this is the main table for future searches. This table is growing fast and has quite a few million rows in it.

WordIndex
---------
wordID <PK>
wordString

WordStoryLinks <---main one for searching on
--------------
storyID <FK>
wordID <FK>

StoryDetails
------------
storyID <PK>
storyTitle
filename
:

Is this the best way to do it for the WordStoryLinks table. I'm thinking of doing some maths on this table and break it down into several tables, all roughly the same size. Of course human intervention will be required every so often for the last table, to check whether a new table is created or not to carry on the chain.

Can anyone give me some pointers on how to get a fast structure? If the structure is acceptable, then it could be my querying that's slow. How is Google so fast at searching it's database - how's it's DB structured, anyone know?

Any help grateful,
lou

 
I just want to add that the current version has been running quite a while now and new words don't creep up that often. I'm thinking along the lines of breaking the table down and I wouldn't know the size of the last table in the chain, but as new words are very infrequent now, growth of the table on new words links should be slow. If I do this, I will need a 4th table, like:

WordTableLinks
--------------
MinWordID
MaxWordID
TableName

Is this a good idea?

I could do with some ideas.

many thanks
lou

 
hi SQLSister

Do you mean store the wordstring in the the WordStoryLinks table rather than an wordID and index on that?

If not, could you explain what you mean?

big ta
lou

 
Full text search is a way to search longer fields for individual words. It is a feature of SQl Server and uses the Windows 2000 Server search engine I believe. It is somewhat compliated to set up, but you could look in Books Online to find out what you need to do and more about what it does.
 
Anyone compared Oracle's Full-Text capabilities to SQL Server's? Just curious which database is the best one to use for storage and retrieval of Full-Text.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top