LucieLastic
Programmer
hi
I currently have a simple database structure but it's not quick.
I have stories which come in and the main text and title is stripped from them. I have a list of stop words (eg 'and', 'the' etc) which I remove from the text. I then strip out the 'distinct' words and look up their wordID from my WordIndex table. I have another table called WordStoryLinks which stores the storyID and the wordIDs for all the words in that story.
My storyID and wordID columns are both indexed in the WordStoryLinks table as this is the main table for future searches. This table is growing fast and has quite a few million rows in it.
WordIndex
---------
wordID <PK>
wordString
WordStoryLinks <---main one for searching on
--------------
storyID <FK>
wordID <FK>
StoryDetails
------------
storyID <PK>
storyTitle
filename
:
Is this the best way to do it for the WordStoryLinks table. I'm thinking of doing some maths on this table and break it down into several tables, all roughly the same size. Of course human intervention will be required every so often for the last table, to check whether a new table is created or not to carry on the chain.
Can anyone give me some pointers on how to get a fast structure? If the structure is acceptable, then it could be my querying that's slow. How is Google so fast at searching it's database - how's it's DB structured, anyone know?
Any help grateful,
lou
I currently have a simple database structure but it's not quick.
I have stories which come in and the main text and title is stripped from them. I have a list of stop words (eg 'and', 'the' etc) which I remove from the text. I then strip out the 'distinct' words and look up their wordID from my WordIndex table. I have another table called WordStoryLinks which stores the storyID and the wordIDs for all the words in that story.
My storyID and wordID columns are both indexed in the WordStoryLinks table as this is the main table for future searches. This table is growing fast and has quite a few million rows in it.
WordIndex
---------
wordID <PK>
wordString
WordStoryLinks <---main one for searching on
--------------
storyID <FK>
wordID <FK>
StoryDetails
------------
storyID <PK>
storyTitle
filename
:
Is this the best way to do it for the WordStoryLinks table. I'm thinking of doing some maths on this table and break it down into several tables, all roughly the same size. Of course human intervention will be required every so often for the last table, to check whether a new table is created or not to carry on the chain.
Can anyone give me some pointers on how to get a fast structure? If the structure is acceptable, then it could be my querying that's slow. How is Google so fast at searching it's database - how's it's DB structured, anyone know?
Any help grateful,
lou