splitting a sentence

Status
Not open for further replies.

srd1984

Programmer
Dec 12, 2004
11
GB
Ideally I want to split a paragraph, but I'm going to start off small and split a sentence into words. After this, I want to associate each of the words with that sentence, so that later on I could, say, hover over a word and know the original sentence it came from, or something like that.

So far I have used:

string[] words = s.Split(' ');

with s being the sentence. How do I then put each of these words into an element of the ArrayList and associate it with the sentence s? In other words, how do I put both parts into one element of the ArrayList?

Then later, how do I extract this, so that if I highlight a word and trigger some event it will tell me where the word came from?

Thanks for any advice.
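
One possible way to keep both parts in a single element is to store a small object per word instead of a bare string. The following is only a sketch under assumptions: WordEntry and SentenceSplitter are hypothetical names made up for illustration, and a generic List<T> is swapped in for the ArrayList mentioned above (an ArrayList would hold the same objects, but each one would need a cast on the way out).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type: one word plus the sentence it was split from.
public class WordEntry
{
    public string Word { get; }
    public string Sentence { get; }

    public WordEntry(string word, string sentence)
    {
        Word = word;
        Sentence = sentence;
    }
}

public static class SentenceSplitter
{
    // Splits a sentence on spaces and pairs every word with the original sentence.
    public static List<WordEntry> SplitSentence(string s)
    {
        var entries = new List<WordEntry>();

        // RemoveEmptyEntries avoids empty "words" when there are double spaces.
        string[] words = s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            entries.Add(new WordEntry(word, s));
        }
        return entries;
    }
}
```

With something like this, an event handler that knows which entry was highlighted could read entry.Sentence to get back the source sentence, for example SentenceSplitter.SplitSentence(s)[1].Sentence for the second word.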
 
Parsing text is hard work. Some things you might want to consider in your design:

1. Excluding a list of common words like 'and', 'the', 'a', 'to' etc. Then when you hover over it, you don't get a list of all the paragraphs containing the word 'the' (typically all of them).

2. Think about normalising the text in some way before you index it - lowercase it all, strip out any punctuation, maybe even remove trailing s's. Then if you hover over 'Difficult' you would also find 'difficult'.

3. You may be better off going straight for paragraphs, as they are easier to parse - typically separated by one or more linefeeds and spaces. Sentences on the other hand, start with capital letters (except in the case of trade names that start with lower case), can be terminated by ?, !, full stops (points) but not decimal points in the middle of numbers. And dealing with a sentence that may have an abbreviation at the end like this e.g. Or not, is even worse. I'd stick with paragraphs...
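
The three suggestions above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: ParagraphIndex and its methods are hypothetical names, the stop-word list is just the four words mentioned, every run of linefeeds and surrounding whitespace is assumed to mark a paragraph break, and the trailing-s stripping is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class ParagraphIndex
{
    // Illustrative stop-word list only; a real one would be much longer.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "and", "the", "a", "to" };

    // Assumption: any run of linefeeds (plus surrounding whitespace) separates paragraphs.
    public static string[] SplitParagraphs(string text)
    {
        return Regex.Split(text.Trim(), @"\s*\n\s*");
    }

    // Lowercases the word and drops everything that is not a letter or digit.
    public static string Normalize(string word)
    {
        var sb = new StringBuilder();
        foreach (char c in word.ToLowerInvariant())
        {
            if (char.IsLetterOrDigit(c)) sb.Append(c);
        }
        return sb.ToString();
    }

    // Maps each normalised, non-stop word to the indexes of the paragraphs containing it.
    public static Dictionary<string, List<int>> Build(string[] paragraphs)
    {
        var index = new Dictionary<string, List<int>>();
        char[] separators = { ' ', '\t', '\r', '\n' };

        for (int i = 0; i < paragraphs.Length; i++)
        {
            string[] rawWords = paragraphs[i].Split(separators, StringSplitOptions.RemoveEmptyEntries);
            foreach (string raw in rawWords)
            {
                string word = Normalize(raw);
                if (word.Length == 0 || StopWords.Contains(word)) continue;

                List<int> hits;
                if (!index.TryGetValue(word, out hits))
                {
                    hits = new List<int>();
                    index[word] = hits;
                }
                // Record each paragraph once, even if the word repeats in it.
                if (!hits.Contains(i)) hits.Add(i);
            }
        }
        return index;
    }
}
```

On hover, the word under the cursor would go through the same Normalize call before the lookup, so 'Difficult' and 'difficult,' both resolve to the key "difficult", while stop words such as 'the' never reach the index at all.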
 