Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations bkrike on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing Text Strings (De-Concatenate?) 1

Status
Not open for further replies.

twifosp

Programmer
Jul 25, 2003
186
US
I have some data that I want to group in order to search.

An example would look like: (just an exmaple)

Code:
id	Desc
123	Apples oranges bannanas
124	Apples Libraries 
125	Bannanas Apples
126	libraries
127	zoos

Is there a dynamic way to see how many times the words appear in the table?

I could write a case statement looking for '%apples%' and then grouping them ect, but I don't know all the terms possible and the table has quite a few records. Is there a dynamic way to have an output like:

Code:
Desc	  Count
Apples	  3
Oranges	  1
Bannanas  2
Libraries 2
Zoos 	  1



 
How many words do you expect to find in each record and approximately how many records are you talkiing about here?


-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
To keep it simple I suppose I would only care about the first 5 to start with. If I needed to go farther than that, I can expand from there. But to keep it simple for the example, I can live with 3-5.

About 100,000 rows to start with.
 
With a lot of data, I'm not sure how well this will perform. I suppose you could give it a shot and see. [smile]

I included some dummy data for testing purposes, you'll want to replace @Temp in your final query.
Code:
[green]-- Dummy Data[/green]
Declare @Temp Table(Id Integer, Data VarChar(100))

Insert Into @Temp Values(123,'Apples oranges bannanas')
Insert Into @Temp Values(124,'Apples Libraries')
Insert Into @Temp Values(125,'Bannanas Apples')
Insert Into @Temp Values(126,'libraries')
Insert Into @Temp Values(127,'zoos')

[green]-- The query[/green]
Declare @Data Table(Data VarChar(100))
Declare @Words Table(Word VarChar(20))

Insert Into @Data(Data) 
Select LTrim(Rtrim(Data)) From @Temp

While Exists(Select * From @Data Where CharIndex(' ', Data) > 0)
  Begin
    Insert Into @Words(Word)
    Select Left(Data, CharIndex(' ', Data))
    From   @Data
    Where  CharIndex(' ', Data) > 0

    Update @Data
    Set    Data = LTrim(RTrim(Right(Data, Len(Data)-CharIndex(' ', Data))))
  End

Insert Into @Words(Word) 
Select Data From @Data

Select Word, 
       Count(1) As WordCount 
From   @Words 
Group By Word 
ORder By Word

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Works fairly well when I changed the variable tables to temp tables with indexes on them. Thanks!
 
I have done the same thing here, using temp tables instead of variables, and I have a comma delimiter. However, I am counting both the word phrase and the phrase with a comma!
The results I get look like this:

member privileges 2617
Discounts 1551
information 707
Social activities 643
Health 432
Discounts, 318
information, 311
member privileges, 274
Online tools 220
Social activities, 159
Online tools, 115
The sytax is exactly the same as above, only ',' in stead of ' '.
Any ideas??
Thanks.
 
Near the end of the process, there is...

[tt][blue]Insert Into @Words(Word)
Select Data From @Data[/blue][/tt]

Change this to...

Code:
Insert Into @Words(Word) 
Select [!]Replace([/!]Data[!], ',', '')[/!] From @Data

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Sorry. I posted too soon. There is another necessary change.

Code:
-- Dummy Data
Declare @Temp Table(Id Integer, Data VarChar(100))

Insert Into @Temp Values(123,'Apples[!],[/!]oranges[!],[/!]bannanas')
Insert Into @Temp Values(124,'Apples[!],[/!]Libraries')
Insert Into @Temp Values(125,'Bannanas[!],[/!]Apples')
Insert Into @Temp Values(126,'libraries')
Insert Into @Temp Values(127,'zoos')

-- The query
Declare @Data Table(Data VarChar(100))
Declare @Words Table(Word VarChar(20))

Insert Into @Data(Data) 
Select LTrim(Rtrim(Data)) From @Temp

While Exists(Select * From @Data Where CharIndex('[!],[/!]', Data) > 0)
  Begin
    Insert Into @Words(Word)
    Select Left(Data, CharIndex('[!],[/!]', Data)[!]-1[/!])
    From   @Data
    Where  CharIndex('[!],[/!]', Data) > 0

    Update @Data
    Set    Data = LTrim(RTrim(Right(Data, Len(Data)-CharIndex('[!],[/!]', Data))))
  End

Select * From @Words

Insert Into @Words(Word) 
Select Replace(Data, ',', '') From @Data

Select Word, 
       Count(1) As WordCount 
From   @Words 
Group By Word 
ORder By Word

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Thank you so much. This has saved me a whole day of trial and error! This works great. Have a star!
 
In that case, you have my permission to take the rest of the day off. [wink]

I'm glad this has helped.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top