Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations bkrike on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing Text 1

Status
Not open for further replies.

twifosp

Programmer
Jul 25, 2003
186
US
I can't get my head around to parse this data. I've got a column that contains dirty regional data stored as text. I want to split it up so I can make it clean.

examples:

World|USA|New York|New York City|Manhattan|Central Park North
World|USA|New York|Hamptons|Bridgehampton
World|Mexico|Mexican Riviera|Yelapa


As you can see, the number of |'s per row can differ.

How could I write a function that parses the text between each | | ?
 
Do you have a primary key associated with each row?
What do you want the data to ultimately look like?



-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
And...

What data type is this column?
What version of SQL Server are you using?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
The data type is varchar
sql 2000

The data should ultimately look like a table that has 6 different columns (that's the max number of |)

It can just be region_desc_1, region_desc_2 .... 6 for right now.

The root of the problem is I have some other data I am attemping to join to this table. I need to join based on city. So I need to get the city out of this string. But I figure I might as well clean up this data I inherieted and make a proper region lookup table while I'm at it.

 
Sorry, forgot to answer a question. Yes there is a primary key as an integer.
 
Here is what I have so far, sorry for the messy code:

Code:
Declare @data Table(data varchar(100), region1 VarChar(100), country varchar(100), state varchar(100), city varchar(100),
			city_desc1 varchar(100), city_desc2 varchar(100) )
Declare @parse Table(Word VarChar(20))

Insert Into @data(Data) 
Select top 1 LTrim(Rtrim(reg_names_string_en)) From c_nodes

select * from @data

--While Exists(Select * From @data Where CharIndex('|', Data) > 0)
--  Begin
--    Insert Into @parse(Word)
    Select Left(Data, CharIndex('|', Data))
    From   @data
    Where  CharIndex('|', Data) > 0

    Update @data
    Set    Data = LTrim(RTrim(Right(Data, Len(Data)-CharIndex('|', Data))))
--  End

Basically I'm attempting to write a while loop that will parse the string.

The while loop works, it takes out the right part of the string, and updates the @data table with the new truncated string. What I can't figure out is the logic to update the columns with the right section of the data.

I think there is probably an easier way to sovle this without a messy while loop, so if I'm way off, just ignore this post and please enlighten me :)
 
Actually... Since you are trying to clean up your data, there's really nothing wrong with messy code. I mean... if you have code that is running often, then it's important for it to be fast and maintainable. You're thought process was very similar to mine. However, since you want this data in a de-normalized format (repeating columns), I don't think you will be able to use a while loop. If your max loop value is 6, then you can just run 'the query' 6 times, with a different target column each time. Like this...

Code:
Declare @Temp Table
	(Data VarChar(1000), 
	Region1 VarChar(50), 
	Region2 VarChar(50), 
	Region3 VarChar(50), 
	Region4 VarChar(50), 
	Region5 VarChar(50), 
	Region6 VarChar(50))

Insert into @temp(Data) Values('World|USA|New York|New York City|Manhattan|Central Park North')
Insert into @temp(Data) Values('World|USA|New York|Hamptons|Bridgehampton')
Insert into @temp(Data) Values('World|Mexico|Mexican Riviera|Yelapa')

Update @Temp
Set    Region1 = Left(Data, Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) - Sign(CharIndex('|', Data)) Else Len(Data) End),
       Data = Right(Data, Len(Data)-Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) Else Len(data) End)

Update @Temp
Set    Region2 = Left(Data, Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) - Sign(CharIndex('|', Data)) Else Len(Data) End),
       Data = Right(Data, Len(Data)-Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) Else Len(data) End)

Update @Temp
Set    Region3 = Left(Data, Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) - Sign(CharIndex('|', Data)) Else Len(Data) End),
       Data = Right(Data, Len(Data)-Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) Else Len(data) End)

Update @Temp
Set    Region4 = Left(Data, Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) - Sign(CharIndex('|', Data)) Else Len(Data) End),
       Data = Right(Data, Len(Data)-Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) Else Len(data) End)

Update @Temp
Set    Region5 = Left(Data, Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) - Sign(CharIndex('|', Data)) Else Len(Data) End),
       Data = Right(Data, Len(Data)-Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) Else Len(data) End)

Update @Temp
Set    Region6 = Left(Data, Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) - Sign(CharIndex('|', Data)) Else Len(Data) End),
       Data = Right(Data, Len(Data)-Case When CharIndex('|', Data) > 0 Then CharIndex('|', Data) Else Len(data) End)

Select * from @Temp

In this code, you will see me use the Sign function. That's just a trick so that I didn't need to write another case statement.

If the delimiter is on the data, you want to use CharIndex -1. Sign returns -1, 0, or 1 depending on the sign of the data. I'm not concerned about -1 (can't happen with CharIndex), but if charindex returns 10, the sign function will return 1.

Make sense?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top