Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Rhinorhino on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing out text from large text field 1

Status
Not open for further replies.

jimoo

Programmer
Joined
Jun 2, 2003
Messages
1,111
Location
US
I am looking for a solution to parse comments out of a large text field that has a lot of RTF special characters in it.

- The comments all begin with\dn0 (plus a space)
- The comment ends with }
- There can be one or more comments
- Would like to add a carriage return / line feed at the end of each comment.
- Will need to do this for each comment record in the data table.

I suspect I need some kind of loop

Here is a mock example:

Code:
declare @thetext1 varchar(5000), @thetext2 varchar(5000)
set @thetext1 = '{\rtf1\ansi\dn0 COMMENT1 IS Here}\fonttbl{\f0\dn0 COMMENT2 IS HERE}froman Tms \colortbl\red0\ dm0 COMMENT3 IS HERE }green0\blue0}}'
set @thetext2 = SUBSTRING(@thetext1,CHARINDEX('dn0 ', @thetext1)+4,200000)
select @thetext2


The desired results are:
Code:
COMMENT1 IS HERE
COMMENT2 IS HERE
COMMENT3 IS HERE


Jim
 
jimoo,

Is this a typo?

[tt][!]\ dm0 [/!]COMMENT3 IS HERE }[/tt]

It appears as though this comment does not have the proper opening tag. There is a space after the \ and it's dm0 instead of dn0.


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I suggest you create a user defined function to strip the comments out of your RTF. First, copy/paste this to a new query window and run it.

Code:
Create Function dbo.StripRTFComments(@Data VarChar(8000), @Start VarChar(50), @End VarChar(50))
Returns VarChar(8000)
As
Begin
	Declare @Output VarChar(5000)

	Set @Output = ''

	While CharIndex(@Start, @Data) > 0
	Select	@Output = @Output + SubString(@Data, PatIndex('%' + @Start + '%', @Data) + DataLength(@Start),
						CharIndex(@End, @Data, PatIndex('%' + @Start + '%', @Data))-PatIndex('%' + @Start + '%', @Data) - DataLength(@Start)) + Char(13) + Char(10),
			@Data = SubString(@Data, CharIndex(@End, @Data) + DataLength(@End), 8000)

	Return @Output
End

Then, you can use it like this...

Code:
Select [!]Top 10[/!] dbo.StripRTFComments(YourColumnNameHere, '\dn0 ','}') As Comments
From   YourTableNameHere

I added the TOP 10 part so you could test this on just 10 rows of data. With 500,000 rows, this may not be the fastest code to run, ya know?

Let me know if you have any questions.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George - thanks a lot. This is real close, but it duplicated some of the comments when parsing.

Actual Data
Code:
{\rtf1\ansi\deff0{\fonttbl{\f0\froman Tms Rmn;}}{\colortbl\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue127;\red0\green127\blue127;\red0\green127\blue0;\red127\green0\blue127;\red127\green0\blue0;\red127\green127\blue0;\red127\green127\blue127;\red192\green192\blue192}{\info{\creatim\yr2008\mo11\dy12\hr10\min35\sec33}{\version1}{\vern262367}}\paperw12240\paperh15840\margl360\margr0\margt239\margb0\deftab720\pard\ql{\f0\fs20\cf0\up0\dn0 LTR FOR SPECIAL ED INFO. 11-12-08 JS.}{\par}\pard\ql{\f0\fs20\cf0\up0\dn0\par}\pard\ql{\f0\fs20\cf0\up0\dn0 2ND LTR FOR SPECIAL ED LTR.  SYLLABI AND COURSE CHART DO NOT IDENTIFY }{\par}\pard\ql{\f0\fs20\cf0\up0\dn0 THE SPECIAL ED CATEGORIES HER SPECIAL ED COURSES FOCUS ON.  STILL NEED}{\par}\pard\ql{\f0\fs20\cf0\up0\dn0 LTR FROM UNIV. (ALSO VERIFIED SHE QUALIFIES FOR EL ED) 11-25-08 JS.}{\par}\pard\ql{\f0\fs20\cf0\up0\dn0\par}\pard\ql{\f0\fs20\cf0\up0\dn0 PROV EVAL, NO T2EA 2-2-09 JS.}{\par}\pard\ql{\f0\fs20\cf0\up0\dn0\par}\pard\ql{\f0\fs20\cf0\up0\dn0 ISSUE CERT NOW LT. (BMH 5/20/09)}}


Results
Code:
LTR FOR SPECIAL ED INFO. 11-12-08 JS.  LTR FOR SPECIAL ED INFO. 11-12-08 JS.  LTR FOR SPECIAL ED INFO. 11-12-08 JS.  LTR FOR SPECIAL ED INFO. 11-12-08 JS.  LTR FOR SPECIAL ED INFO. 11-12-08 JS.  LTR FOR SPECIAL ED INFO. 11-12-08 JS.  LTR FOR SPECIAL ED INFO. 11-12-08 JS.  LTR FOR SPECIAL ED INFO. 11-12-08 JS.  2ND LTR FOR SPECIAL ED LTR.  SYLLABI AND COURSE CHART DO NOT IDENTIFY   2ND LTR FOR SPECIAL ED LTR.  SYLLABI AND COURSE CHART DO NOT IDENTIFY   2ND LTR FOR SPECIAL ED LTR.  SYLLABI AND COURSE CHART DO NOT IDENTIFY   THE SPECIAL ED CATEGORIES HER SPECIAL ED COURSES FOCUS ON.  STILL NEED  THE SPECIAL ED CATEGORIES HER SPECIAL ED COURSES FOCUS ON.  STILL NEED  LTR FROM UNIV. (ALSO VERIFIED SHE QUALIFIES FOR EL ED) 11-25-08 JS.  LTR FROM UNIV. (ALSO VERIFIED SHE QUALIFIES FOR EL ED) 11-25-08 JS.  PROV EVAL, NO T2EA 2-2-09 JS.  PROV EVAL, NO T2EA 2-2-09 JS.  PROV EVAL, NO T2EA 2-2-09 JS.  ISSUE CERT NOW LT. (BMH 5/20/09)  ISSUE CERT NOW LT. (BMH 5/20/09)  ISSUE CERT NOW LT. (BMH 5/20/09)


Observer this comment for example:
LTR FOR SPECIAL ED INFO. 11-12-08 JS.


Notice the comment only appears once in the actual data, but is show several times in the output.

Jim


Jim
 
Ugh. It took me 2 minutes to figure out what the problem was, and then 20 minutes of pulling my hair out to figure out why I couldn't validate the results.

Code:
Alter Function dbo.StripRTFComments(@Data VarChar(8000), @Start VarChar(50), @End VarChar(50))
Returns VarChar(8000)
As
Begin
	Declare @Output VarChar(5000)

	Set @Output = ''

	While CharIndex(@Start, @Data) > 0
	Select	@Output = @Output + SubString(@Data, PatIndex('%' + @Start + '%', @Data) + DataLength(@Start),
						CharIndex(@End, @Data, PatIndex('%' + @Start + '%', @Data))-PatIndex('%' + @Start + '%', @Data) - DataLength(@Start)) + Char(13) + Char(10),
			@Data = SubString(@Data, CharIndex(@End, @Data, [!]PatIndex('%' + @Start + '%', @Data)[/!]) + DataLength(@End), 8000)

	Return @Output
End

It was killing me because I set my output to text (instead of grid) and it kept cutting it off, until I realized that is was SSMS that was doing it.

Anyway.... this function should give you better results.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks George. I only wish I could give you ten stars instead of just one.

I greatly appreciate you taking the time write this function.



Jim
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top