Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations wOOdy-Soft on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Strip junk chars from an entire DB

Status
Not open for further replies.

Kalor

Programmer
Jan 22, 2002
13
AU
Hi,

I maintain the back end of a dodgy 3rd party app that I can't get the source to.
It frequently has problems with delimiter characters in string fields- quotes, commas, etc. Since I can't add verification at the front end, I need to clean it at the back.
The problem isn't limited to one or two major fields/tables, so I've ruled out triggers (performance and maintenance would be too much of a nightmare). That leaves a database crawler that can be run overnight that cleans these characters up (some are removed, some are replaced by spaces).

I've written a proc that finds char fields, then finds and removes single quotes, double quotes and commas.It works, but it's slow (4 hours- char fields aren't indexed much) and multiple passes are needed (ie it only removes one instance of the character each time).

Here's my code. Anything come to mind?
----------------------------------
/*
This code crawls the database and replaces problem characters in tables with good ones.
@strSQLUpdate is set each time due to quirks with literal quotes and space trimming.
*/

declare @intcount int
Declare @strSQLUpdate varchar(250)

declare @dirtytable varchar(20)
declare @dirtyfield varchar(20)
declare @BadChar char(1)
declare @GoodChar char(1)

select @intcount = 1

While @intcount < 4
BEGIN

DECLARE curStringField CURSOR FOR
select o.name as chartable, c.name as charfield
from sysobjects o, syscolumns c
where o.type = 'U'
and c.id = o.id
and c.xtype = 175
order by o.name, c.name

OPEN curStringField

FETCH NEXT FROM curStringField into @dirtytable, @dirtyfield

WHILE @@FETCH_STATUS = 0
BEGIN
select @BadChar = case @intcount
when 1 then &quot;&quot;&quot;&quot;
when 2 then &quot;'&quot;
when 3 then &quot;,&quot;
end

select @GoodChar = &quot;&quot;

select @strSQLUpdate = case @intcount
when 1 then &quot;UPDATE &quot; + @dirtytable + &quot; SET &quot; + @dirtyfield + &quot; = left(&quot; + @dirtyfield + &quot;,charindex('&quot; + @BadChar + &quot;',&quot; + @DirtyField + &quot;)-1) + '&quot; + ltrim(@Goodchar) + &quot;' + substring(&quot; + @dirtyfield + &quot;,charindex('&quot; + @Badchar + &quot;',&quot; + @dirtyfield + &quot;)+1,len(&quot; + @dirtyfield + &quot;)) WHERE charindex('&quot; + @BadChar + &quot;',&quot; + @DirtyField + &quot;) <> 0&quot;
when 2 then &quot;UPDATE &quot; + @dirtytable + &quot; SET &quot; + @dirtyfield + &quot; = left(&quot; + @dirtyfield + &quot;,charindex(&quot;&quot;&quot; + @BadChar + &quot;&quot;&quot;,&quot; + @DirtyField + &quot;)-1) + '&quot; + ltrim(@Goodchar) + &quot;' + substring(&quot; + @dirtyfield + &quot;,charindex(&quot;&quot;&quot; + @Badchar + &quot;&quot;&quot;,&quot; + @dirtyfield + &quot;)+1,len(&quot; + @dirtyfield + &quot;)) WHERE charindex(&quot;&quot;&quot; + @BadChar + &quot;&quot;&quot;,&quot; + @DirtyField + &quot;) <> 0&quot;
when 3 then &quot;UPDATE &quot; + @dirtytable + &quot; SET &quot; + @dirtyfield + &quot; = left(&quot; + @dirtyfield + &quot;,charindex('&quot; + @BadChar + &quot;',&quot; + @DirtyField + &quot;)-1) + '&quot; + @Goodchar + &quot;' + substring(&quot; + @dirtyfield + &quot;,charindex('&quot; + @Badchar + &quot;',&quot; + @dirtyfield + &quot;)+1,len(&quot; + @dirtyfield + &quot;)) WHERE charindex('&quot; + @BadChar + &quot;',&quot; + @DirtyField + &quot;) <> 0&quot;
end
EXECUTE (@strSQLUpdate)

FETCH NEXT FROM curStringField into @dirtytable, @dirtyfield
END

CLOSE curStringField
DEALLOCATE curStringField
select @intcount = @intcount + 1
END
---------------------------
 
You could try something along these lines, if you have SQL 2000. Create a function to encapsulate the clean-up.
Then use one update statement per table. This should dramatically reduce the amount of reading and writing that is required and therefore reduce the time.
Good Luck!

drop function clean
go
create function clean (@input varchar(8000))
returns varchar(8000)
AS
BEGIN
RETURN( REPLACE( REPLACE( REPLACE(@INPUT,'&quot;',''), '''',''), ',','') )
END
go

UPDATE TABLE1
SET COL1 = test.dbo.clean(COL1),
COL2 = test.dbo.clean(COL2),
COL2 = test.dbo.clean(COL2)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top