Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations wOOdy-Soft on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Function to remove ALL non-Ascii characters

Status
Not open for further replies.

DayLaborer

Programmer
Jan 3, 2006
347
US
I need to remove all non-Ascii chars from a string. So I tried the function that I pasted down below, but it doesn't exactly work.

[tt]select dbo.fnRemoveNonLowAsciiCharacters('This character – is not really a hyphen')[/tt]
will return
'This character is not really a hyphen'

But if I run this:
[tt]select dbo.fnRemoveNonLowAsciiCharacters('?')[/tt]
it returns
'F'

How can I fix this function to truly strip-out all non-ascii characters?

Code:
ALTER FUNCTION dbo.fnRemoveNonLowAsciiCharacters (@OldString as varChar(2000))
RETURNS varChar(2000) AS  
BEGIN 
DECLARE @Count integer, @NewString varChar(2000)
SET @Count =1
SET @NewString = ''
WHILE @Count <= Len(@OldString)
BEGIN
	--If the character is not a regular ascii character it will be removed
	IF  ASCII(SUBSTRING(@OldString,@Count,1)) < 127
		BEGIN
			SET @NewString  = @NewString + SUBSTRING(@OldString,@Count,1)
		END
	SET @Count = @Count + 1
END
RETURN @NewString
END
Thanks!
Eliezer
 
Maybe this...

Code:
[COLOR=blue]Alter[/color] [COLOR=#FF00FF]FUNCTION[/color] dbo.fnRemoveNonLowAsciiCharacters (@OldString [COLOR=blue]as[/color] [COLOR=blue]varChar[/color](2000))
RETURNS [COLOR=blue]varChar[/color](2000) [COLOR=blue]AS[/color]  
[COLOR=blue]BEGIN[/color] 
[COLOR=blue]DECLARE[/color] @Count [COLOR=blue]integer[/color], @NewString [COLOR=blue]varChar[/color](2000)
[COLOR=blue]SET[/color] @Count =1
[COLOR=blue]SET[/color] @NewString = [COLOR=red]''[/color]
[COLOR=blue]WHILE[/color] @Count <= Len(@OldString)
[COLOR=blue]BEGIN[/color]
    [COLOR=green]--If the character is not a regular ascii character it will be removed
[/color]	[COLOR=blue]If[/color] [COLOR=#FF00FF]SUBSTRING[/color](@OldString,@Count,1) Like [COLOR=red]'[a-zA-Z ]'[/color]
        [COLOR=blue]BEGIN[/color]
            [COLOR=blue]SET[/color] @NewString  = @NewString + [COLOR=#FF00FF]SUBSTRING[/color](@OldString,@Count,1)
        [COLOR=blue]END[/color]
    [COLOR=blue]SET[/color] @Count = @Count + 1
[COLOR=blue]END[/color]
[COLOR=blue]RETURN[/color] @NewString
[COLOR=blue]END[/color]

This function only allows a -> z, A -> Z and the space character.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
George, thanks very much!

But...
I'm really looking to include all "normal", i.e. non-Unicode, characters. I need to allow hyphens, ampersands, etc. Just not "weird" Unicode characters...

Thanks!
Eliezer
 
DayLaborer, I've just tried your function exactly as you have written it and it seems to work perfectly. I wonder if there is some other issue (perhaps code page / language / collation settings).


Hope this helps.

[vampire][bat]
 
This character
?
for example, gets turned into a "F" by the function...
 
Then try changing this line from George
Code:
If SUBSTRING(@OldString,@Count,1) Like '[a-zA-Z ]'
to
Code:
If SUBSTRING(@OldString,@Count,1) Like '[ -~]'
That's space-dash-tilde inside the brackets. Whether this works might depend on your collation sequence.
 
Ok, that won't work, but this will get you everything EXCEPT the brackets:
Code:
If SUBSTRING(@OldString,@Count,1) Like '[ -Z^-~\]'

(Note to self: test first, post later.)

George?
 
Not exactly...
Code:
select dbo.fnASCIICharOnly('Hello?¶???***')
returns Hello+¶O)n***
 
My funny characters didn't post properly... They were just a bunch of random ALT-number-pad characters...
 
I think I see the problem.

In your original post, the parameter to the function was declared as:

@OldString as varChar(2000)

Since this is a varchar, and you want to remove the unicode character, SQL Server will do an implicit conversion for you, so that all characters within the string WILL be ascii (and not unicode). In order to pass a unicode string, you need to use [!]n[/!]varchar.

So, using the function from your original post, change the data type from varchar to [!]nvarchar[/!]

Also, you are checking to see if the ASCII value is less then 127. Instead, check to see if the UNICODE value is less then 127.

Code:
[COLOR=blue]ALTER[/color] [COLOR=#FF00FF]FUNCTION[/color] dbo.fnRemoveNonLowAsciiCharacters (@OldString [COLOR=blue]as[/color] nVarChar(2000))
RETURNS [COLOR=blue]varChar[/color](2000) [COLOR=blue]AS[/color]  
[COLOR=blue]BEGIN[/color] 
[COLOR=blue]DECLARE[/color] @Count [COLOR=blue]integer[/color], @NewString [COLOR=blue]varChar[/color](2000)
[COLOR=blue]SET[/color] @Count =1
[COLOR=blue]SET[/color] @NewString = [COLOR=red]''[/color]
[COLOR=blue]WHILE[/color] @Count <= Len(@OldString)
[COLOR=blue]BEGIN[/color]
    [COLOR=green]--If the character is not a regular ascii character it will be removed
[/color]    [COLOR=blue]IF[/color]  UNICODE([COLOR=#FF00FF]SUBSTRING[/color](@OldString,@Count,1)) < 127
        [COLOR=blue]BEGIN[/color]
            [COLOR=blue]SET[/color] @NewString  = @NewString + [COLOR=#FF00FF]SUBSTRING[/color](@OldString,@Count,1)
        [COLOR=blue]END[/color]
    [COLOR=blue]SET[/color] @Count = @Count + 1
[COLOR=blue]END[/color]
[COLOR=blue]RETURN[/color] @NewString
[COLOR=blue]END[/color]

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
George, I really appreciate your efforts. But unfortunately, it's still converting my funny characters instead of stripping them...
 
OK. Let's dig a little deeper.

Run this...

Select UNICODE('FunnyLittleCharacter')

Post the results back here.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Interesting...
Select UNICODE('?') -- which looks like a right-angled thingy returns "43"
and gets converted to a "+"

and Select UNICODE('+') --an actual "plus" character - also evaluates to "43"

Huh?!
 
The unicode character for the greek omega symbol is 937. So, when testing this in query analyzer, if you do....

Code:
Select 'Hello ? World', dbo.fnRemoveNonLowAsciiCharacters('Hello ? World')

You will see that the omega character is NOT removed from the string. If you want to test hardcoded unicode strings, you need to put an N in front of the string. Like this...

Code:
select [!]N[/!]'Hello ? World', dbo.fnRemoveNonLowAsciiCharacters([!]N[/!]'Hello ? World')



-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top