
Removing any special characters in a string


calabama

Programmer
Feb 19, 2001
180
US
Hello,
How does one remove all instances of special characters from a string?

Thanks

In the beginning
Let us first assume that there was nothing to begin with.
 
That depends on what you mean by special characters. If you mean nonprintable characters you could do:
[tt]$string =~ s/[[:^print:]]//g;[/tt]

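[tt][:print:][/tt] is the POSIX character class of printable characters, and [tt][:^print:][/tt] is its opposite (the [tt]^[/tt] negation is a Perl extension). POSIX classes are only valid inside a bracketed character class, which is why the expression has double brackets.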
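
As a rough illustration of the substitution above, here is a minimal sketch. The sample string and the variable names [tt]$string[/tt] and [tt]$clean[/tt] are made up for the example; the copy-then-substitute idiom is used only so the original value stays available for comparison.

```perl
use strict;
use warnings;

# Hypothetical sample input: a tab, a bell character (\x07) and a newline
# are mixed in with ordinary printable text.
my $string = "Hello,\tWorld\x07!\n";

# Copy the string, then delete every character that is NOT in the
# POSIX [:print:] class (tab, bell and newline are all non-printable).
(my $clean = $string) =~ s/[[:^print:]]//g;

print "$clean\n";   # prints "Hello,World!"
```

Note that an ordinary space counts as printable, so it would be kept; the tab is removed because it is a control character.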
 
Sorry Rosenk,

I should have been more specific. By special characters I mean any characters that are not in A-Za-z0-9.

Thanks
 
In that case, you would do:
[tt]$string =~ s/[^A-Za-z0-9]//g;[/tt]

You can save some typing by using a case-insensitive regex instead:
[tt]$string =~ s/[^a-z0-9]//gi;[/tt]

Or, if you don't mind keeping _ (underscore) along with the alphanumerics, you can use \W (the non-word character class):
[tt]$string =~ s/\W//g;[/tt]
 
And if you DO mind keeping the underscore, you can add it to the class explicitly:
Code:
$string =~ s/[\W_]//g;
jaa
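
To compare the variants suggested so far side by side, here is a small sketch run against one made-up sample string. The variable names are hypothetical, and the results shown assume plain ASCII input; with non-ASCII text [tt]\W[/tt] may treat accented letters as word characters, so the [tt]\W[/tt]-based forms are not guaranteed to behave like the explicit A-Za-z0-9 class there.

```perl
use strict;
use warnings;

my $input = "foo_bar-Baz 42!";   # hypothetical sample string (ASCII only)

# Explicit class: keep only A-Z, a-z and 0-9.
(my $explicit = $input) =~ s/[^A-Za-z0-9]//g;

# Same class, written with the case-insensitive /i modifier.
(my $nocase = $input) =~ s/[^a-z0-9]//gi;

# \W removes non-word characters, so the underscore survives.
(my $nonword = $input) =~ s/\W//g;

# Adding _ to the class removes the underscore as well.
(my $no_underscore = $input) =~ s/[\W_]//g;

print "$explicit\n";        # foobarBaz42
print "$nocase\n";          # foobarBaz42
print "$nonword\n";         # foo_barBaz42
print "$no_underscore\n";   # foobarBaz42
```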
 
Thanks for your help,

I had another question though. I wanted to use the
Code:
$string =~ s/[^a-z0-9]//gi;
example.
Is there a way to adapt it so that spaces are allowed as an exception?

Thanks again
 
Surely. The substitution replaces whatever matches the pattern between the first pair of slashes with whatever is between the second pair. Here the pattern [tt][^a-z0-9][/tt] is a negated character class: the leading [tt]^[/tt] makes it match any single character that is not in the range 'a'-'z' (case is ignored because of the [tt]i[/tt] modifier at the end) or '0'-'9'. Since the replacement is empty and the [tt]g[/tt] modifier applies the substitution to every match, each such character is deleted.

If you want to allow spaces, change the left part to [tt][^a-z0-9 ][/tt] (notice the space after the 9). If you want to allow any whitespace character (including tab and newline), use [tt][^a-z0-9\s][/tt] instead.
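
The difference between the two classes can be seen in a short sketch like the following. The sample string and variable names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical sample input containing a tab, spaces and punctuation.
my $input = "Hello,\tWorld! It's 2001.";

# Literal space in the class: spaces are kept, the tab is removed.
(my $keep_space = $input) =~ s/[^a-z0-9 ]//gi;

# \s in the class: spaces, tabs and newlines are all kept.
(my $keep_ws = $input) =~ s/[^a-z0-9\s]//gi;

print "$keep_space\n";   # HelloWorld Its 2001
print "$keep_ws\n";      # Hello<TAB>World Its 2001
```

With the literal space, the tab between "Hello," and "World!" is stripped, so the two words run together; with [tt]\s[/tt] the tab is preserved.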
 