Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

regex: replace all special characters 1

Status
Not open for further replies.

rao1soft

Programmer
Joined
Nov 7, 2002
Messages
17
Location
US
Hi,

I'm trying to replace all special characters (space, comma, parens) within double quotes in a string with an underscore using:
Code:
while(<>) {
 s/"(.*)[ ,:\]\[\(\)](.*)"/"$1_$2"/g;
print "$_\n";
}
The following text
Code:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<cleanup"a string,here with(junk">
<cleanup"a string,here[with junk">
<cleanup"a string)here:with,junk">
<cleanup"a string(here:with[junk">
<cleanup"a string,here:with)junk">
<cleanup"a string:here]with]junk">
</root>
gets me
Code:
<?xml version="1.0"_encoding="UTF-8"?>
<root>
<cleanup"a string,here with_junk">
<cleanup"a string,here[with_junk">
<cleanup"a string)here:with_junk">
<cleanup"a string(here:with_junk">
<cleanup"a string,here:with_junk">
<cleanup"a string:here]with_junk">
</root
But only the last one is being substituted. How do I get to sub all the occurrences? Also, if there are two sets of double quotes (as in line 1) how do I process each set separately?

Thanks.
 
It's got to do with the greediness of RE.

Try using .*?

The ? means "use the shortest match".
 
Then only the first match is replaced!
 
I had this sub already written, which seems to do the trick:
Code:
#!perl
use strict;
use warnings;

while (<DATA>) {
    print rpldelimxy($_, "\"", qr([ ,:\]\[\(\)]), "_");
}

[b]sub rpldelimxy {
    # Replace $x within $delim's with $y.
    # $x and $delim may be regexes. $y must be a string.
    my ($str, $delim, $x, $y) = @_;
    my $strlen = length($str);
    my $sawdelim = 0;
    for (my $i=0; $i<$strlen; $i++) {
        my $s = substr($str, $i, 1);
        if ($s =~ $delim) {
            $sawdelim = $sawdelim? 0: 1;
        }
        if ($s =~ /$x/ && $sawdelim) {
            substr($str, $i, 1) = $y;
        }
    }
    return $str;
}[/b]

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<root>
<cleanup"a string,here with(junk">
<cleanup"a string,here[with junk">
<cleanup"a string)here:with,junk">
<cleanup"a string(here:with[junk">
<cleanup"a string,here:with)junk">
<cleanup"a string:here]with]junk">
</root>
Output
Code:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
<cleanup"a_string_here_with_junk">
</root>

 
Hmm, in rpldelimxy, that really should be
if ($s =~ /$delim/)
(Note the slashes around $delim.)
Seems to work without 'em, but why ask for trouble. (It always finds you anyway.) :-)
 
Here's an improved version of the routine. This allows $delim, $x, and $y to contain multiple characters, which the earlier version did not. There they all had to be single chars for the routine to work correctly. If you pass an empty string in $y, $x will be deleted within delims.

This version worked well with light testing.

Note the CAVEATS. There's currently no check to make sure an opening delimiter has a corresponding ending delimiter. (Also true of the earlier version.) I think I want to think about this a bit more before I do anything about it.
Code:
sub rpldelimxy {
    # Replace $x within $delim's with string $y.
    # $delim and $x may be strings or regexes.  $y must be a string.
    # Passing an empty string in $y will delete $x within $delims.
    # CAVEATS: Currently no check to see if delimiters are paired.
    #    Once we see an opening $delim, we keep replacing $x with $y 
    # until we see a closing $delim ... 
    my ($str, $delim, $x, $y) = @_;
    my $sawdelim = 0;
    my $leny = length($y);
    my $i = 0;
    while ($i < length($str)) {
        my $s = substr($str, $i);
        if ($s =~ /^($delim)/) {
            $sawdelim = $sawdelim? 0: 1;
            $i += length($1);
        } elsif ($s =~ /^($x)/ && $sawdelim) {
            my $lenx = length($1);
            my $temp = substr($str, 0, $i) . $y;
            if ($i < length($str) - $lenx) {
                $temp .= substr($str, $i + $lenx);
            }
            $str = $temp;
            $i += $leny;
        } else {
            $i++;
        }
    }
    return $str;
}
 
Is there a stack module or representation in perl at all? If so, it would be pretty easy to test if the values are paired.

--Chessbot

There is a level of Hell reserved for probability theorists in which every monkey that types on a typewriter produces a Shakespearean sonnet.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top