checking for repeating string and replacing 2

Status
Not open for further replies.

cleansedbb

Technical User
Feb 11, 2002
95
US
I have this right now. It checks the message for a run of 64 non-whitespace characters and, if it finds one, replaces the message with a predefined string.


$FORM{'message'} = "Line too long, please limit to 64 chars" if $FORM{'message'} =~ /\S{64}/;
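To make clear what that check tests: `\S{64}` matches an unbroken run of 64 non-whitespace characters, so it flags a single very long "word" rather than the total length of the message. Here is a minimal sketch of that behavior. The helper name `has_long_run` is made up for this illustration and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the text contains an unbroken run
# of 64 or more non-whitespace characters, otherwise 0.
sub has_long_run {
    my ($text) = @_;
    return $text =~ /\S{64}/ ? 1 : 0;
}

print has_long_run('x' x 64), "\n";                  # one 64-character word
print has_long_run('x' x 63), "\n";                  # one character short
print has_long_run(join ' ', ('word') x 40), "\n";   # long message, short words
```

This should print 1, 0 and 0: the third message is nearly 200 characters long but never has 64 non-space characters in a row.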

My question is: can it also check for repeating strings?

For example, if a person types in 1234567890 1234567890 1234567890, it should delete the second and any later repetitions of the string and post only the first.

 
This will collapse any run of adjacent repeated words, anywhere in the string, down to a single copy.
Code:
$string  =~ s/(\S+)(?:\s+\1)+/$1/g;
If the repeated word is the only content of the string, you can anchor the pattern by adding a '^' to the beginning and a '$' to the end, and drop the /g.
Code:
$string  =~ s/^(\S+)(?:\s+\1)+$/$1/;
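A small side-by-side sketch of the two variants, in case the difference is not obvious. The sub names `collapse_anywhere` and `collapse_whole` are only illustrative wrappers around the two substitutions.

```perl
use strict;
use warnings;

# Collapse repeated adjacent words anywhere in the string.
sub collapse_anywhere {
    my ($s) = @_;
    $s =~ s/(\S+)(?:\s+\1)+/$1/g;
    return $s;
}

# Collapse only when the whole string is one word repeated.
sub collapse_whole {
    my ($s) = @_;
    $s =~ s/^(\S+)(?:\s+\1)+$/$1/;
    return $s;
}

for my $input ('1234567890 1234567890 1234567890', 'go go go now') {
    print collapse_anywhere($input), ' | ', collapse_whole($input), "\n";
}
```

The first input collapses to 1234567890 with both. The second becomes "go now" with the unanchored version but is left as "go go go now" by the anchored one, because the trailing "now" stops the pattern from reaching the end of the string.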
jaa
 
Will that remove any words at all?

I ask because the string can be:

"hello how are you" or "hello hello hello". The first I want to allow unchanged; for the second I want to show just "hello" and remove the other two.
 
Yes, this regex will do this. Here's a test:
Code:
use strict;
use warnings;

my @teststrings = ("hello how are you", "hello hello how are you", "hello hello hello");

for my $string (@teststrings) {
    # Collapse each run of a repeated adjacent word into one copy.
    $string =~ s/(\S+)(\s+\1)+/$1/g;
    print "$string\n";
}
which prints out
Code:
hello how are you
hello how are you
hello
The \1 in the regex is a backreference: it matches only the exact text that the first group, the (\S+), just captured, not any arbitrary word.
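To see what the backreference buys you, here is a rough comparison against a pattern that uses a second \S+ in place of \1. The second pattern is only a deliberately wrong counterexample, not something suggested in this thread.

```perl
use strict;
use warnings;

my $text = 'hello hello how are you';

# With \1: a following word is removed only if it is identical
# to the word just captured.
(my $with_backref = $text) =~ s/(\S+)(\s+\1)+/$1/g;

# Without \1: any following word matches, so every second word is eaten.
(my $without = $text) =~ s/(\S+)\s+\S+/$1/g;

print "$with_backref\n";
print "$without\n";
```

The first line printed should be "hello how are you" and the second "hello how you", which shows why the backreference is needed.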

jaa
 
I used $string =~ s/(\S+)(\s+\1)+/$1/g; and it works to a degree. The only problem I have found so far is that if the string is, say,

Whats sup

the trailing "s" of the first word is treated as a repeated "word", so the space and the first letter of the next word are removed and it becomes

Whatsup

That makes it kind of interesting. Is there a way to avoid that?
 
If you only want to match and remove whole words, I believe the following would work:

[tt]$string =~ s/((?<!\S)\S+)(\s+\1(?!\S))+/$1/g;[/tt]

With this,
$string = "whats sup" => "whats sup"
$string = "whats whats whats sup" => "whats sup"

The additions:
[tt](?<!\S)[/tt] says to match only if the preceding character is not a non-space character. (You cannot use the opposite, (?<=\s), which would say the preceding character IS a space character, because that fails at the beginning of the string, where there is no preceding character.)

[tt](?!\S)[/tt] says that the repeated word matches only if the character following it is not a non-space character. As before, you cannot use the opposite, (?=\s), which would say the following character IS a space character, because that would fail at the end of the string.
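A short sketch that compares the negative lookarounds with the positive ones, to illustrate the beginning-of-string and end-of-string point. The sub names are made up for this example, and `collapse_positive` is the variant described above as the one NOT to use.

```perl
use strict;
use warnings;

# Negative lookarounds: "not preceded / not followed by a non-space".
# These also succeed at the very start and very end of the string.
sub collapse_negative {
    my ($s) = @_;
    $s =~ s/((?<!\S)\S+)(\s+\1(?!\S))+/$1/g;
    return $s;
}

# Positive lookarounds: require an actual whitespace character on both
# sides, so words at the start or end of the string can never match.
sub collapse_positive {
    my ($s) = @_;
    $s =~ s/((?<=\s)\S+)(\s+\1(?=\s))+/$1/g;
    return $s;
}

for my $input ('whats sup', 'whats whats whats sup', 'hello hello') {
    print "$input => ", collapse_negative($input),
          ' | ', collapse_positive($input), "\n";
}
```

With the negative version the three inputs should come out as "whats sup", "whats sup" and "hello". The positive version leaves "hello hello" untouched and only reduces "whats whats whats sup" to "whats whats sup", because the first word has no whitespace in front of it.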
 
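I thought about this after I posted. You can explicitly specify to match at word boundaries using \b. Keep in mind that \b only marks a transition between a word character (\w) and a non-word character, so tokens that contain punctuation can be treated differently than with the lookaround version. For plain words, the following regex would work: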
Code:
$string  =~ s/(\b\S+)(\s+\1\b)+/$1/g;
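For what it's worth, here is a small sketch comparing the \b version with the lookaround version on a few inputs, including two with punctuation. The sub names and the test strings are my own choices for illustration.

```perl
use strict;
use warnings;

# Word-boundary version.
sub collapse_boundary {
    my ($s) = @_;
    $s =~ s/(\b\S+)(\s+\1\b)+/$1/g;
    return $s;
}

# Lookaround version, which treats whitespace as the only separator.
sub collapse_lookaround {
    my ($s) = @_;
    $s =~ s/((?<!\S)\S+)(\s+\1(?!\S))+/$1/g;
    return $s;
}

for my $input ('whats sup', 'hello hello', "can can't", '!! !!') {
    print "$input => ", collapse_boundary($input),
          ' | ', collapse_lookaround($input), "\n";
}
```

Both versions agree on the first two inputs ("whats sup" and "hello"). They differ on the others: \b sees a boundary before the apostrophe, so "can can't" becomes "can't", and it finds no boundary at all in "!! !!", so that stays as is. The lookaround version leaves "can can't" alone and reduces "!! !!" to "!!". Which one is right depends on what you count as a word.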
jaa
 