Different strings 2

Status
Not open for further replies.

rvBasic

Programmer
Oct 22, 2000
414
BE
How can I search for the first divergent character in two strings? E.g. if $sa="abcdef" and $sb="abcedf", then the strings first differ in position 4.

In theory I can xor the two strings and then look for the first non-\x00 character (if any). In practice I can't find the right Perl combination.[ponder]

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
It prints nothing 'visible', perhaps. The result of "abcde"^"abced" is a string composed of three \x00 and two \x01 characters. Try this:
Code:
print"*","abcde"^"abced","*";
or this:
Code:
map{print ord," "}split(//,"abcde"^"abced");
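
Another way to make the result visible, as a minimal sketch: unpack with the "H*" template renders each byte of the xor as two hex digits. The variable names here are only for illustration.

```perl
use strict;
use warnings;

# String xor: each byte of the result is the xor of the corresponding bytes.
my $xor = "abcde" ^ "abced";

# "H*" renders every byte as two hex digits, so the \x00 bytes become visible.
my $hex = unpack("H*", $xor);
print "$hex\n";    # prints 0000000101
```

The leading zeros are the positions where the two strings agree; the first non-zero byte marks the first difference.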

Franco
: Online tools for structural design
: Magnetic brakes for fun rides
: Air bearing pads
 
Kevin, in my opinion it shouldn't print anything as
"abcde" = x'6162636465'
"abced" = x'6162636564'
and the resulting xor is
"abcde" ^ "abced" = x'0000000101'
which doesn't contain any printable characters.

It does get confusing, however, if you mix numbers and strings in the same operation, as your excerpt from perldoc warns about: make sure that both operands are of the same type. E.g.:
$stra=123;
$strb="abc";
then "$stra"^"$strb"=x'505050'
because the quotes turn 123 into the string '123', so it is no longer treated as a number. Without the quotes, $stra^$strb is a numeric xor: "abc" counts as 0 and the result is the number 123.
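
A short sketch of the difference, assuming a plain Perl 5 without the "bitwise" feature enabled: when either operand is a number, ^ works numerically, and interpolating both operands into double quotes forces the string xor.

```perl
use strict;
use warnings;

my $stra = 123;      # a number
my $strb = "abc";    # a string

# Interpolation yields fresh strings, so this is a bytewise string xor.
my $string_xor = "$stra" ^ "$strb";

# With a numeric operand Perl xors numbers; "abc" numifies to 0.
# The 'numeric' warning is silenced only inside this block.
my $numeric_xor = do { no warnings 'numeric'; $stra ^ $strb };

print unpack("H*", $string_xor), "\n";    # prints 505050
print "$numeric_xor\n";                   # prints 123
```

This is why quoting both operands is the safer habit when a string xor is intended.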

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
some light is beginning to shine in the darkness [smile]

But honestly, I think my suggestion is better for comparing strings to find the first unequal character. The XOR operator might have a speed advantage but it certainly is a lot less clear about what is going on.
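
Kevin's own code is not shown in this thread, so the following is only an assumed sketch of what a plain character-by-character comparison could look like. The name first_diff_loop is hypothetical, and positions are counted from 0.

```perl
use strict;
use warnings;

# Hypothetical helper: walk both strings in step and return the 0-based
# index of the first differing character, or -1 if the strings are equal.
sub first_diff_loop {
    my ($sa, $sb) = @_;
    my $min = length($sa) < length($sb) ? length($sa) : length($sb);
    for my $i (0 .. $min - 1) {
        return $i if substr($sa, $i, 1) ne substr($sb, $i, 1);
    }
    # No difference in the common part: either the strings are equal,
    # or they differ where the shorter one ends.
    return length($sa) == length($sb) ? -1 : $min;
}

print first_diff_loop("abcdef", "abcedf"), "\n";    # prints 3
```

It is longer than the xor approach, but every step is spelled out.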

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
prex1: you should possibly update the FAQ to get rid of unwanted warnings. [ponder]

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
Done.
I used your suggestion of returning -1 when the two strings are equal. I also tried to use a more standard style of writing than what I'm accustomed to...[blush]
Please try and rate it!


Franco
: Online tools for structural design
: Magnetic brakes for fun rides
: Air bearing pads
 
prex1: Is it possible to use
Code:
return (undef);
when one or both parameters are undefined? Although I see your point, I'm a bit uneasy about returning the same value (-1) that is returned when both strings are equal.

The line
Code:
return 0 if ord($sa) != ord($sb);
does not seem to be necessary, as that case will also be caught by the regexp: length($1) will be zero when the first characters are different. Unless it's there for (slight?) performance reasons.

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
faq219-6748: I may settle on
Code:
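sub first_nomatch {
  my ($sa, $sb) = @_;
  return(undef) unless $sa || $sb;     # both are false: undefined, empty or '0'
  return 0 unless $sa && $sb;          # only one of them is false
  return 0 if ord($sa) != ord($sb);    # first characters differ, so $1 would stay unset
  return -1 if $sa eq $sb;             # identical strings
  my $sc = "$sa" ^ "$sb";              # string xor: \x00 wherever the characters agree
  $sc =~ /^(\x00+)/;                   # capture the leading run of \x00
  return length($1);                   # 0-based position of the first difference
}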
If both arguments are undefined, the result is undefined. If only one of them is undefined, then they clearly differ in the first position, so return 0.

But I still ponder the relative merits of
Code:
return -1 if $sa eq $sb;
before the regexp vs.
Code:
$ls1 = (!defined $1) ? 0 : length($1);
$first_nomatch = length($sc) == $ls1 ? -1 : $ls1;
after the check. I don't know.
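
For comparison, a sketch of the second variant as a complete function. The name first_nomatch_postcheck is hypothetical, and both arguments are assumed to be defined. It matches \x00* and takes the capture in list context instead of testing $1, because $1 keeps its value from an earlier successful match when the current match fails.

```perl
use strict;
use warnings;

# Hypothetical variant: decide about equality after the match, not before it.
sub first_nomatch_postcheck {
    my ($sa, $sb) = @_;
    my $sc = "$sa" ^ "$sb";
    # \x00* always matches (possibly the empty string), so the capture
    # is always defined and never left over from an earlier match.
    my ($zeros) = $sc =~ /^(\x00*)/;
    my $ls1 = length $zeros;
    # If the whole xor is \x00 the strings are equal.
    return length($sc) == $ls1 ? -1 : $ls1;
}

print first_nomatch_postcheck("abcdef", "abcedf"), "\n";    # prints 3
print first_nomatch_postcheck("abc", "abc"), "\n";          # prints -1
```

With this form the separate ord() test is not needed, since a difference in the first character simply gives an empty capture.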



_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
The line return 0 if ord($sa) != ord($sb); is required because the warning about $1 being undefined occurs when the strings differ on the first character, not when they are entirely equal.
As for return undef, I don't like it, as it will also be returned when both strings have length zero (so they are not undefined) and even when they are equal to '0', since Perl treats such a string as false.
BTW, returning -1 when the two strings are equal is not really necessary, as equality can easily be detected anyway: the returned position will be one character past the end of the strings, and if the strings are of equal length this can occur only if they are equal.
The test return -1 if $sa eq $sb; should be more efficient if one expects to compare two equal strings often. If this happens only occasionally, the other way should be better.
I think that, as always, it is a matter of taste, and after all one can adapt the code to one's own needs.
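
A sketch of that idea, with no special -1 value: the function only reports where the first non-\x00 byte of the xor sits, and the caller decides about equality. The name common_prefix_length is hypothetical, and both arguments are assumed to be defined.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based index of the first difference, which is
# the same as the length of the common prefix.
sub common_prefix_length {
    my ($sa, $sb) = @_;
    my $sc = "$sa" ^ "$sb";
    # $-[0] is the offset at which the last successful match started.
    return $sc =~ /[^\x00]/ ? $-[0] : length $sc;
}

my ($s1, $s2) = ("abcde", "abcde");
my $pos = common_prefix_length($s1, $s2);

# Equal only if the lengths agree and the position runs past the end.
my $verdict = (length($s1) == length($s2) && $pos == length($s1))
    ? "equal"
    : "differ at $pos";
print "$verdict\n";    # prints equal
```

The length comparison matters: for "abc" and "abcd" the position is 3 as well, yet the strings are not equal.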

Franco
: Online tools for structural design
: Magnetic brakes for fun rides
: Air bearing pads
 
What if you replace the single line
Code:
return(undef) unless $sa||$sb;
with the two lines
Code:
return (undef) if (!defined $sa)&&(!defined $sb);
return (-1) unless $sa||$sb;

In this case you satisfy both needs: you tell the caller explicitly that both strings are undefined and that you can't make any conclusions. On the other hand if they are of zero length or just plain zero then you rightly consider them equal.

Of course it's one more line.
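
Putting the pieces together, one possible shape for the whole function, as a sketch only. The name first_nomatch_defined is hypothetical. Note one deliberate difference from the two lines above: equality is tested with eq, so '' and '0' each count as equal to themselves but not to each other.

```perl
use strict;
use warnings;

# Hypothetical variant that separates "undefined" from "false".
sub first_nomatch_defined {
    my ($sa, $sb) = @_;
    return undef if !defined($sa) && !defined($sb);    # nothing to compare
    return 0     if !defined($sa) || !defined($sb);    # only one is undefined
    return -1    if $sa eq $sb;                        # equal, including '' and '0'
    my $sc = "$sa" ^ "$sb";
    my ($zeros) = $sc =~ /^(\x00*)/;                   # leading run of \x00, maybe empty
    return length $zeros;                              # 0-based first difference
}

my $r = first_nomatch_defined(undef, undef);
print defined $r ? $r : "undef", "\n";                    # prints undef
print first_nomatch_defined("0", "0"), "\n";              # prints -1
print first_nomatch_defined("abcdef", "abcedf"), "\n";    # prints 3
```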

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 