
Different strings

Status
Not open for further replies.

rvBasic

Programmer
Oct 22, 2000
414
BE
How can I find the first differing character in two strings? E.g. if $sa="abcdef" and $sb="abcedf", then the strings first differ at position 4.

In theory I can xor the two strings and then look for the first non x00 character (if any). In practice I can't find the right Perl combination.[ponder]

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
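Probably not the greatest solution, but:

use strict;
use warnings;

my $sa = 'abcdef';
my $sb = 'abcedf';

# split // breaks each string into a list of single characters
my @sa = split //, $sa;
my @sb = split //, $sb;

# Only compare up to the last index of the shorter string
my $maxindex = $#sa < $#sb ? $#sa : $#sb;

for my $num (0 .. $maxindex) {
    if ($sa[$num] ne $sb[$num]) {
        # $num is a 0-based index (3 here, i.e. position 4 counted from 1)
        print "Difference at $num, sa=$sa[$num] sb=$sb[$num]\n";
        exit;
    }
}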

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
 
Thanks travs69, that line of thought also occurred to me, but I hoped there was some magical Perl way to find the position of the first non-\x00 character in a string.

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
one possibility that should be pretty efficient:

Code:
my $s1 = "abcdef";
my $s2 = "abcedf";
print compare_strings($s1,$s2);

# Returns the 1-based position of the first differing character
sub compare_strings {
   @_ == 2 or die 'Usage: compare_strings($s1,$s2)';
   my ($s1,$s2) = @_;
   my $len = length $s1;
   for my $i (0 .. $len) {
      my $car1 = substr $s1, $i, 1;
      my $car2 = substr $s2, $i, 1;
      if ($car1 ne $car2) {
         return $i+1;
      }
   }
   return;   # no difference found
}

But it assumes both strings are the same length; if they are not, you would have to take that into account.
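To take different lengths into account, one option is to loop only over the shorter string and then compare the lengths. A minimal sketch, assuming a 1-based result and 0 for identical strings; the sub name `first_diff_pos` is hypothetical and not from the posts in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based position of the first differing
# character, or 0 if the two strings are identical.
sub first_diff_pos {
    my ($s1, $s2) = @_;
    my ($len1, $len2) = (length $s1, length $s2);
    my $min = $len1 < $len2 ? $len1 : $len2;

    # Compare only the characters both strings actually have
    for my $i (0 .. $min - 1) {
        return $i + 1 if substr($s1, $i, 1) ne substr($s2, $i, 1);
    }

    # Common prefix matches: the strings differ only if one is longer,
    # and then the first difference is just past the end of the shorter one
    return $len1 == $len2 ? 0 : $min + 1;
}

print first_diff_pos('abcdef', 'abcedf'), "\n";   # 4
print first_diff_pos('abc',    'abcdef'), "\n";   # 4
print first_diff_pos('abcdef', 'abcdef'), "\n";   # 0
```

Stopping at the shorter length avoids calling substr past the end of a string, which under `use warnings` would complain.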

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
or if you wanted to be terse about it:

Code:
sub compare_strings {
   @_ == 2 or die 'Usage: compare_strings($s1,$s2)';
   for my $i (0 .. length $_[0]) {
      return $i+1 if ( (substr $_[0], $i, 1) ne (substr $_[1], $i, 1) );
   }
}

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Try the following:
Code:
$sa='abcdef';
$sb='abcedf';
$sc="$sa"^"$sb";
$sc=~/^(\x00+)/;
$first_nomatch=length$1;
The double quotes in the third line are required only if your strings might ever be numbers.

prex1
: Online tools for structural design
: Magnetic brakes for fun rides
: Air bearing pads
 
Franco,

I thought both operands had to be integers when using the bitwise operators.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
hmmm.... my memory is getting bad, I think..... [sad]

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
prex1: very nice solution. I changed the last line to:
Code:
$first_nomatch=(length($sa)==length($1) && length($sb)==length($1))?-1:length($1);
to get a -1 in case both strings are equal. But you'll surely come up with a shorter way of doing that.

kevin: thanks for your input. BTW you're right about the 24 hrs: there's better stuff out there.

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
If the two strings are equal, [tt]length$1[/tt] returns their common length, i.e. a position one character past the end of both, but I understand your aim to flag this situation clearly.
However [tt]$sc[/tt] in my code above has a length equal to the greater of [tt]$sa[/tt] and [tt]$sb[/tt], so you can also write
Code:
$first_nomatch=length$sc==length$1?-1:length$1;
[wink]

Franco
: Online tools for structural design
: Magnetic brakes for fun rides
: Air bearing pads
 
Thanks again prex1. In the final(?) version I added a test to verify that $1 is defined: when the match fails (e.g. when the strings already differ at the first character), $1 stays undefined and you get the warning "Use of uninitialized value in length at line ....". So the code now looks like:
Code:
$sa='abcdef';
$sb='abcdef';
$sc="$sa" ^ "$sb";
$sc=~/^(\x00+)/;
$ls1=(!defined $1)?0:length$1;
$first_nomatch=length$sc==$ls1?-1:$ls1;
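Another way to avoid the undefined $1 altogether, offered as a sketch: with `\x00*` instead of `\x00+` the pattern can match the empty string, so the match never fails. The sub name `first_mismatch` is hypothetical; it keeps the conventions used in this thread (0-based index, -1 for identical strings):

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based index of the first differing character,
# or -1 if the two strings are identical.
sub first_mismatch {
    my ($sa, $sb) = @_;

    # Interpolating in double quotes forces a string (bytewise) xor,
    # even when the values look like numbers
    my $sc = "$sa" ^ "$sb";

    # \x00* can match the empty string, so this match never fails and
    # $zeros is always defined (unlike $1 with \x00+)
    my ($zeros) = $sc =~ /^(\x00*)/;
    my $n = length $zeros;

    # Only \x00 bytes means no difference. ^ pads the shorter string with
    # \x00, so unequal lengths are detected too (unless the extra bytes of
    # the longer string are themselves \x00)
    return $n == length $sc ? -1 : $n;
}

print first_mismatch('abcdef', 'abcedf'), "\n";   # 3 (position 4)
print first_mismatch('abcdef', 'abcdef'), "\n";   # -1
print first_mismatch(123456, 123457), "\n";       # 5
```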

Intrigued by your comment about quotes and numbers, I also tested these (BAD) lines of code:
Code:
$sa=123456;
$sb=123456;
$l=length($sa);
print "\n\$sa=$sa and \$sb=$sb length=$l\n";
$sc=$sa ^ $sb;
print "\$sc=$sc\n";
$sc=~/^(\x00+)/;
if (!defined $1){print"\$1 is not defined\n";}
$ls1=(!defined $1)?0:length$1;
$first_nomatch=length$sc==$ls1?-1:$ls1;
print "first no match at position: $first_nomatch\n";
which produces the following output:
Code:
$sa=123456 and $sb=123456 length=6
$sc=0
$1 is not defined
first no match at position: 0
which seems to imply that the strings differ from the first position, which is obviously untrue, as you warned.
I'm trying to understand this, and came up with the following line of thought: the xor does its job, but the resulting x00 is turned into the character 0 ($sc prints as zero), which is x30. Therefore the match pattern hits the wall at the first character, position zero, so there is no match.
Does this make any sense?
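The reasoning can be checked by dumping the bytes with the core `unpack` function and its `H*` template. A small sketch comparing the numeric and the quoted (string) xor of the same values:

```perl
use strict;
use warnings;

my $sa = 123456;
my $sb = 123456;

# Both operands are numbers, so ^ is an integer xor: the result is the number 0
my $num_xor = $sa ^ $sb;

# Quoted operands are strings, so ^ works byte by byte: six \x00 bytes
my $str_xor = "$sa" ^ "$sb";

# unpack 'H*' shows each byte of the (stringified) value as two hex digits
print length($num_xor), ' ', unpack('H*', $num_xor), "\n";   # 1 30
print length($str_xor), ' ', unpack('H*', $str_xor), "\n";   # 6 000000000000
```

So the numeric xor never produces \x00 bytes at all: it yields the number 0, which only becomes the one-character string "0" (\x30) when used as a string, and /^(\x00+)/ finds nothing to match.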


_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
You need to double-quote the scalars if they are numbers.

The double quotes in third line are required only if your strings might ever be numbers.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Yes Kevin, but I'm trying to understand what really happens when I don't follow the rules (as well as when I do).

Is there a way to print the hexadecimal representation of a scalar?

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
Wait, wait...
When you do a bitwise xor between numbers (or, more exactly, what perl considers numbers; they should be integers to get meaningful results), you get something different: the bit representations of the numbers (32 or 64 bits, depending on how perl was built) are xor'ed and the result is a number. That's why you must explicitly mark the strings as such (with double quotes), so that perl doesn't get confused.
As to when perl treats a scalar as a number or as a string, I think this depends on how the same scalar has been used before and, of course, on whether it looks like a number.
But remember that perl treats even something like '123abcd' as a number, with value 123, if it's used in a numeric operation: this is a gray zone for me, and a small test program is in order when facing these situations.
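A small test program of the kind suggested, as a sketch. It assumes perl's default behaviour for `^`, which does a numeric xor as soon as either operand holds a number (including one cached by an earlier numeric use); under `use feature 'bitwise'`, which `use v5.28` and later enable, plain `^` is always numeric and `^.` is the string version:

```perl
use strict;
use warnings;

# '123abcd' in numeric context: perl uses the leading digits. Under
# "use warnings" this also emits an "isn't numeric" warning.
my $n = '123abcd' + 0;
print "$n\n";                          # 123

my $x = '12';
my $y = '34';
my $before = $x ^ $y;    # both still plain strings: bytewise xor
my $ignore = $x + 0;     # numeric use of $x caches a number in it
my $after  = $x ^ $y;    # now ^ treats the operands as numbers: 12 ^ 34

print unpack('H*', $before), "\n";     # 0206
print "$after\n";                      # 46
```

The same two variables give different results depending only on whether $x was used as a number in between.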
And of course, you can print the hexadecimal representation of a scalar: just use printf or sprintf (the %x format needs an integer) or unpack.
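A short sketch of those options, using only core functions: `sprintf` with `%x` for an integer, and `unpack 'H*'` or the `%v` flag of `sprintf` for the bytes of a string:

```perl
use strict;
use warnings;

my $int = 255;
my $str = 'abc';

# %x (or %X) formats an integer in hexadecimal
my $hex_int = sprintf '%x', $int;       # "ff"

# unpack 'H*' turns every byte of a string into two hex digits
my $hex_str = unpack 'H*', $str;        # "616263"

# The %v flag formats each character's ordinal, joined by '.'
my $hex_chr = sprintf '%vx', $str;      # "61.62.63"

printf "%s %s %s\n", $hex_int, $hex_str, $hex_chr;
```

The unpack form is the handy one for bitstrings like the xor results in this thread, since it shows non-printable bytes such as \x00.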
Concerning your warning, I must admit, I never [tt]use warnings[/tt]...[blush]


Franco
: Online tools for structural design
: Magnetic brakes for fun rides
: Air bearing pads
 
I still think using bitwise operators on strings is not correct. I admit I am out on a limb here: I have little practical experience using the bitwise operators and I am only going by what I have read.

Quotes from resources:

This is a demonstration of the bitwise operators. The bitwise operators only have meaning in the context of integers.


Both operands associated with the bitwise operator must be integers.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
OK, no premature conclusions and stick to the rules.

The bitwise operators only have meaning in the context of integers.

This seems a strange statement as bit operations are supposed to apply to bits no matter their meaning. And are integers not just a structured set of bits?



_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
My quote above on bitwise string operators is from the very official perlop man page: so, Kevin, be sure that it's not only legal, it's a design choice for the language.
Those operators may be very useful, e.g. for manipulating bitstrings representing images.
The shift ops are defined for numbers only, but shift operators for bitstrings would be useful also, hope they will appear soon...

Franco
: Online tools for structural design
: Magnetic brakes for fun rides
: Air bearing pads
 
From a well-known but slightly less authoritative source: Programming Perl, Tom Christiansen, brian d foy, Larry Wall et al., 4th edition, p. 162
All of the bitwise operators can work with bitstrings, as well as with integers.
...
if any operand of a bitwise operator is a string, Perl will perform the operation on that bitstring.
...
Bitstrings may be arbitrarily long.

_________________________________
In theory, there is no difference between theory and practice. In practice, there is. [attributed to Yogi Berra]
 
Yes, I think this clears it up, for me anyway:

<quoted from perldoc>:

Bitwise String Operators

Bitstrings of any size may be manipulated by the bitwise operators (~ | & ^).

If the operands to a binary bitwise op are strings of different sizes, | and ^ ops act as though the shorter operand had additional zero bits on the right, while the & op acts as though the longer operand were truncated to the length of the shorter. The granularity for such extension or truncation is one or more bytes.

# ASCII-based examples
print "j p \n" ^ " a h"; # prints "JAPH\n"
print "JA" | " ph\n"; # prints "japh\n"
print "japh\nJunk" & '_____'; # prints "JAPH\n";
print 'p N$' ^ " E<H\n"; # prints "Perl\n";

If you are intending to manipulate bitstrings, be certain that you're supplying bitstrings: If an operand is a number, that will imply a numeric bitwise operation. You may explicitly show which type of operation you intend by using "" or 0+ , as in the examples below.

$foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
$foo = '150' | 105; # yields 255
$foo = 150 | '105'; # yields 255
$foo = '150' | '105'; # yields string '155' (under ASCII)

$baz = 0+$foo & 0+$bar; # both ops explicitly numeric
$biz = "$foo" ^ "$bar"; # both ops explicitly stringy

<end quote>
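The padding and truncation rules quoted above can be seen by dumping the results with `unpack 'H*'`. A short sketch, assuming ASCII strings:

```perl
use strict;
use warnings;

# ^ and | pad the shorter operand with \x00 bytes, so ^ copies the
# extra bytes of the longer string through unchanged
my $xored = 'ab' ^ 'abcd';      # "\x00\x00cd"
my $ored  = 'ab' | 'abcd';      # "abcd"

# & truncates the longer operand to the length of the shorter one
my $anded = 'abcd' & 'ab';      # "ab"

print unpack('H*', $xored), "\n";   # 00006364
print "$ored $anded\n";             # abcd ab
```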

but this prints nothing:

print "abcde" ^ "abced";

I'm confused....
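Something is printed, it just isn't visible: 'd' (0x64) xor 'e' (0x65) is 0x01, so the result consists only of control characters. Dumping the bytes as hex shows them, and counting the leading \x00 bytes answers the original question (3, i.e. position 4 counted from 1). A sketch:

```perl
use strict;
use warnings;

my $x = 'abcde' ^ 'abced';

# Three NUL bytes followed by two \x01 bytes: nothing a terminal can show
print length($x), "\n";             # 5
print unpack('H*', $x), "\n";       # 0000000101

# Leading \x00 bytes = number of leading characters the strings share
my ($same) = $x =~ /^(\x00*)/;
print length($same), "\n";          # 3
```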

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 