
Split on the last occurrence of a character

Status
Not open for further replies.

CoffeeGrinder (Technical User; joined Jan 16, 2004; 2 messages; location: GB)
I've got a string containing a varying number of dots, and I want to split it on the last occurrence of "." into two variables without losing any other characters. How? At the moment I have a solution that uses $1 and $2, but I'd like to be able to do it with a 'simple' split.

if ($data =~ /(.*)\.(\w+)$/)
{
  $child_data1=$1;
  $child_data2=$2;
}
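A note on the regex above, not from the original thread: \w+ only matches letters, digits and underscores, so a string such as archive.tar-gz (whose last segment contains a hyphen) fails the match and nothing gets assigned. A minimal sketch, assuming the goal is "everything after the last dot": swap \w+ for [^.]+ and wrap the test in a sub. The name split_last_dot_regex is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the OP's pattern with \w+ replaced by [^.]+.
# The greedy (.*) backtracks only as far as the LAST dot, and [^.]+
# accepts any non-dot characters (hyphens included) after it.
sub split_last_dot_regex {
    my ($data) = @_;
    if ($data =~ /(.*)\.([^.]+)$/) {
        return ($1, $2);
    }
    return;    # no dot, or nothing after the last dot
}

for my $data ('127.0.0.1', 'archive.tar-gz', 'nodots') {
    my ($child_data1, $child_data2) = split_last_dot_regex($data);
    if (defined $child_data2) {
        print "$data\t$child_data1 | $child_data2\n";
    }
    else {
        print "$data\tno dot to split on\n";
    }
}
```

Returning an empty list on failure keeps the "if" test from the original: the caller can still check whether a split happened.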
 
Try:

Code:
my $data = "127.0.0.1";

my @new_data = split /\./, $data;
my $child_data2 = pop @new_data;
my $child_data1 = join '.', @new_data;

- Rieekan
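One edge case in the split/pop/join version: by default split discards trailing empty fields, so "abc." splits to just ("abc"), the pop puts abc on the right-hand side and the left side ends up empty. A sketch, assuming that behaviour is unwanted: pass a negative LIMIT so the empty trailing field is kept. A string with no dot still comes back as an empty left side plus the whole string on the right, so callers need their own check. The sub name split_last_dot_join is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: Rieekan's split/pop/join with LIMIT -1, which stops
# split from dropping trailing empty fields ("abc." -> "abc" and "").
sub split_last_dot_join {
    my ($data) = @_;
    my @new_data    = split /\./, $data, -1;
    my $child_data2 = pop @new_data;          # text after the last dot
    my $child_data1 = join '.', @new_data;    # re-join everything before it
    return ($child_data1, $child_data2);
}

for my $data ('127.0.0.1', 'abc.', 'nodots') {
    my ($left, $right) = split_last_dot_join($data);
    print "$data\t'$left' | '$right'\n";
}
```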
 
how about a non-regex way:-

Code:
#!/usr/bin/perl

@data = qw(  127.0.0.1  abc.defg.hi.jklm  );

foreach (@data) {
  $leftside = substr($_, 0, rindex($_, "."));
  $rightside = substr($_, 1 + rindex($_, "."), length $_ - (rindex($_, ".")));
  print "$_\t$leftside | $rightside\n";
}


Kind Regards
Duncan
 
Much more elegant than mine Duncan. Have a star.

- Rieekan
 
Hmm, a bit concerned here.
What happens when there is no "."?
I think it's asking for trouble.
At least the original regex could handle it with the "if" test.

(very injured) Trojan.
 
Thanks Rieekan - although I don't know about elegant... it looks a flipping mess!!! But thank you anyway

Hi Trojan - I wasn't trying to write the whole thing, just to offer an alternative way of splitting it up (gotta get these dudes learning after all)

But I guess you could just check the $leftside scalar to address Trojan's concern


Kind Regards
Duncan
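On checking $leftside: when there is no dot, rindex returns -1, and substr($_, 0, -1) returns everything except the last character, so $leftside is not empty (abcdefg gives abcdef, as Trojan's test output further down shows). A sketch that tests rindex's return value instead and leaves out substr's LENGTH argument so the right side runs to the end of the string. The name split_last_dot_rindex is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: test rindex's result, because rindex returns -1 when
# there is no dot and substr($str, 0, -1) would then quietly drop the last
# character instead of failing.
sub split_last_dot_rindex {
    my ($str) = @_;
    my $pos = rindex($str, '.');
    return if $pos < 0;                       # no dot: nothing to split
    my $leftside  = substr($str, 0, $pos);    # everything before the last dot
    my $rightside = substr($str, $pos + 1);   # LENGTH omitted: to end of string
    return ($leftside, $rightside);
}

for my $str (qw( 127.0.0.1 abc.defg.hi.jklm abcdefg )) {
    my @parts = split_last_dot_rindex($str);
    if (@parts) {
        print "$str\t$parts[0] | $parts[1]\n";
    }
    else {
        print "$str\tno dot\n";
    }
}
```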
 
Duncdude,
You gotta be real careful with simple stuff like this.
Your "length $_ - blah" is dangerous.
For example, I ran your code with a couple of test cases:
Code:
123.456
123.456 123 | 456
abcdefg
Argument "abcdefg" isn't numeric in subtraction (-) at 45.pl line 6, <> line 2.
abcdefg abcdef | a

The "length" should be parenthesised in some way to avoid operator precedence issues like this.
Granted, it would probably survive if it were only supplied data with dots in it, but that's really not the point.
I guess my issues here are:
1) We've taken something simple and reliable and made it more complicated and less reliable. And for what? A slight speed improvement? I know which I'd rather have.
2) Always be VERY careful with operator precedence. If in doubt, bracket like mad.
3) Test with good data AND bad data. The bad data tests are usually more important.
4) If it ain't broke ...

Sorry, didn't mean to drag you down but I think these issues are important.

(very injured) Trojan.
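To make the precedence point concrete: length is a named unary operator, and named unary operators bind more loosely than binary minus, so length $s - 1 is parsed as length($s - 1). The small sketch below (variable names are illustrative) shows this, and shows that with Duncan's own sample string abc.defg.hi.jklm the unbracketed third argument to substr evaluates to 3, so the right-hand side comes back as jkl instead of jklm. Dotted data is not safe either.

```perl
use strict;
use warnings;

# length is a named unary operator; those bind more loosely than binary
# minus, so "length $s - 1" parses as length($s - 1).
my $s = '123456789';
my $unbracketed = length $s - 1;    # length(123456788), i.e. 9
my $bracketed   = length($s) - 1;   # 9 - 1, i.e. 8
print "unbracketed: $unbracketed, bracketed: $bracketed\n";

# The same trap in Duncan's third substr argument, using his sample data.
my $str = 'abc.defg.hi.jklm';
my $buggy;
{
    no warnings 'numeric';    # the string numifies to 0 in the subtraction
    $buggy = substr($str, 1 + rindex($str, "."), length $str - (rindex($str, ".")));
}
my $fixed = substr($str, 1 + rindex($str, "."), length($str) - rindex($str, ".") - 1);
print "buggy: $buggy\n";    # jkl: length(0 - 11) is length("-11"), i.e. 3
print "fixed: $fixed\n";    # jklm
```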
 
Hi Trojan

I'm not offended - I think you have a good point

I overlooked the string being treated as a number in the subtraction ... and the problems this could cause

I guess I was solving most problems with regexes, and then I saw that most people were agonising over speed, so I started thinking of other ways to solve the problems; in this instance, that could have been catastrophic

You are quite right to voice your concerns - I did not shove any data through it to prove its worth!

... which, ironically, means that the OP CoffeeGrinder's solution was as good as any!


Kind Regards
Duncan
 
I understand your concerns about the overuse of regexes and the performance advantages of substr, index and rindex, but I do have concerns about the use of these functions too.
As always, we have to be very careful to test, test, test.

(very injured) Trojan.
 
Good grief... are we going to have to put up with you and your broken hand for long ;-)


Kind Regards
Duncan
 
This is how I'd do this. Capturing groups in the regexp you pass to split() are returned as part of the list. If there are no dots in the string, it isn't split and the entire string is returned as the first element:
Code:
my @child_data = split /\.([^.]+$)/, $str;
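A quick check of how that split behaves: the pattern can only match a dot followed by non-dots running to the end of the string (that is, the last dot), and the capture hands back the text after it. split also discards the empty trailing field left after the match, so the result has two elements when there is a dot and one when there isn't. The wrapper name split_last_dot is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around ishnid's split. /\.([^.]+$)/ can only match a
# dot followed by non-dots up to the end of the string (the last dot), and
# the capturing group puts the text after that dot into the result list.
sub split_last_dot {
    my ($str) = @_;
    return split /\.([^.]+$)/, $str;
}

for my $str ('127.0.0.1', 'abc.defg.hi.jklm', 'nodots') {
    my @child_data = split_last_dot($str);
    print scalar(@child_data), " field(s): ", join(' | ', @child_data), "\n";
}
```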
 
Wow, great answers - I'm really glad I asked the question.

Now, more questions/comments...

- So, substr is more efficient than using a regex. It's only a small data set in this case, but useful to know in future.
- rindex is neat, didn't know about that before
- In duncdude's answer, you need to put length $_ in brackets for the third argument to be calculated correctly. But why bother? If you omit the length argument, substr returns everything to the end of the string.
- ishnid - perfect, that's exactly what I was looking for :) I now have a nice little one liner:

($child_data1, $child_data2) = split /\.([^.]+$)/;

Thanks again for your time on this one, all of you.
 
Just for fun (since it won't work if there's no dot in the string), using unpack:
Code:
my @data = qw(  127.0.0.1  abc.defg.hi.jklm  );

foreach (@data) {
  my $splitat = rindex $_, '.';
  my ( $leftside, $rightside ) = unpack "A${splitat}xA*", $_;
  print "$_\t$leftside | $rightside\n";
}
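Two things to watch with the unpack version: the A template strips trailing whitespace and nulls from each field on unpack, while a returns the field unchanged; and with no dot, rindex gives -1, which can't be used as a field width. A sketch assuming both should be handled, using a plus an explicit guard. The name split_last_dot_unpack is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical variant of the unpack approach. "a" returns each field
# unchanged, whereas "A" strips trailing whitespace and nulls on unpack;
# the rindex check skips strings that contain no dot at all.
sub split_last_dot_unpack {
    my ($str) = @_;
    my $splitat = rindex($str, '.');
    return if $splitat < 0;
    # aN: the first N characters; x: skip the dot; a*: the rest
    return unpack "a${splitat}xa*", $str;
}

for my $str ('127.0.0.1', 'left side .right side ', 'nodots') {
    my @parts = split_last_dot_unpack($str);
    if (@parts) {
        print "$str\t[$parts[0]] | [$parts[1]]\n";
    }
    else {
        print "$str\tno dot\n";
    }
}
```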

substr, index and that kind of string operator are preferable for simple tasks. If you're going to need quite a few of them to achieve what you're looking for, a regexp would be simpler to use. The difference isn't huge - I just don't like firing up the regexp engine for something that can be easily done with substr and index. That's not to say that I'd ever construct a complex function using them just to stubbornly avoid regexps.
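If the speed difference matters for a given job, it can be measured rather than guessed. A rough sketch using the core Benchmark module's cmpthese on a single sample string; the hash keys are just labels, and results will vary with machine, perl version and input, so treat them as indicative only. It checks that the three approaches agree before timing them.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = 'abc.defg.hi.jklm';

# Three of the approaches from this thread, each returning (left, right).
my %approach = (
    regex         => sub { $str =~ /(.*)\.([^.]+)$/ ? ($1, $2) : () },
    split_capture => sub { split /\.([^.]+$)/, $str },
    rindex_substr => sub {
        my $pos = rindex($str, '.');
        $pos < 0 ? () : (substr($str, 0, $pos), substr($str, $pos + 1));
    },
);

# Sanity check first: all three must agree before timing them.
for my $name (sort keys %approach) {
    my $got = join '|', $approach{$name}->();
    die "$name returned '$got'\n" unless $got eq 'abc.defg.hi|jklm';
}

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, \%approach);
```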
 
ish, what do you do for kicks? ;-)

Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
