regex help please :-)

1DMF

Programmer
Jan 18, 2005
8,795
GB
Hi, hopefully a simple one.

I want to check the data from a textarea input field and convert it to a simple comma-separated string. The input has the usual \r\n line endings, but a line may or may not already have a comma before the \r\n, and/or one or more spaces on either side.

Is this right?

$address =~ s/\s+,\s+\r\n/,/g;

It's not erroring, but I wanted to check it was doing what I think it is.

Regards,
1DMF.

"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
 
Nope, even tried
Code:
$row->{'Address'} =~ s/\s+,+\s+\r\n/,/g;
because it may or may not have a comma, but nothing gets replaced :-(

Doesn't that say:

change
space (0 or more), comma (0 or more), space (0 or more), carriage return + line feed
to
comma


 
Spot on!

I just checked perlre. I must be tired, as I swear that when I looked it said plus (+) for 0 or more, when it actually means 1 or more; star (*) is the one for 0 or more.

Well, it's nice to know my logic wasn't flawed even if I am going blind [lol]
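
The difference between the two quantifiers can be checked with a small script. This is only a sketch: the sample string is made up for illustration, and the second pattern is simply the first one with each "+" swapped for "*".

```perl
use strict;
use warnings;

# Hypothetical sample: no comma and no spaces before the line break.
my $plus = "1 buckingham palace\r\nlondon";
my $star = $plus;

# "+" means one or more: the pattern insists on a space, a comma and
# another space before the CRLF, so nothing matches here.
my $n_plus = ($plus =~ s/\s+,\s+\r\n/,/g) || 0;

# "*" means zero or more: the bare CRLF is enough for a match.
my $n_star = ($star =~ s/\s*,*\s*\r\n/,/g) || 0;

print "plus: $n_plus substitution(s)\n";
print "star: $n_star substitution(s), result: $star\n";
```

With this input the "+" version leaves the string untouched, while the "*" version turns the line break into a comma.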

 
I'm still having problems with this address. I've found people have entered things like the following:
1 buckingham palace
{CRLF}
london
{CRLF}
{CRLF}
{CRLF}

I'm using:
Code:
$row->{'Address'} =~ s/$\r\n{2,}/\r\n/g;

and also tried:
Code:
$row->{'Address'} =~ s/\z\r\n{2,}/\r\n/g;
but no joy.

How do I say: change 2 or more {CRLF} at the end of the string to just one {CRLF}?

 
If there is "zero or one" comma you can use the "?" quantifier instead of "*", which means "zero or more". You probably also want "*?" for the spaces ("\s*?") to make the match stingy instead of greedy.

Code:
$row->{'Address'} =~ s/\s*?,?\s*?\r\n/,/g;

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Something like this might work for you:
Code:
$_ = '1 buckingham palace

london


';

s/[\x0a\x0d]+/\n/g; # Change multiple line endings to a single native line ending
s/(?<=[^\n])\n(?=[^\n])/,/g; # Change each line ending that sits between text into a comma (works for any number of lines)
print "|$_|\n";
 
If there is "zero or one" comma you can use the "?" quantifier instead of "*", which means "zero or more". You probably also want "*?" for the spaces ("\s*?") to make the match stingy instead of greedy.

Why? I take it you mean stringy; either way, why stringy and not greedy? Isn't that what the /g is for?

Prex -> so why does \r\n{2,} not work? Is it because that only checks for one \r followed by 2 or more \n's?

Is that why you need to wrap them in parentheses? I thought that was only for when you want to capture the matched string into the $1, $2 vars.

Would s/(\r\n){2,}/\r\n/g; work?
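
A quantifier applies only to the single item immediately before it, which is why grouping matters here. The following is a minimal sketch with a made-up sample string; it uses (?:...), which groups without capturing, so no $1 is set.

```perl
use strict;
use warnings;

my $input = "london\r\n\r\n\r\n";   # hypothetical sample: three CRLF pairs

# Without grouping, {2,} applies only to \n: "one \r, then two or more \n".
(my $ungrouped = $input) =~ s/\r\n{2,}/\r\n/g;

# (?:...) groups without capturing, so {2,} applies to the whole \r\n pair.
(my $grouped = $input) =~ s/(?:\r\n){2,}/\r\n/g;

print $ungrouped eq $input      ? "ungrouped: unchanged\n"
                                : "ungrouped: changed\n";
print $grouped   eq "london\r\n" ? "grouped: collapsed to one CRLF\n"
                                : "grouped: not collapsed\n";
```

Plain parentheses, as in (\r\n){2,}, group in the same way; they just capture into $1 as a side effect.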



 
yup just tried
Code:
$row->{'Address'} =~ s/(\r\n){2,}/\r\n/g;
and that did the trick.

OK, looks like the proverbial cat-skinning issue.

So which is best, and why?

Code:
$row->{'Address'} =~ s/(\r\n)+/\r\n/g;

or

Code:
$row->{'Address'} =~ s/(\r\n){2,}/\r\n/g;

And I just tried
Code:
$row->{'Address'} =~ s/\w(\r\n){2,}/\r\n/g;
and that worked!

Right, finally getting my head round all this. Thanks guys, as usual you are awesome and always helpful!
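
For comparison, here is a small sketch that runs the three variants above on one made-up string. Two things can be seen from it: "+" and "{2,}" give the same result, and a leading \w becomes part of the match, so it is replaced along with the line breaks unless it is captured and put back. The show() helper is hypothetical and only makes the CRLFs visible.

```perl
use strict;
use warnings;

# Hypothetical helper: make CRLF pairs visible when printing.
sub show { my ($s) = @_; $s =~ s/\r\n/{CRLF}/g; return $s; }

my $input = "london\r\n\r\n";   # hypothetical sample: text plus two CRLFs

(my $plus     = $input) =~ s/(\r\n)+/\r\n/g;       # one or more
(my $two_plus = $input) =~ s/(\r\n){2,}/\r\n/g;    # two or more
(my $eaten    = $input) =~ s/\w(\r\n){2,}/\r\n/g;  # \w is matched, so it is replaced too
(my $kept     = $input) =~ s/(\w)(?:\r\n){2,}/$1\r\n/g;  # capture \w and put it back

print "plus:     ", show($plus),     "\n";   # london{CRLF}
print "two_plus: ", show($two_plus), "\n";   # london{CRLF}
print "eaten:    ", show($eaten),    "\n";   # londo{CRLF}
print "kept:     ", show($kept),     "\n";   # london{CRLF}
```

The only practical difference between "+" and "{2,}" is that "+" also replaces a single CRLF with itself, which is harmless.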





 
Cracked it with the following code..
Code:
        #format address
        if($row->{'Address'}){
            $row->{'Address'} =~ s/(\r\n)+/\r\n/g;        
            $row->{'Address'} =~ s/\s*,*\s*(\r\n)/,/g; 
            $row->{'Address'} =~ s/,$//g;            
        }

1. Replace multiple {CRLF} with just one.
2. Replace any whitespace, optional comma(s), any whitespace and a {CRLF} with just a comma.
3. Remove the extra comma at the end of the string.

So, any perl heads out there got a one-liner? ;-)
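
One possible single-statement version, offered purely as a sketch and not taken from any reply in this thread: split on CRLF, trim each piece, drop the empty pieces and join with commas. The sample address is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: stray spaces and a comma before the first line break,
# then several empty lines.
my $address = "1 buckingham palace , \r\n\r\nlondon\r\n\r\n\r\n";

# split on CRLF, strip leading whitespace and trailing whitespace/commas
# from each piece, keep only non-empty pieces, then join with commas.
$address = join ',', grep { length } map { s/^\s+|[\s,]+$//g; $_ } split /\r\n/, $address;

print "$address\n";   # 1 buckingham palace,london
```

Because empty pieces are filtered out before the join, there is no trailing comma to remove afterwards.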

 
good thinking batman! [batman]



 
Why? I take it you mean stringy; either way, why stringy and not greedy? Isn't that what the /g is for?

Why? Because if you expect zero or one of something, "?" is the correct quantifier. If you expect zero or more of something, use "*". If you are unsure, you can use "*". That way your code matches its intended behavior, and any deviations from the expected behavior can be looked into if necessary. Suppose you use "s/,*//g" to remove commas from a string: this could mask a problem further up or down the pipe if there should only ever be zero or one comma. Though in your case it's probably not important.

Stringy? No. Stingy. The opposite of greedy: stingy matching means matching as little as possible instead of as much as possible. The "g" modifier has nothing to do with stingy/greedy. "g" applies the substitution to every match in the string, roughly like this construct:

Code:
while(/pattern/){
   s/pattern/replacement/;
}

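In other words it's a sort of loop, although unlike the loop above, /g works through the string in a single pass and does not re-match text it has already replaced.

Stingy matching can also be more efficient, because perl stops extending the match as soon as the rest of the pattern succeeds. With greedy matching perl first takes as much as it can and then backtracks if the rest of the pattern fails. In some cases this can introduce quite a lot of inefficiency into code, though which form is faster depends on the pattern and the data.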
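
To make the distinction concrete, here is a minimal sketch on a made-up string with two spaces followed by two CRLFs. Greedy versus stingy decides how much a single match takes; /g decides how many matches get replaced. Note that \s also matches \r and \n, which is what lets the greedy form swallow the whole run.

```perl
use strict;
use warnings;

my $input = "london  \r\n\r\nSW1A 1AA";   # hypothetical sample

# Stingy: each match stops at the first \r\n it can reach, so the
# second CRLF is matched separately and a second comma appears.
(my $lazy = $input) =~ s/\s*?,?\s*?\r\n/,/g;

# Greedy: one match takes the spaces and both CRLFs together.
(my $greedy = $input) =~ s/\s*,?\s*\r\n/,/g;

# Without /g only the first match is replaced, stingy or not.
(my $once = $input) =~ s/\s*?,?\s*?\r\n/,/;
(my $once_visible = $once) =~ s/\r\n/{CRLF}/g;

print "lazy:   $lazy\n";           # london,,SW1A 1AA
print "greedy: $greedy\n";         # london,SW1A 1AA
print "once:   $once_visible\n";   # london,{CRLF}SW1A 1AA
```

For input with repeated blank lines, the greedy form is the one that produces a single comma per gap.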



------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Thanks for the reply Kevin.

Suppose you use "s/,*//g" to remove commas from a string: this could mask a problem further up or down the pipe if there should only ever be zero or one comma.
If * means zero or more, why is it a problem if there aren't any or there is only one?

Don't I require it to be greedy, not stingy, as some have put multiple spaces and {CRLF} chars in between lines of the address?

1 buckingham palace{CRLF}
{CRLF}
london{WS}{CRLF}
{CRLF}
{WS}SW1A 1AA{WS}{CRLF}
{CRLF}
{CRLF}
England{CRLF}
{CRLF}
{WS}

{WS} = White Space
{CRLF} = Carriage Return Line Feed
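
As a rough check, the three substitutions posted earlier can be run against this sample. The sketch below assumes each {WS} is a single space. On that assumption, whitespace at the start of a line and after the final {CRLF} survives the three steps, so two extra substitutions (hypothetical, not from the thread) are added to tidy them up.

```perl
use strict;
use warnings;

# The sample above, with {WS} written as a single space (an assumption).
my $address = join '',
    "1 buckingham palace\r\n", "\r\n",
    "london \r\n", "\r\n",
    " SW1A 1AA \r\n", "\r\n", "\r\n",
    "England\r\n", "\r\n",
    " ";

# The three substitutions from the thread.
$address =~ s/(\r\n)+/\r\n/g;
$address =~ s/\s*,*\s*(\r\n)/,/g;
$address =~ s/,$//g;
my $after_thread_steps = $address;

# Two hypothetical extra steps: tidy spaces around commas, then trim the ends.
$address =~ s/\s*,\s*/,/g;
$address =~ s/^[\s,]+|[\s,]+$//g;

print "thread steps: [$after_thread_steps]\n";
print "with trims:   [$address]\n";
```

The brackets in the output make any leftover leading or trailing whitespace easy to spot.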

Why? Who knows. When you analyse user input, it often leaves you scratching your head wondering why they entered info the way they did.

So thanks to all who helped me clean it up [thumbsup]

 