Best way to add a space between touching Upper & Lowercase Characters?

Status
Not open for further replies.

PerlNewUser

Technical User
Apr 17, 2008
9
US
Using something like this:

Code:
(pseudocode) 

$line=~s/lcUC/lc  uc/g;

I want to add one or more spaces (and perhaps a period) between upper and lower case characters that touch each other, i.e.:

This is a blue widgetThis is a red widgetThis is a brown widget

should be:

This is a blue widget. This is a red widget. This is a brown widget

I have an Excel file that was exported as delimited text. One of the record fields contains product descriptions whose sentences run together, because much of the information was originally in bulleted lists that the export collapsed into single, run-on sentences.

If there were periods or other common punctuation at the end of each sentence, I would be able to easily use them in the above pseudocode to accomplish what I need; but with just lc and uc characters touching, I am not sure of the best way to proceed, since Perl offers multiple options.

Thanks.

 
Code:
$_ = "This is a brown widgetThis is a red widgetThis is a green widget";
s/([a-z])([A-Z])/\1\. \2/g;
print;
 
Brigmar, thanks.

What do the 1 and 2 represent? So far, this is not working for me when used as:

$line=s/([a-z])([A-Z])/\1\. \2/g;
 
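The parentheses in a regex capture their contents into $1, $2, etc.
Within the pattern itself they can be referred to as \1, \2, etc. In the replacement part of s///, $1 and $2 are the preferred form (\1 still works there, but Perl warns about it under "use warnings").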

As for not working, you're not applying the substitution to the $line value. Strange, as you had the syntax in your original post.

Code:
$line = "This is a brown widgetThis is a red widgetThis is a green widget";
$line =~ s/([a-z])([A-Z])/\1\. \2/g;   # note the binding operator =~, not =
print $line;
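To make the capture explanation concrete, here is a minimal sketch. The sample string and the variable name $fixed are only illustrative, and it uses $1/$2 in the replacement rather than \1/\2.

```perl
use strict;
use warnings;

my $line = "This is a brown widgetThis is a red widget";

# ([a-z]) captures the lowercase letter into $1,
# ([A-Z]) captures the uppercase letter right after it into $2.
if ($line =~ /([a-z])([A-Z])/) {
    print "first capture: $1, second capture: $2\n";
}

# In the replacement, $1 and $2 put the captured letters back
# with ". " inserted between them.
(my $fixed = $line) =~ s/([a-z])([A-Z])/$1. $2/g;
print "$fixed\n";
```

The copy-then-substitute form `(my $fixed = $line) =~ s/.../.../` leaves $line itself unchanged, which is handy while experimenting.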
 
No, he means that in your original post you were using the correct =~ operator, but in your second post you had the assignment operator = instead...

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
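For anyone wondering what the version with plain = actually does, here is a small sketch. The sample strings are made up, and $_ is set by hand only to make the effect visible.

```perl
use strict;
use warnings;

my $text = "red widgetBlue widget";

# With "=", the s/// is NOT applied to $line. It runs against $_ instead,
# and $line receives the return value: the number of substitutions made.
$_ = "some other stringIn the default variable";
my $line = $text;
$line = s/([a-z])([A-Z])/$1. $2/g;
print "line is now: $line\n";
print "\$_ is now: $_\n";

# With "=~", the substitution is bound to the variable on the left.
my $bound = $text;
$bound =~ s/([a-z])([A-Z])/$1. $2/g;
print "bound is now: $bound\n";
```

So a stray = tends to fail quietly: the original text in $line is overwritten by a count (or an empty string when nothing matched), and whatever happened to be in $_ gets edited instead.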
 
That was included in the code, but omitted when I copied it to the above reply.

This is what I have:

Code:
 $line =~ s/([a-z])([A-Z])/\1\. \2/g;

I also applied it with alternate characters, i.e.

Code:
 $line =~ s/([a-z])([A-Z])/\1AAA \2/g;

to see if I was simply missing its application, but still no results. Since similar substitution routines work fine (adding spaces where periods already exist, newlines after colons, etc.), I am wondering if data quality is the issue, and not the code (as I am applying it).

The converted Excel file (tab-delimited) contains numerous "junk" characters -- some visible, some not -- so I will re-save it in MS-DOS format and then try this again.
 
I'm going to take a guess that there are Carriage Returns and/or your bullet character in there, considering that the original was a bulleted list.
 
The code definitely works, just not on that particular field. (It does a terrific job on a URL field but, of course, I don't want to use it there.)

I have an additional chance to apply it separately to the target field and, if that does not work, the data is probably full of invisible carriage returns. That happens frequently, and I will need to re-save the data file as a pure DOS text file to eliminate them.

Thanks for your help.
 
Once the file is saved as text, those characters are not invisible, and can be included in the regex.

Upload the file (to something like box.net) and enter the URL as step 3 (attachment) of the reply section.
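Until the real file is available, this is only a guess at what the data looks like. The sketch below assumes a carriage return is sitting between the two sentences (the $field value is hypothetical). It first makes non-printable characters visible, then lets the substitution swallow control characters between the lowercase and uppercase letters.

```perl
use strict;
use warnings;

# Hypothetical field: a carriage return separates the sentences, so the
# lowercase "t" and the uppercase "T" are not actually adjacent.
my $field = "This is a blue widget\rThis is a red widget";

# Show anything outside printable ASCII as a \x{..} escape.
(my $visible = $field) =~ s/([^\x20-\x7E])/sprintf('\\x{%02X}', ord $1)/ge;
print "$visible\n";

# Allow (and discard) control characters between the two letters.
# A plain space is not in the class, so "a Red" is left alone.
(my $fixed = $field) =~ s/([a-z])[\x00-\x1F]*([A-Z])/$1. $2/g;
print "$fixed\n";
```

If the dump shows a bullet or some other non-ASCII character instead, that character would have to be added to the class in the second substitution; which one it is depends on the file's encoding, so it is left out here.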
 
Actually, without even re-saving as DOS text, the code seems to be working just fine on the target field, after adding all fields to an array and applying the substitution to just the target, i.e.:

Code:
$string = $data[6];   ## $data[6] is the target field in the array.
$string =~ s/([a-z])([A-Z])/\1\. \2/g;
$data[6]=$string;

So far, so good...
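A fuller sketch of that per-field approach, in case it helps someone later. The record contents are invented for illustration; only the idea of splitting on tabs and touching index 6 comes from the snippet above.

```perl
use strict;
use warnings;

# Hypothetical tab-delimited record: index 5 is a URL that must stay
# untouched, index 6 is the description field to repair.
my $record = join "\t", 'A100', 'Widget', '9.99', 'US', 'new',
    'http://example.com/myWidget', 'Blue widgetRed widget';

my @data = split /\t/, $record, -1;       # -1 keeps trailing empty fields
$data[6] =~ s/([a-z])([A-Z])/$1. $2/g;    # substitute directly on the element
my $out = join "\t", @data;
print "$out\n";
```

The substitution can be bound straight to $data[6], so the temporary $string is optional. Restricting it to one element is what keeps mixed-case URLs in other fields from being rewritten.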
 
Personally, I'd do it with a lookahead/lookbehind rather than bothering with capturing groups that you don't really need:
Code:
$string =~ s/(?<=[a-z])(?=[A-Z])/. /g;
Incidentally, there's no need to escape the dot in the replacement string with a backslash. It's a common mistake.
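Here is the lookaround version run against the sample line from the original question, as a quick sketch:

```perl
use strict;
use warnings;

my $string = "This is a blue widgetThis is a red widgetThis is a brown widget";

# (?<=[a-z]) asserts a lowercase letter just before the current position,
# (?=[A-Z]) asserts an uppercase letter just after it. Neither assertion
# consumes characters, so the match is the empty gap where ". " goes.
$string =~ s/(?<=[a-z])(?=[A-Z])/. /g;
print "$string\n";
```

Because nothing is consumed, nothing has to be put back, which is why the replacement is just the literal ". " with no $1 or $2.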
 