Parse string into equal lengths

Status
Not open for further replies.

CJason

Programmer
Joined
Oct 13, 2004
Messages
223
Location
US
I have a string that I need to parse into ~X-number of equal length strings...maybe put each part into array elements. The catch....the string is a sentence and I need to make sure I parse on a "word" ending.

For example, if I have the following:

string = "This is my string that I want to split."

and I want to split this into ~15-length strings, so I want to get the following:
s1: "This is my"
s2: "string that I"
s3: "want to split."

Any ideas? Thank you in advance!
 
This code is not perfect, but it gives you the ability to dynamically set the number of strings that you want to split into.

Code:
my $string = "This is my string that I want to split.  There are many like it.  But this one is mine.";

my $splitCount = 3; # Set this to whatever you like

my $characterCount = int(length($string) / $splitCount);

my @strings = $string =~ /(?=\w)(.{1,$characterCount}\S*)/g;

print join "\n", map {"'$_'"} @strings;

# Outputs:
# 'This is my string that I want'
# 'to split.  There are many like'
# 'it.  But this one is mine.'

The one problem that you might have is that it is not guaranteed to return the exact splitCount that you specify. You could come closer to guaranteeing this by reducing $characterCount by one to account for the spaces that are lost on splits, but even this will not be perfect. Say you had a sentence with only 4 words and you wanted to split it into 5 strings. Obviously this regex would not return 5 strings.

Nevertheless, this will probably give you ideas. Good Luck.
 
Actually, since I want to split on "lengths", not "number", I think your algorithm gets even simpler...as well as removes the problems that you describe that might be encountered. In other words, I don't really care about the NUMBER of strings returned....just that they are all no longer than X.

Yet again, you've put me on the path for greatness!!!
 
This is sort of the opposite approach to Miller's suggestion:

Code:
my $string = "This is my string that I want to split.  There are many like it.  But this one is mine.";
my @strings = $string =~ /\s*(.{1,15}(?!\w))\s*/g;
print join "\n", map {"'$_'"} @strings;

The output for Miller's twist on the "this is my rifle" chant is:

Code:
'This is my'
'string that I'
'want to split. '
'There are many'
'like it.  But'
'this one is'
'mine.'

I am sure there will be problems with these approaches if the strings have long unbroken sequences of word characters.
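To illustrate the long-word caveat, here is a minimal sketch of a loop-based alternative that hard-breaks any word longer than the limit. The helper name split_by_width and the sample sentence are made up for illustration and are not from the thread; this is one possible approach under those assumptions, not a drop-in replacement for the regexes above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split $text into chunks of at
# most $width characters, breaking on whitespace wherever possible.
sub split_by_width {
    my ($text, $width) = @_;
    my @chunks;
    my $line = '';
    for my $word (split ' ', $text) {
        # A word that cannot fit on a line by itself is cut into
        # $width-sized pieces; the 4-argument substr removes each piece.
        while (length($word) > $width) {
            if (length $line) { push @chunks, $line; $line = ''; }
            push @chunks, substr($word, 0, $width, '');
        }
        if ($line eq '') {
            $line = $word;
        }
        elsif (length($line) + 1 + length($word) <= $width) {
            $line .= " $word";          # word still fits on the current line
        }
        else {
            push @chunks, $line;        # current line is full, start a new one
            $line = $word;
        }
    }
    push @chunks, $line if length $line;
    return @chunks;
}

print "'$_'\n" for split_by_width(
    'This is my supercalifragilisticexpialidocious string.', 15);

# Outputs:
# 'This is my'
# 'supercalifragil'
# 'isticexpialidoc'
# 'ious string.'
```

Note that this version also collapses runs of whitespace, because split ' ' discards them.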

- Kevin, perl coder unexceptional! [wiggle]
 
Thanks guys! I think I simplified it even further...mainly because I don't understand the usage of the "?" parts of your code:

my @strings = $string =~ /(.{1,15}\s+)/g;

This works because $string is obtained from user entry...so, it's guaranteed to end with "\n".

Can you guys explain what the "?" is doing in your code? Also, do you think my simplified version will work?
 
quoted from a perl source:

Table 10.8 Five Extension Components

Extension Description

(?# TEXT) This extension lets you add comments to your regular expression. The TEXT value is ignored.

(?:...) This extension lets you add parentheses to your regular expression without causing a pattern memory position to be used.

(?=...) This extension lets you match values without including them in the $& variable.

(?!...) This extension lets you specify what should not follow your pattern. For instance, /blue(?!bird)/ means that "bluebox" and "bluesy" will be matched but not "bluebird".

(?sxi) This extension lets you specify an embedded option in the pattern rather than adding it after the last delimiter. This is useful if you are storing patterns in variables and using variable interpolation to do the matching.
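As a rough illustration of the two lookahead entries in the table, the sketch below runs the /blue(?!bird)/ example and then compares a plain \s against (?=\s). The sample string and variable names are chosen only for this demonstration.

```perl
use strict;
use warnings;

# Negative lookahead: "blue" matches only when it is not followed by "bird".
for my $word (qw(bluebox bluesy bluebird)) {
    my $result = $word =~ /blue(?!bird)/ ? 'matches' : 'no match';
    printf "%-8s %s\n", $word, $result;
}

# Positive lookahead: the whitespace must be present, but it is not consumed,
# so it does not end up in the captured text.
my ($with)    = 'This is my string' =~ /(.{1,10}\s)/;       # space is captured
my ($without) = 'This is my string' =~ /(.{1,10}(?=\s))/;   # space is only checked
print "'$with'\n'$without'\n";

# Outputs:
# bluebox  matches
# bluesy   matches
# bluebird no match
# 'This is my '
# 'This is my'
```

Because the lookahead does not consume the whitespace, the next match in a /g loop starts at that whitespace character, which is why the patterns in this thread add \s* or \s+ around the capture.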

- Kevin, perl coder unexceptional! [wiggle]
 
Yes, that would work.

Your original question included the spacing being removed, so that is why my code was slightly more complicated. Also, it was fun making it dynamic.

For the ? statements, just read Kevin's post or from the source:


And for even more regex fun, just read the rest of the faqs as well.

 
I think I'm starting to get it. How about if I did this:

my $string = "This is my string that I want to split. There are many like it. But this one is mine.\n";
my @strings = $string =~ /(.{1,15}(?=\s+))/g;
print join "\n", map {"'$_'"} @strings;

will that result in (notice, no spaces at the end):

'This is my'
'string that I'
'want to split.'
'There are many'
'like it. But'
'this one is'
'mine.'

If so, this is EXACTLY what I'm looking for!!!!
 
Sorry, just wanted to bring some closure to this thread. My last posting was wrong. Here is the output that I want:

'This is my'
'string that I'
'want to split.'
'There are many'
'like it. But'
'this one is'
'mine.'

And here is what will do it:

my @strings = $string =~ /\s*(.{1,15})\s+/g;

Thanks all!!

 
Make sure you test it out well on real user input. Those pesky users have a way of messing up everything. The internet would be perfect if we could just get rid of the users... [smile]
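For example, a quick check along these lines shows two inputs where the /\s*(.{1,15})\s+/g pattern quietly loses text: a word longer than 15 characters, and a string with no trailing whitespace. The chunks wrapper and the sample strings are hypothetical, added only to make the pattern easy to call.

```perl
use strict;
use warnings;

# The pattern from the thread, wrapped in a hypothetical helper for testing.
sub chunks {
    my ($s) = @_;
    return $s =~ /\s*(.{1,15})\s+/g;
}

my @ok   = chunks("This is my string that I want to split.\n");
my @long = chunks("A supercalifragilisticexpialidocious word\n");
my @cut  = chunks("no newline here");    # no trailing whitespace at all

print "ok:   ", join(' | ', @ok),   "\n";
print "long: ", join(' | ', @long), "\n";
print "cut:  ", join(' | ', @cut),  "\n";

# Outputs:
# ok:   This is my | string that I | want to split.
# long: A | cexpialidocious | word
# cut:  no newline
```

In the second case the first 19 characters of the long word are skipped, because every chunk must be followed by whitespace. In the third case the final word is dropped for the same reason.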

- Kevin, perl coder unexceptional! [wiggle]
 
Since this thread isn't quite dead yet, here's another possible solution:
Code:
my $string = 'This is my string that I want to split.  There are many like it.  But this one is mine.';

use Text::Wrap;
local $Text::Wrap::columns = '16';  # The width you want +1
my @output = split $Text::Wrap::separator, Text::Wrap::fill('', '', $string);

print "\|$_\|\n" foreach @output;
 