Problem with limiting space for a text string

Status
Not open for further replies.

alan6895

Programmer
Aug 3, 2007
48
US
Hi all.
I'm new to Perl, and by new I mean I started looking at it yesterday. I inherited a Perl project from someone who retired from my employer and have been tasked with a few changes. They've all gone well...except for one. Here's the problem: I've got a chunk of text stored in a hash. The length of this text varies depending on the file we parse. Sometimes it's two characters; other times, it's 100 or more. We output our results to a log file under column headings. Problem is, I've got the script spacing things using tabs, which works only somewhat. Depending on the length of the string from the hash, the spacing gets all messed up and destroys our log file. Here's the code as it is now:

print LOG $key."\t\t\t\t\t".$gOccurrenceHash{$key}."\t".$gOccurrenceHash2{$key}."\n";

The headings are printed with this code:

print LOG "Object Name \t\t\t\t\t\t\t\t\tRef Count \t Scope \t Environment\n\n";

Some logs end up fine. Others (most) end up like this:

## OCCURRENCE Object Count
############################
Object Name Ref Count Scope Environment

index_seiseki_sougou.pdf_38839.9246990741_T_6.98255586e-002_0_0_6.98255586e-002 690 Global toppan-cache
index_seiseki_getsuji.pdf_38839.9244907407_T_6.98255586e-002_0_0_6.98255586e-002 644 Global toppan-cache
index_seiseki_nyudan.pdf_38839.9251851852_T_6.98255586e-002_0_0_6.98255586e-002 414 Global toppan-cache
index_seiseki_joho.pdf_38839.9249537037_T_6.98255586e-002_0_0_6.98255586e-002 345 Global toppan-cache
D000000000029.pdf_38996.2427314815_CB_25.511811_6.23622047e-002_646.299213_858.96 299 Global toppan-cache
D000000000029.pdf_38996.2427314815_CB_0_6.23622047e-002_620.787402_858.96 230 Global toppan-cache
D000000000001.pdf_38996.2375694444 115 Global toppan-cache
D000000000031.pdf_38996.2428703704 115 Global toppan-cache
D000000000032.pdf_38996.2380787037 115 Global toppan-cache
D100000000001.pdf_38993.9662962963 115 Global toppan-cache
D000000000030.pdf_38996.2428703704_CB_0_6.23622047e-002_620.787402_858.96 94 Global toppan-cache
D000000000031.pdf_38996.2428703704_CB_0_6.23622047e-002_620.787402_858.96 67 Global toppan-cache
D000000000030.pdf_38996.2428703704_CB_25.511811_6.23622047e-002_646.299213_858.96 46 Global toppan-cache
D000000000031.pdf_38996.2428703704_CB_25.511811_6.23622047e-002_646.299213_858.96 46 Global toppan-cache

Hopefully this will all be readable once it's posted. If not, please let me know and if you think you can help, I can send you some text files.

In case you're wondering, the tool I'm trying to fix parses PPML, an XML-based print job format for HP Indigo Presses.

Any help you can give me is very much appreciated. I have a fall back plan, but it's going to be messy and time-consuming to implement. Thanks!
 
I got to thinking the Perl code might be lacking in some important stuff. So, here's the whole part that prints the hashes:

if (%gOccurrenceHash) {
	foreach my $key (sort hashValueDescendingNum (keys(%gOccurrenceHash))) {
		# The \t's below need to be replaced with some sort of fixed, usable character space
		print LOG $key."\t\t\t\t\t".$gOccurrenceHash{$key}."\t".$gOccurrenceHash2{$key}."\n";
	}
}
 
What is the purpose of this spacing? How do long values in the hash cause the spacing to "get all messed up"? How exactly does it "destroy the log file"?

What exactly you are trying to accomplish is not clear. The ultimate purpose for these files will determine the constraints the formatting is trying to support.

What is this file being read by? Any applications? Or is the goal simply human readability?

Now, not knowing any of these facts, I'm shooting blind. However, I will say that using tabs for spacing is unreliable unless single tabs are being used purely as a delimiter. If all you're after is human readability, like I suspect, then I would suggest using sprintf:

Code:
my $file = 'foo.dat';

open(OUT, ">$file") or die "Can't open $file: $!";

while(<DATA>) {
	chomp;
	my @fakedata = split;

	print OUT sprintf "%-85s %-3s %-6s %s\n", @fakedata;
}

close(OUT);

__DATA__
index_seiseki_sougou.pdf_38839.9246990741_T_6.98255586e-002_0_0_6.98255586e-002 690 Global toppan-cache
index_seiseki_getsuji.pdf_38839.9244907407_T_6.98255586e-002_0_0_6.98255586e-002 644 Global toppan-cache
index_seiseki_nyudan.pdf_38839.9251851852_T_6.98255586e-002_0_0_6.98255586e-002 414 Global toppan-cache
index_seiseki_joho.pdf_38839.9249537037_T_6.98255586e-002_0_0_6.98255586e-002 345 Global toppan-cache
D000000000029.pdf_38996.2427314815_CB_25.511811_6.23622047e-002_646.299213_858.96 299 Global toppan-cache
D000000000029.pdf_38996.2427314815_CB_0_6.23622047e-002_620.787402_858.96 230 Global toppan-cache
D000000000001.pdf_38996.2375694444 115 Global toppan-cache
D000000000031.pdf_38996.2428703704 115 Global toppan-cache
D000000000032.pdf_38996.2380787037 115 Global toppan-cache
D100000000001.pdf_38993.9662962963 115 Global toppan-cache
D000000000030.pdf_38996.2428703704_CB_0_6.23622047e-002_620.787402_858.96 94 Global toppan-cache
D000000000031.pdf_38996.2428703704_CB_0_6.23622047e-002_620.787402_858.96 67 Global toppan-cache
D000000000030.pdf_38996.2428703704_CB_25.511811_6.23622047e-002_646.299213_858.96 46 Global toppan-cache
D000000000031.pdf_38996.2428703704_CB_25.511811_6.23622047e-002_646.299213_858.96 46 Global toppan-cache

- Miller
 
Hi Miller.
Thanks for your reply. I'll try to answer your questions as best I can. The purpose of the spacing is human readability. The output log needs to be easy to read so that we can diagnose problems in the PPML quickly. The hash values start disrupting the columns at an unknown length. The log file is "destroyed" because the hash values disrupt the columns, as shown in my example text. The column headings at the top (Object Name, Ref Count, Scope, and Environment) can be spaced a fixed width apart using tabs. They didn't copy into this forum very well, so they don't look quite like they do in the actual log.

Basically, what I'm trying to do is allow the Object Name to be, say, 100 characters long. These 100 characters will either be whitespace or filled with up to 100 characters of text. If there are fewer than 100 characters in the object name, then the remaining characters will stay whitespace, thereby preserving my alignments. Does that make any sense? I'm sure there's an answer somewhere on the internet, but I can't think of the right way to phrase it so that Google can find it.

In the end (in the log file), the object names need to align left. Then, the ref counts need to all line up with the Ref Count heading, the scopes need to line up with the Scope heading, and the environments need to line up with the Environment heading. To do this, word wrap obviously has to be turned off, which we've done. So, I want it to end up looking like this file (it works best on a high-res screen):
Does this help at all? My programming experience is in ASP .NET, so I'm not sure how to manipulate Perl very well...
 
I'm sure there's an answer somewhere on the internet,

Sounds like you want fixed length records. Miller has already shown you how you can format your records using sprintf.
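
A minimal sketch of what fixed-length fields can look like with sprintf, assuming you would also want to cap over-long names instead of letting them push the later columns to the right. The helper name fixed_field and the width of 20 are hypothetical, picked only for illustration. In a `%-N.Ms` conversion, the `-` left-justifies, N is the minimum width (padded with spaces), and M is the maximum number of characters taken from the string; the `*` forms read those numbers from the argument list.

```perl
use strict;
use warnings;

# Hypothetical helper: return $text as exactly $width characters,
# padded with spaces on the right, or truncated if it is too long.
sub fixed_field {
    my ($text, $width) = @_;
    # First $width is the minimum field width, second is the precision
    # (maximum characters copied from $text).
    return sprintf('%-*.*s', $width, $width, $text);
}

# Brackets make the padding and truncation visible.
print '[', fixed_field('D000000000001.pdf', 20), "]\n";
print '[', fixed_field('index_seiseki_sougou.pdf_38839.9246990741', 20), "]\n";
```

Truncation loses information, so for a diagnostic log it may be safer to drop the precision and simply choose a width larger than the longest expected name.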

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
alan6895 said:
I'll try to answer your questions as best I can.

Yes, that does help clarify your goal and requirements.

Basically, my answer to you is that the code that I provided above demonstrates what you need. The sprintf function lets you add formatting to variables, in this case the right padding of strings.

To spell it out more explicitly, something close to this is what you're aiming for:

Code:
foreach my $key (sort hashValueDescendingNum (keys(%gOccurrenceHash))) {
	print LOG sprintf "%-100s %-10s %-10s %s\n", ($key, $gOccurrenceHash{$key}, $gOccurrenceHash2{$key});
}

The above code is not perfect, though, as only 3 variables were included in your print statement while there are 4 columns in the output. So at least one thing isn't right.

- Miller
 
Kirsle said:

The format statement is one of the original features of Perl. However, it is beyond deprecated, and should never even be mentioned except as a curse word in trusted company.

If you truly want the functionality that it attempts to provide then you should use the "new" Perl6::Form module. This implements all the features of format, but at runtime (instead of compile time) and without the need for global variables.


I believe that sprintf is more than sufficient for most people's needs, though, and it would work just fine in this case.

- Miller
 
Thanks for the advice all.

Miller, thanks for showing me exactly what I need to try. I've run across that sprintf function before, but couldn't figure out how to use it. I'll give that a try on Monday. A follow-up on sprintf, though: does it have to be used with variables, or will it work with a static string, like to print the headers? If it works with plain strings, then I can save a tiny bit of code by not assigning the headers to variables. Also, about your comment that there are three printed variables but four columns showing up: that's because the third variable is currently a concatenated string that's assembled earlier in the code. I'll probably change it to a hash just to make sure it all comes out right all the time. Thanks again!
 
The quick answer is no, sprintf doesn't have to be used with variables. Static strings are fine too.

And yes, sprintf will work great for your headers. Using the same formatting string in fact.
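
A minimal sketch of that idea, assuming four columns with the illustrative widths from earlier in the thread. The names $ROW_FORMAT and format_row are hypothetical, and the sketch prints to STDOUT rather than to the LOG filehandle used in the real script.

```perl
use strict;
use warnings;

# One format string shared by the header and every data row.
# The widths (100, 10, 10) are illustrative; adjust them to the data.
my $ROW_FORMAT = "%-100s %-10s %-10s %s\n";

# Hypothetical helper: build one report line from four column values.
sub format_row {
    my @columns = @_;
    return sprintf($ROW_FORMAT, @columns);
}

# Literal strings work as arguments just as variables do.
print format_row('Object Name', 'Ref Count', 'Scope', 'Environment');
print format_row('D000000000001.pdf_38996.2375694444', 115, 'Global', 'toppan-cache');
```

Because both lines go through the same format string, changing a column width in one place keeps the headings and the data aligned.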

- M
 
Perfect. I will implement that tomorrow so we can get this tool out in our next SDK. Thanks a bunch, Miller!
 