Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

I need advice...

Status
Not open for further replies.

westcoaststyle

Programmer
Oct 15, 2001
81
US
Good afternoon all,

I have a project that just hit my desk that will require me to pull data out of my database and format it based on specific formatting rules with the intention of importing it into a client program using Perl/CGI. An example of this formatting is as follows:

Code:
EH                                   20030618123456789
TH T100099-002123456789
TPI 1.0001                              N

The formatting rules require that each piece of data be placed in a specific column position to be properly read by the importing software. The formatting rules for these three lines are:

Code:
--LINE1--
Var1: Position=1, Length=3
Var2: Position=4, Length=6
Var3: Position=10, Length=25
Var4: Position=35, Length=11
Var5: Position=46, Length=9

--LINE2--
Var1: Position=1, Length=3
Var2: Position=4, Length=11
Var3: Position=15, Length=9

--LINE3--
Var1: Position=1, Length=3
Var2: Position=4, Length=5
Var3: Position=9, Length=2
Var4: Position=11, Length=30
Var5: Position=41, Length=1

There are approximately 30 pages to the list of variables so it will be fairly difficult to automate a great deal of this. My major concern is what the best and most efficient way of pulling the data from the DB and adding the results to the proper Line/Position will be. The data may not always be the same length as the formatting requires and there are also instances where there is no data at all for the fields, but all lengths must be filled either with actual data or blank spaces. I'm getting flashbacks of COBOL with this... Any thoughts?

Thanks for the help.
 
how does the formatting you posted relate to the lines you posted? I don't see the realationship...
 
Each row holds multiple variables. Each of these variables is placed in a specific column position and must hold a specific number of columns (length).

For example, LINE1 has 5 variables. While not all of them actually hold data, their places within the row are filled (with blanks/spaces in this case).

It breaks down like this using "|" to show the spaces:
Var1 = EH|
Var2 = ||||||
Var3 = |||||||||||||||||||||||||
Var4 = |||20030618
Var5 = 123456789

If you're looking at it all cock-eyed, just remember that I'm the one that has to integrate it. :p
 
I must be having a senior moment, I still do not see the relationship very well. How would you know for example, that line two only has three variables and not five? Or how could that be determined by examining the line with perl code? Is there a delimiter in the lines?
 
I don't think this sounds too terribly hard. The first thing that I would do is create an array of arrays that contain your length definitions for each field/line.
Code:
my @lengths;
push @lengths, [3, 6, 25, 11, 9];   # Line 1
push @lengths, [3, 11, 9];          # Line 2
push @lengths, [3, 5, 2, 30, 1];    # Line 3
Once you have those, then you need some data to use them with. Since I don't have a real good idea what your data looks like, I used this for dummy data:
Code:
my @record = ('one', 'two', '', 'four', 'five', '', '', 'eight',
              'nine', 'ten', '', 'twelve', 'thirteen', 'fourteen',
              'fifteen', 'sixteen', 'seventeen', 'eighteen');
I reread the OP and I'm still not sure if the data in each field can be longer than then given output-field widths. So, incase there can be more data, you'll need to truncate all the values to the given lengths.
Code:
sub trunc_array {
    my ($text_ref, $length_ref) = @_;
    die unless $#$text_ref == $#$length_ref;
    my @cleaned;
    foreach my $i (0..$#$text_ref) {
        push @cleaned, substr($text_ref->[$i] || "", 0, $length_ref->[$i]);
    }
    return @cleaned;
}
Now that each string should be the correct length, you'll need a function to format and print the data. Using the printf function is going to be your best bet for this. You can build a pattern to use for printf based on the lengths of each field that needs to be printed on that line. I came up with:
Code:
sub print_record {
    my ($data_ref, $length_ref) = @_;
    die unless $#$data_ref == $#$length_ref;
    my $pattern;
    map {$pattern .= "\%-${_}s" } @$length_ref;
    $pattern .= "\n";
    printf $pattern, @{$data_ref};
    
}
If you notice in the map statement, there's a '-' in the pattern - that left-aligns all the text (and pads toward the right.) If that's not what you want, remove the '-' and it will right align everything.

You'll need to keep track of which field you're on in the record (that's the $field variable below.) So, now, to make it all work:
Code:
my $field = 0; #Tracks the first field to be output on the line
foreach my $i (0..$#lengths) {
    my @temp = trunc_array([@record[ $field .. $field + $#{$lengths[$i]} ] ],
                           [ @{$lengths[$i]} ]);
    print_record(\@temp, \@{$lengths[$i]});
    $field = $#{$lengths[$i]} + 1;
}
Does that help get you started?
 
Hmm... I made a mistake in the update line for $field - it should read:
Code:
$field [b][blue]+[/blue][/b]= $#{$lengths[$i]} + 1;
 
Kevin:

I will already have all of the data, I just need to format it based on the rules I stated in the first post. These rules are preset so consider them the data dictionary that must be followed to successfully format each line to properly interact with the client software.

rhash:

Thanks! I'm still very green at perl so I'll have to try and decipher the code so I know what all it is doing. The real headache to this task is there are 27 pages of these formatting rules totaling I don't know how many lines. They're going to have to be hardcoded into an array or even a separate table in the DB for this to work... It's going to be a long week...

Anyhow, thanks again!!
 
The real headache to this task is there are 27 pages of these formatting rules totaling I don't know how many lines.
That sounds to me like something that can be automated using a script too.
 
It probably could if the rules had any kind of pattern to them, but I'll have to program these rules into this process which will have to be done by hand based on how 'random' they are.

You can view the full list in this document:
Here's a full sample data file that my data must mimic formatting-wize:
Code:
EH                                   20030618123456789
TH T100099-002123456789
TPI 1.0001                              N
0001  3.20 W
00ANN
01A01                                                                                                              TC001                 95000.00  8.00036005                                                                                                                                                                
02A19991 East Eubank Street                          Louisville                         KY40209    1  02Single Family                                                                   2001
PAI19991      East Eubank Street                                 
02B  16                                                                                1                                                            1        
02EF1        2500.00                                                                                
02E04        2500.00                                                                                
03ABW500606666Suzi                                                                  Builder                                888888888836 19U  N         19670729SuzieBuilder@fanniemae.com                                                      
03C500606666ZG5404 Pawnee Trail                                 Louisville                         KY40207    O2 0                                                   
03C500606666BH123 Main St                                       Louisville                         KY40207    O2 0 USA                                               
03C500606666F4100 Illinois St                                   Arlington                          VA22056    O2 0                                                   
04A500606666MCI Worldcom                       1111 Central Ave                   Louisville                         KY40210    N7 3 7 Manager/IS                         
05H500606666106          50.00
05H500606666123          25.00
05H500606666126         850.00
05H500606666122         150.00
05H500606666101          30.00
05H500606666114         120.00
05H500606666102          60.00
05H500606666201          70.00
05H500606666214         150.00
05H500606666202          50.00
05H500606666206          60.00
05H500606666223          30.00
05H500606666226         697.08
05I50060666620        3600.00
06C50060666603                                                                                                                                                           2700.00                                                                                          
06C500606666SG                                                                                                                                                           3000.00                                                                                          
06C500606666F2                                                                                                                                                          15000.00                                                                                          
06C500606666F7 Easy Finance & Loan                2345 Elm Street                    Louisville                         KY40211                                         25000.00                                                                                          
06C50060666601 ABC Mortgage                       123 Main Street                    Louisville                         KY40204                                        100000.00                                                                                          
06D500606666Dodge Aries                   1986         500.00
06G5006066665404 Pawnee Trail                  Lousiville                         KY40207    R14      100000.00       68100.00        1150.00        1000.00         285.00        -422.50YN1                
06L500606666I BANK ONE                                                                                                            145000                                 217.0025         5342.00N   NNN
06L500606666I KENTUCKY TELCO                                                                                                      64483                                   15.0017          254.71N   NNN
06L500606666R ATT UNIVERSAL CARD                                                                                                  478355                                  67.0050         3350.00N   NNN
06L500606666R DISCOVER CARD SERVICES                                                                                              601100                                  36.0050         1800.00N   NNN
06L500606666R FIRST USA BANK                                                                                                      441716                                  41.0050         2050.00N   NNN
06L500606666R HESS/ CRS                                                                                                           621865                                  15.0050          750.00N   NNN
06L500606666R HOBBS BANK                                                                                                          0987665                                 80.0046         3673.00N   NNN
06L500606666R GE CARD CO                                                                                                          540756                                  63.0050         3114.00N   NNN
06L500606666R MBNA AMERICA BANK                                                                                                   431300                                  50.004           200.00N   NNN
06L500606666I KENTUCKY NATIONAL                                                                                                   1234567                                817.00152      123773.00N   YNN
06L500606666M LOUISVILLE SAVINGS AND LOAN                                                                                         06666-001                              850.0077        65100.00N1  NNN
06L500606666M LOUISVILLE SAVINGS AND LOAN                                                                                         06666-002                              150.0020         3000.00N1  NNN
07A      180000.00                                                                    5000.00                                                                           
08A500606666NNNNNNNNN01YN   
10A500606666N2                              F
10BIJohn Smith                                                  20010808                                                                                                                                                                 
10R5006066661 
10R5006066662 
10R5006066663 
10R5006066664 
10R5006066665 
00011 3.20 
LNC1O01                 Y                                                           99.99NN
PID                                                  
PCH84  01    
PAJ                                                          
RAJ                            
00070 3.20 
99B F1      195000.00  0.00001104Johnny The Appraiser                                        Johnny's Appraisal Company         555555555555555VA
ADSTEST                               TEST1                                                                      
TT 123456789
ET 123456789
 
Quite honestly, Fannie Mae needs to get with the times and realize that XML would be a much better way to import data over this position/length BS... imho. :)
 
I think in order for anyone to help you, you need to explain the lines in the data file, not the pattern rules. Once a pattern in a line is determnined, the correct formatting rule can be applied somehow, but what is the pattern (or patterns) in the lines of the data file you posted? ie: Is each line tab delimited and have the same number of tabs?
 
Kevin, I've posted exactly what I'm doing and exactly what the formatting rules are. I don't know what else I can explain to help you understand. Everything that I have available to me has been posted here. It took me a few minutes to grasp what it is that must be done with this project as well.

The pattern is defined by the .pdf I posted and also the position/length definitions I expressed in the first post of this thread. I am not in control of the pattern, I must conform to the pattern already defined for me. Maybe that will hit home? :)
 
OK, maybe someone else will take the time to read the pdf file and figure it out.
 
good luck, I hope someone gives you a hand. I would, if I could figure it out, but it seems to be over my head. [dazed]
 
Thanks Kevin. Like I said, it took me a while too and now that I understand it, it's obviously difficult to convey to others without them going through the steps that I had to. That's why I didn't post everything to begin with. I just need to know how to make perl/cgi place data in a specific column position of a text file/string. That or populate variables with the correct length of data and I will concatenate each variable in the correct order, on the correct line. The data is really irrelevant, I just need to know how to make the pattern and everything else will fall into place. :)
 
Hmm... I thought that was pretty much what the code I posted did. What's missing?
 
Nothing, it works great so far. I just started messing with it. Very well done, rharsh. Thank you!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top