×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Contact US

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Perl on the server, does it fit the bill for this job?
2

Perl on the server, does it fit the bill for this job?

Perl on the server, does it fit the bill for this job?

(OP)
I have a SQL server machine that has Perl on it. No one currently with the company knows how to use Perl. Unfortunately for me, I am supposed to come up with the answer to the following question, oh, like yesterday!

I apologize in advance for my ignorance, but what little I do know of Perl is that it seems like it would solve the current problem readily.

We have a very large fixed length field text file which is used as the source for a reporting database. It has only one 'unique' field (message ID). The problem is that in freakish system situations bogus records get written to the log which result in duplicate records or records with the same message ID.

How would I go about stripping out all occurences of records containing the non unique references to the message ID, EXCEPT the last record (this is the one that actually got delivered)?

Any guidance is greatly appreciated. I will be reading my recently aquired O'reilly Perl books on the train....

RE: Perl on the server, does it fit the bill for this job?

Perl is great at text manipulation and searching, and was known in it's early days simply for that task alone.  Getting only the unique elements out of something is a chore that i wrote a FAQ on here at tek-tips, and i'm sure you could change it to suit your task.  You could probably even use sed or awk (maybe both) to do this chore too, but i suggest you get a programmer to do it, as the details will be necessarily meticulus.

"If you think you're too small to make a difference, try spending a night in a closed tent with a mosquito."

RE: Perl on the server, does it fit the bill for this job?

Agreeing with stillflame...... Perl would do the job well.

How large is 'very large'?  Large is a relative term..... relative to the amount of RAM on the box.  Perl will handle a lot of text very well.  Perl has very strong regex and memory structure capabilities.  I have written applicatons that convert 20 and 30 meg XML files to HTML very nicely.  However, if you are repeatedly producing reports from this text database and it is truely very large ( hundreds of megs), and if you expect the data set to grow, then........ it seems like the real fix would be to put your SQL server software to work using your message ID as the primary key.  You can setup the SQL server as an ODBC source and use the Perl DBI/DBD::ODBC modules to connect for your reporting.

good luck..


 
 
 keep the rudder amid ship and beware the odd typo

RE: Perl on the server, does it fit the bill for this job?

(OP)
By "very large" I am talking about 250 meg a month added to the text file source for the database. The database table is dropped every night and refreshed with the updated data, which actually goes pretty quickly.

The problem is the primary key issue since the source text file is "dirty" (about ~100 bogus records per month out of ~1000000). The only way I can see using the message ID is to clean out the bad records prior to importing. Since Perl is on the system, I thought it would be a possible tool for this that would not be overly difficult to use.

I get the impression by the posts that I should still lobby for some programming help as this is not a trivial bit of Perl scripting. Correct?

Thanks for your help

RE: Perl on the server, does it fit the bill for this job?

You could use Perl/DBI/DBD to do the entire trick.  Depending on a number of issues, you could either use Perl to rewrite the file and then do the insert , OR, as you read the file, do an update or insert as needed.  OR, TMTOWTDI...OR....OR....

If you are going to use Perl to connect to the database, get 'Programming the Perl DBI', by Alligator Descarte and Tim Bunce, from O'Reilly.  It would be $35 (US) spent on 333 very valuable pages.

The following lays out a flow(leaves out a lot) for doing an insert or an update if the message ID aready exists.  This is a little verbose to allow for some clarity.

#!/usr/local/bin/perl

use DBI;
# do the database connect stuff and construct the sql statements

# open the text file.
open(IPF,">theLargeFile") die "Failed to open input file , $!\n";

# while reading the file....
while ($line = <IPF>)
{
# split the line on a delimiter, maybe a pipe |
my @fields= split(/\|/,$line);

# if message ID is the first field.
my $id = $fields[0];
if ($messageIDs{$id})
    {
    # id already exists in DB.
    # do an update instead of an insert.
    }
else
    {
    # stuff the id into an associative array to record the fact that it exists
    $messageID{$id} = 1;
    # insert to the db for a new message ID
    }
}

'hope this helps...
Good Luck.


 
 
 keep the rudder amid ship and beware the odd typo

RE: Perl on the server, does it fit the bill for this job?

(OP)
Thanks much.
I misled you though when I described the data, the fields are fixed length, but no delimiter. Fields are padded with spaces. Each record is delimited by a newline. The message ID is always 48 characters long and sits at the same position in the record.

Here is an example of the data with record 2 and 3 having identical message IDs;

XSBH_HUB00052001020105:34:37414D5120584E41495F4E54585F30303300006B003A78F57D
EANCM5034316404080                      EANCM5034316372020
                      ENTERPRISE        LIVE                NO    DG0000001250

XSBH_HUB00052001020105:34:37414D5120584E41495F4E54585F30303300006B003A78F57E
EANCM5034316404080                      EANCM5034316372020
                      ENTERPRISE        LIVE                NO    DG0000001250

XSBH_HUB00052001020105:34:38414D5120584E41495F4E54585F30303300006B003A78F57E
EANCM5034316404080                      EANCM5034316372020
                      ENTERPRISE        LIVE                NF    DG0000001317

RE: Perl on the server, does it fit the bill for this job?

while ($line = <IPF>)
{
# use substr to clip id from line - substr($line,start_where,how_long);
$id = substr($line,28,48); # I may not have counted correctly. maybe 27,49 or other.
if ($messageIDs{$id})
    {
    # id already exists in DB.
    # do an update instead of an insert.
     }
else
    {
    # stuff the id into an associative array to record the fact that it exists
    $messageID{$id} = 1;
   # insert to the db for a new message ID
    }
}


 
 
 keep the rudder amid ship and beware the odd typo

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close