How to remove duplicate lines in a document?

Apr 30, 2003
I need to write a short Perl script that opens a document, reads it line by line, and compares the lines. If a line is a duplicate, it should be removed. How can I achieve this?

my $report_file = "/apps/oracle/as400/dspfdall.dat";
open(REPORT, '<', $report_file) or die "Can't open report file: $!\n";
my $count = 0;
while (my $line = <REPORT>) {
    # duplicate check would go here
    $count++;
}
close(REPORT);

What should I put inside the while loop to compare the lines, find the duplicates, and remove them?
 
Do you want to find duplicates on consecutive lines, or anywhere in the document?
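
The two cases need different logic, so here is a minimal sketch of each, not a finished script. The helper names dedupe_consecutive and dedupe_anywhere and the sample data are made up for illustration; neither appears elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: drop a line only when it repeats the line directly before it.
sub dedupe_consecutive {
    my @out;
    for my $line (@_) {
        push @out, $line unless @out && $out[-1] eq $line;
    }
    return @out;
}

# Hypothetical helper: drop a line if it has appeared anywhere earlier.
sub dedupe_anywhere {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Made-up sample data: "a" repeats both consecutively and later on.
my @lines = qw(a a b a);

my @consecutive = dedupe_consecutive(@lines);
my @anywhere    = dedupe_anywhere(@lines);

print "consecutive only: @consecutive\n";   # prints "consecutive only: a b a"
print "anywhere:         @anywhere\n";      # prints "anywhere:         a b"
```

The consecutive version only has to remember the previous line. The anywhere version has to remember every line seen so far, which is why the hash shows up.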

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
This should work for you.
Code:
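#!/usr/bin/perl

use strict;
use warnings;

my $in_file  = "/apps/oracle/as400/dspfdall.dat";
my $out_file = "/apps/oracle/as400/dspfdall.new";

my %hash;

# Read every line; identical lines collapse onto the same hash key.
open(my $in, '<', $in_file) or die "Can't open $in_file: $!";
while ( <$in> ) {
    s/\s+$//;    # strip the newline and any trailing whitespace
    $hash{$_} = 1;
}
close($in);

# Hash keys come back in no particular order, so sort them.
my @data = sort keys %hash;

open(my $out, '>', $out_file) or die "Can't open $out_file: $!";
for (@data) {
    print {$out} "$_\n";
}
close($out);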

M. Brooks
 
Eileen

The key word in your last post is most. If they aren't consecutive, do you still want them removed? If so, MBrooks' neat hash solution will do what you want, but will not preserve the original order of the file (even without the sort).
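
To illustrate the ordering point, here is a small sketch on made-up sample lines (not the real report). It assumes nothing beyond core Perl. The first half mirrors the hash-keys idea, the second half keeps each line the first time it is seen.

```perl
use strict;
use warnings;

# Made-up sample lines standing in for the report file.
my @lines = qw(zebra apple zebra mango apple);

# Hash-keys approach: duplicates collapse, but keys %hash has no
# guaranteed order, so only the sort makes the result predictable.
my %hash;
$hash{$_} = 1 for @lines;
my @sorted_unique = sort keys %hash;

# Order-preserving approach: keep a line only on its first appearance.
my %seen;
my @ordered_unique = grep { !$seen{$_}++ } @lines;

print "sorted keys:    @sorted_unique\n";    # prints "sorted keys:    apple mango zebra"
print "original order: @ordered_unique\n";   # prints "original order: zebra apple mango"
```

Both give the same set of lines; they differ only in the order the lines come out.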

Steve

 
Even if they are not consecutive, I want the duplicates removed. I guess the order of the original file doesn't really matter to me.
 
Another way to do it. Creates a backup file of the original, removes duplicates, retains original order. You can unlink/delete the backup file if not needed. Run some checks to make sure it works OK if you decide to try it.

Code:
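my $report_file = "/apps/oracle/as400/dspfdall.dat";
my %seen = ();
{
   local @ARGV = ($report_file);   # <> reads the files listed in @ARGV
   local $^I = '.bac';             # edit in place, keep the original as dspfdall.dat.bac
   while(<>){
      $seen{$_}++;
      next if $seen{$_} > 1;       # skip any line already printed
      print;                       # goes into the rewritten file, not the screen
   }
}
print "finished\n";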
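
One way to run the suggested checks before unlinking the backup, as a rough sketch only. The helpers read_lines and is_deduped_copy are hypothetical names made up here, and the paths assume the script above was run unchanged, so the backup sits next to the report with a .bac extension.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into a list of lines.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}

# Hypothetical check: true when @$new is exactly the first occurrence of
# each line in @$old, in the original order.
sub is_deduped_copy {
    my ($old, $new) = @_;
    my %seen;
    my @expected = grep { !$seen{$_}++ } @$old;
    return 0 unless @expected == @$new;
    for my $i (0 .. $#expected) {
        return 0 unless $expected[$i] eq $new->[$i];
    }
    return 1;
}

my $report_file = "/apps/oracle/as400/dspfdall.dat";
my $backup_file = $report_file . '.bac';

if (-e $report_file && -e $backup_file) {
    my @old = read_lines($backup_file);
    my @new = read_lines($report_file);
    print is_deduped_copy(\@old, \@new)
        ? "OK: safe to unlink $backup_file\n"
        : "Mismatch: keep $backup_file and investigate\n";
}
else {
    print "Nothing to check: report or backup file not found\n";
}
```

This reads both files into memory, which should be fine for a report of modest size but is an assumption about the data.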



- Kevin, perl coder unexceptional!
 
If you have access to the as400 in the reference, you could send the dspfd *all to an outfile, query it sorting by file name (I guess), break it, and reprint the report. However, I'm a little confused on how you'd have duplicate lines anyway. The files can't be duplicated within the same library unless you're just looking at the name and not the file attribute (PF, LF, etc.).

I know it's off subject, but dealing with the data prior to this point may be a better solution.

Mark
 
Kev

One pass over the file, makes a backup, preserves original ordering. Nice solution - have a star!

Steve

 
local @ARGV = ($report_file);
local $^I = '.bac';

That is so neat...
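
For anyone wondering why those two lines work: <> reads from the files named in @ARGV, and setting $^I inside the program has the same effect as the -i command-line switch, so print writes into the rewritten file while the original is kept under the given extension. A minimal sketch follows, wrapped in a sub and run against a throwaway temp file. The name dedupe_in_place and the sample contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical wrapper around the two-line trick quoted above.
sub dedupe_in_place {
    my ($file, $backup_ext) = @_;
    my %seen;
    local @ARGV = ($file);        # <> will read this file
    local $^I   = $backup_ext;    # in-place edit, original kept with this extension
    while (<>) {
        print unless $seen{$_}++; # first occurrence only, written to the new file
    }
    return;
}

# Demonstration on a scratch file, so the real report is not touched.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.dat";
open(my $out, '>', $file) or die "Can't write $file: $!";
print {$out} "one\n", "two\n", "one\n", "three\n", "two\n";
close($out);

dedupe_in_place($file, '.bac');

open(my $in, '<', $file) or die "Can't read $file: $!";
my @after = <$in>;
close($in);

print "lines after dedupe:\n", @after;
print "backup kept: ", (-e "$file.bac" ? "yes" : "no"), "\n";
```

After the call, the scratch file holds one, two, three in that order, and sample.dat.bac holds the original five lines. Both variables are localized, so they go back to their previous values when the sub returns.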

Mike

The options are: fast, cheap and right - pick any two.

 