Working with Text Files

How do I remove duplicate lines from a file? by KevinADC
Posted: 13 Feb 07 (Edited 15 Feb 07)




Problem :

You have a text file with many duplicate lines and you want to remove the duplicates while keeping the first occurrence of each line in its original order.

Solution :

Use Perl's in-place editor ($^I) and a hash that records which lines have already been seen.

CODE

  1. #!/usr/bin/perl
  2. use strict;
  3. use warnings;
  4. my $file = '/path/to/file.txt';
  5. my %seen = ();
  6. {
  7.    local @ARGV = ($file);
  8.    local $^I = '.bac';
  9.    while(<>){
  10.       $seen{$_}++;
  11.       next if $seen{$_} > 1;
  12.       print;
  13.    }
  14. }
  15. print "finished processing file.\n";
------------------------------------------------------------
Pragmas (perl 5.8.8) used :
  • strict - Perl pragma to restrict unsafe constructs
  • warnings - Perl pragma to control optional warnings


Discussion :

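By duplicate lines, I mean exactly that: lines that are identical, including white space and every other character. If runs of extra spaces should not count as a difference, squeeze them into a single space before line number 10, so that the hash key is built from the squeezed line. (Squeezing after line number 11 would be too late, because the duplicate test has already run by then.)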

CODE

tr/ //s;

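Note that tr/ //s only squeezes space characters (not tabs) and that it changes the line that is written back to the file. If you want to keep the original line with all its white space as it was, squeeze a temporary copy of the line, use the copy as the hash key, and print the unmodified line.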
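
A minimal sketch of the temporary-copy idea, working on an in-memory list so it can be tried safely. The helper name dedupe_ignoring_spaces and the sample lines are assumptions made for illustration; the same two statements could be used inside the while loop of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: two lines count as duplicates if they are equal
# after runs of spaces are squeezed to one space. The lines that are
# kept are returned exactly as they were passed in.
sub dedupe_ignoring_spaces {
    my %seen;
    my @kept;
    for my $line (@_) {
        (my $key = $line) =~ tr/ //s;   # squeeze a temporary copy only
        next if $seen{$key}++;          # already seen in squeezed form
        push @kept, $line;              # keep the original spacing
    }
    return @kept;
}

my @lines = ("a  b\n", "a b\n", "c\n", "a   b\n");
print dedupe_ignoring_spaces(@lines);   # prints "a  b" and "c"
```

Only the first of the three "a b" variants survives, and it keeps its original double space because the squeeze was applied to $key rather than to the line itself.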

Code without markup :

CODE

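#!/usr/bin/perl

use strict;
use warnings;

my $file = '/path/to/file.txt';
my %seen = ();   # lines already encountered
{
   local @ARGV = ($file);   # make <> read from $file
   local $^I = '.bac';      # edit in place; the original is kept as file.txt.bac
   while(<>){
      $seen{$_}++;
      next if $seen{$_} > 1;   # skip every repeat of a line
      print;                   # first occurrence goes back into the file
   }
}
print "finished processing file.\n";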

