Problem :
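You have a text file with many duplicate lines and you want to remove the duplicates while keeping the remaining lines in their original order.
Solution :
Use Perl's in-place editing (the $^I variable) together with a hash that records which lines have already been seen. The original file is kept as a backup with the .bac extension.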
Code:
[ol]
[li][gray]#!/usr/bin/perl[/gray][/li]
[li][/li]
[li][link http://perldoc.perl.org/functions/use.html][black][b]use[/b][/black][/link] [green]strict[/green][red];[/red][/li]
[li][black][b]use[/b][/black] [green]warnings[/green][red];[/red][/li]
[li][/li]
[li][link http://perldoc.perl.org/functions/my.html][black][b]my[/b][/black][/link] [blue]$file[/blue] = [red]'[/red][purple]/path/to/file.txt[/purple][red]'[/red][red];[/red][/li]
[li][black][b]my[/b][/black] [blue]%seen[/blue] = [red]([/red][red])[/red][red];[/red][/li]
[li][red]{[/red][/li]
[li] [link http://perldoc.perl.org/functions/local.html][black][b]local[/b][/black][/link] [blue]@ARGV[/blue] = [red]([/red][blue]$file[/blue][red])[/red][red];[/red][/li]
[li] [black][b]local[/b][/black] [blue]$^I[/blue] = [red]'[/red][purple].bac[/purple][red]'[/red][red];[/red][/li]
[li] [olive][b]while[/b][/olive][red]([/red]<>[red])[/red][red]{[/red][/li]
[li] [blue]$seen[/blue][red]{[/red][blue]$_[/blue][red]}[/red]++[red];[/red][/li]
[li] [olive][b]next[/b][/olive] [olive][b]if[/b][/olive] [blue]$seen[/blue][red]{[/red][blue]$_[/blue][red]}[/red] > [fuchsia]1[/fuchsia][red];[/red][/li]
[li] [link http://perldoc.perl.org/functions/print.html][black][b]print[/b][/black][/link][red];[/red][/li]
[li] [red]}[/red][/li]
[li][red]}[/red][/li]
[li][black][b]print[/b][/black] [red]"[/red][purple]finished processing file.\n[/purple][red]"[/red][red];[/red][/li]
[/ol]
[tt]------------------------------------------------------------
Pragmas (perl 5.8.8) used :
[ul]
[li][link http://perldoc.perl.org/strict.html]strict[/link] - Perl pragma to restrict unsafe constructs[/li]
[li][link http://perldoc.perl.org/warnings.html]warnings[/link] - Perl pragma to control optional warnings[/li]
[/ul]
[/tt]
Discussion :
By duplicate lines I mean lines that are exactly the same, including white space and every other character. If differences in white space should be ignored, you could collapse each run of white space into a single space after line number 11 and before line number 12.
If you also want to keep the first occurrence of each line with its white space exactly as it was, make a temporary copy of the line, collapse the white space in the copy, use the copy as the hash key, and print the untouched original back into the file.
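The white-space-insensitive variant described above could be sketched as follows. This is only a sketch under a few assumptions, not part of the original script: the subroutine name dedupe_ignoring_whitespace and the variable $key are made up for illustration, and treating leading and trailing white space as insignificant is my own choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): removes lines that
# are duplicates once runs of white space are collapsed, keeping the
# first occurrence of each line exactly as it was written.
sub dedupe_ignoring_whitespace {
    my ($file) = @_;
    my %seen;
    local @ARGV = ($file);       # make <> read from $file
    local $^I   = '.bac';        # edit in place, backup gets the .bac extension
    while (<>) {
        my $key = $_;            # temporary copy, used only for comparison
        $key =~ s/\s+/ /g;       # collapse each run of white space to one space
        $key =~ s/^ | $//g;      # assumption: ignore leading/trailing space too
        next if $seen{$key}++;   # skip lines whose normalized form was seen
        print;                   # $_ is untouched, so the original line is kept
    }
    return;
}
```

Calling dedupe_ignoring_whitespace('/path/to/file.txt') would rewrite the file in place and leave the original in /path/to/file.txt.bac. With this normalization all blank or white-space-only lines share the same key, so only the first of them is kept.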
Code without markup :
Code:
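#!/usr/bin/perl
use strict;
use warnings;

my $file = '/path/to/file.txt';
my %seen = ();    # number of times each line has been read so far
{
    local @ARGV = ($file);    # make <> read from $file
    local $^I   = '.bac';     # edit in place, keep a backup as file.txt.bac
    while (<>) {
        $seen{$_}++;
        next if $seen{$_} > 1;    # skip every repeat of a line
        print;                    # written into the file, not to the screen
    }
}
print "finished processing file.\n";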