deleting a pattern in all files in a directory

Status
Not open for further replies.

amjadcsu

Programmer
Feb 7, 2006
7
US
hi folks
I have a directory called xml_logs which records real-time conversations on IRC servers.
This directory consists of XML files.
Each file has the pattern <!CDATA[chat data]]> in it.
I need a script to remove this pattern from each file in the directory, and
the result should be
<chatdata>

I have tried to use sed, but sed works on a list of files. How do I make it work on a directory in a bash script? Any help?
thanks
 
well i would like to use both .
whichever is easy
 
I have tried to use sed, but sed works on a list of files. How do I make it work on a directory in a bash script? Any help?
Any one-liner I come up with in perl is liable to have the same limitation. *nix shells expand wildcards into lists, so if you want it to work on a directory, instead of passing /path go with /path/* and you get a list of all files in /path

- Andrew
Text::Highlight - A language-neutral syntax highlighting module in Perl
also on SourceForge including demo
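
The wildcard point above can be sketched as a small shell function that walks a directory and runs sed on each file. This is only a sketch under assumptions: the name fix_dir is made up for illustration, mktemp is assumed to be available, and the pattern is the single-line <!CDATA[chat data]]> form from the question.

```shell
#!/bin/sh
# Hypothetical helper: replace <!CDATA[chat data]]> with <chatdata>
# in every regular file directly inside the directory given as $1.
fix_dir() {
    dir=$1
    # The shell, not sed, expands "$dir"/* into a list of paths.
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
        tmp=$(mktemp) || return 1    # assumption: mktemp exists on this system
        # Write to a temp file first, since not every sed can edit in place.
        # In a basic regex only the opening [ needs a backslash; ] is literal here.
        sed 's/<!CDATA\[chat data]]>/<chatdata>/g' "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}

# Usage: fix_dir ./xml_logs
```

Copying the result back with cat keeps the original file's permissions, at the cost of not being atomic.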
 
You didn't say whether everything between <!CDATA[ and ]]> is on a single line, but, if so, something like the below should work. If not you would have to handle matching across multiple lines. I am sure there is a cleaner way, but it should work substituting ./xml_logs with whatever the correct path is to the logs directory and /tmp with an appropriate temp directory.

Code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
# "or" binds looser than "||", so die actually fires when opendir/open fails
opendir(TEST, "./xml_logs") or die "opendir failed: $!\n";
my @files = readdir(TEST);
closedir TEST;
for my $file ( @files ) {
    next unless ( -f "./xml_logs/$file" );   # skip ., .. and subdirectories
    open(OLD, "< ./xml_logs/$file") or die "failed to open $file because: $!\n";
    open(OUTFILE, "> /tmp/workfile") or die "failed to open /tmp/workfile because: $!\n";
    while (<OLD>) {
        s/!CDATA\[(chat data)\]\]/$1/ig;   # <!CDATA[chat data]]> becomes <chat data>
        print OUTFILE;
    }
    close OLD;
    close OUTFILE;
    copy("/tmp/workfile", "./xml_logs/$file") or die "Copy failed: $!\n";
}

Derek
 
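Derek's code leaves the space in place: the surrounding < and > are not part of the match, so the result is <chat data>. If you match the angle brackets as well, put them back in the replacement:

s/<!CDATA\[(chat data)\]\]>/<$1>/ig;

and if you really want "chatdata" instead of "chat data" you will have to assign $1 to a temp variable and modify the variable to remove the space.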
 
or you could:

s/<!CDATA\[(chat data)\]\]>/<chatdata>/ig;

or if you wanted the space between the words:

s/<!CDATA\[(chat data)\]\]>/<chat data>/ig;

 
ok
the problem is that the temp file is created but it is blank.
As a result, nothing changes in the original XML files.
 
this is very similar but maybe give it a try:

Code:
#!/usr/bin/perl
use strict;
use warnings;
my $start_dir = 'path/to/xml_logs';
chdir($start_dir) or die "Can't chdir $start_dir: $!";
my @files = <*.xml>;
my @errors = ();
foreach my $file ( @files ) {
    # skip this file on failure so an empty temp file never replaces it
    open(OLD, "<$file") or do { push @errors, "failed to open $file: $!"; next };
    open(TEMP, ">temp.txt") or do { push @errors, "failed to create temp file: $!"; close OLD; next };
    while (my $line = <OLD>) {
       # match the angle brackets too, otherwise the result is <<chatdata>>
       $line =~ s/<!CDATA\[(chat data)\]\]>/<chatdata>/ig;
       print TEMP $line;
    }
    close OLD;
    close TEMP;
    rename('temp.txt', $file) or push @errors, "Rename failed: $!";
}
if (@errors) {
   print "$_\n" for @errors;
}
else {
   print "No errors reported\n";
}
 
Yeah, I would probably go with ircf's suggestion.

This might work for you:
Code:
perl -pi.bak -e 's/<!CDATA\[chat data\]\]>/<chatdata>/gi' /path/to/files/*.xml
Or, if you don't want the backup files, run it similar to:
Code:
perl -pi -e ...
And, of course, run that on copies of the files first - make sure it does what you want.
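
A variation on that advice, sketched with sed and diff: print what the substitution would change and leave the file alone. The function name preview_fix is made up for illustration, and the pattern is the literal <!CDATA[chat data]]> form from the question.

```shell
#!/bin/sh
# Hypothetical helper: show what would change in the file given as $1
# without modifying it. diff exits with status 1 when it finds differences.
preview_fix() {
    sed 's/<!CDATA\[chat data]]>/<chatdata>/g' "$1" | diff "$1" -
}

# Usage: preview_fix /path/to/files/some.xml
```

Lines starting with < are the current content and lines starting with > are what the substitution would write. No output means nothing matched.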
 
Yeah, same works for sed if you already had a file-by-file sed command written.

- Andrew
 
That's true. Although, it depends on the version of sed. The crappy one installed on all of our Solaris 9 boxes doesn't have in-place editing! It's crap I tell you.
 
What do I know? I've never used sed. I hear rumors it does the same kind of thing as CLI perl, but people that voluntarily skip out on perl are crazy, I don't trust them. [noevil]

- Andrew
 
ok
now, instead of chat data, what if I have something dynamically created by chat bots?

It could be any damn string and not chat data.
Also the format is

<![CDATA[#IRC]]>

where IRC can be replaced by any string.
I want the output to be

<#IRC>

 
sticking with rharsh's one-liner, try this, with the square brackets escaped (left unescaped, [ and ] are parsed as a character class, so the pattern fails):

Code:
perl -pi.bak -e 's/<!\[CDATA\[([^\]]*)\]\]>/<$1>/gi' /path/to/files/*.xml
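
The escaped pattern above can be tried on a single sample line before it is run over the whole directory. Here is a rough sed equivalent, assuming the CDATA section sits on one line; the function name unwrap_cdata is made up for illustration.

```shell
#!/bin/sh
# Hypothetical filter: turn <![CDATA[anything]]> into <anything> on stdin.
# \( \) captures the text and \1 puts it back; [^]]* means
# "any run of characters that are not a closing bracket".
unwrap_cdata() {
    sed 's/<!\[CDATA\[\([^]]*\)]]>/<\1>/g'
}

printf '%s\n' '<![CDATA[#IRC]]>' | unwrap_cdata
# prints: <#IRC>
```

If the sample line comes back unchanged, the pattern is not matching, and it is better to find that out here than after a silent run over every file.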

 
ok
It does not do anything.
The perl one-liner does not change anything in the original file.
 
well, You can retry this code:

Code:
#!/usr/bin/perl
use strict;
use warnings;
my $start_dir = 'path/to/xml_logs';
chdir($start_dir) or die "Can't chdir $start_dir: $!";
my @files = <*.xml>;
my @errors = ();
foreach my $file ( @files ) {
    # skip this file on failure so an empty temp file never replaces it
    open(OLD, "<$file") or do { push @errors, "failed to open $file: $!"; next };
    open(TEMP, ">temp.txt") or do { push @errors, "failed to create temp file: $!"; close OLD; next };
    while (my $line = <OLD>) {
       # capture whatever sits inside <![CDATA[ ... ]]> and wrap it in < >
       $line =~ s/<!\[CDATA\[([^\]]*)\]\]>/<$1>/ig;
       print TEMP $line;
    }
    close OLD;
    close TEMP;
    rename('temp.txt', $file) or push @errors, "Rename failed: $!";
}
if (@errors) {
   print "$_\n" for @errors;
}
else {
   print "No errors reported\n";
}

make sure the path to your xml_logs directory is correct:

my $start_dir = 'path/to/xml_logs';
 
Ok thanks
Now i want to do something more

like
if i have

<![CDATA[#IRC]]>

i want only

IRC
thanks
 
i tried the script
$line =~ s/<!\[CDATA\[([^\]]*)\]\]>/<$1>/ig;
not the perl one liner
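
The thread ends without an answer to the last request (reducing <![CDATA[#IRC]]> to IRC). One possible sketch with sed follows. It assumes that a single leading # should be dropped when present and that the CDATA section sits on one line; the function name cdata_text is made up for illustration.

```shell
#!/bin/sh
# Hypothetical filter: turn <![CDATA[#IRC]]> into IRC on stdin.
# #\{0,1\} matches an optional leading #, which is left out of the capture.
cdata_text() {
    sed 's/<!\[CDATA\[#\{0,1\}\([^]]*\)]]>/\1/g'
}

printf '%s\n' '<![CDATA[#IRC]]>' | cdata_text
# prints: IRC
```

In the Perl script the same idea would be to make the # optional and drop the angle brackets from the replacement: $line =~ s/<!\[CDATA\[#?([^\]]*)\]\]>/$1/ig;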
 