How would I write something in Unix or Perl to change sentences or paragraphs in many web page documents? All the documents have some
similar sentences and paragraphs. On occasion I need to change or
delete a specific sentence or paragraph. Any suggestions?
You can use find and perl. find locates the files
you want, and perl changes their contents.
For instance, suppose you want to change every
paragraph that starts with "<p>this is a paragraph" to
"<p>this is another paragraph" in all the
html files under your current directory:
$ find . -name \*.html |
xargs perl -i -p -e 'BEGIN{undef $/} s/<p>this is a paragraph/<p>this is another paragraph/sg'
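(-i edits each file in place, and undef $/ makes perl read the
whole file at once, so a pattern can match across line breaks.)
Now it's a question of tuning the regular expression
inside s///. Most of the time you will want to ignore
whitespace, in which case you can do: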
s/<p>\s*this is ...\s*/<p>\n new ... \n/sg
Also, you might want to change something inside your
tags (an attribute, for example):
s/(<img[^>]*src=")x\.gif(".*?>)/$1y.gif$2/sg
(the $2 part wasn't really necessary, but makes it
more understandable)