extracting text between two tags in a very long string

(OP)
I have a file containing about 30,000 characters on 3-4 lines. I need to extract the text between two tags, <TAG1> and <TAG2>. I've tried a couple of things, but nothing yields the results I'm looking for. Below is an example of the text and the last code snippet I tried.

Text Example:

CODE

/i""","customAttributes":{},"alt":null,"href":null},{<TAG1>http://i.mg.com/00/s/MTIwMFgxNjAw/z/~5UAAOxyoA1RThdw/$T2eC16d,!ygE9s7HKL!RBRThdwL2fQ~~60_3.JPG<TAG2>","customAttributes":{},"alt":null,"href":null},{<TAG1>http://i.mg.com/00/s/MTIwMFgxNjAw/z/FLQAAOxy14VRTheJ/$T2eC16Z,!zEE9s3!YlV+BRTheJOOu!~~60_3.JPG<TAG2>","customAttributes":{},"alt":null,"href":null},{<TAG1>http://i.mg.com/00/s/MTIwMFgxNjAw/z/BMoAAMXQySpRTheV/$T2eC16N,!yEE9s5jE,jcBRTheUuoz!~~60_3.JPG<TAG2>"ew vjo.darwin.core.thumbnailgrid.ThumbnailGrid({"instId":"th_js_vv4-38","cmpId":"vv4-38" 

Perl example:

CODE

#!/usr/bin/perl
my $input = "<TAG1>"; #Search string
my $end = "<TAG2>"; #End string
open my $DATA, '<', 'web1' or die $!;

undef $/;
$_=<$DATA>;
close $DATA;

@list = m/\b$input\b(.*?)\b$end\b/mg;
map { print "found : $_\n" } @list; 

Desired Result:

CODE

http://i.mg.com/00/s/MTIwMFgxNjAw/z/~5UAAOxyoA1RThdw/$T2eC16d,!ygE9s7HKL!RBRThdwL2fQ~~60_3.JPG
http://i.mg.com/00/s/MTIwMFgxNjAw/z/FLQAAOxy14VRTheJ/$T2eC16Z,!zEE9s3!YlV+BRTheJOOu!~~60_3.JPG
http://i.mg.com/00/s/MTIwMFgxNjAw/z/BMoAAMXQySpRTheV/$T2eC16N,!yEE9s5jE,jcBRTheUuoz!~~60_3.JPG 

--- You must not fight too often with one enemy, or you will teach him all your tricks of war.

RE: extracting text between two tags in a very long string

Hi

Quote (man perlre)

"\b" still means to match at the boundary between "\w" and "\W"
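In your string, the characters around the first tag look like this:

null},{<TAG1>http://

The "{" right before "<TAG1>" and the "<" that starts it are both non-word characters, so there is no change from word to non-word at that point and therefore no word boundary. The leading \b can never match there. The trailing \b after "<TAG2>" fails for the same reason, because ">" is followed by a double quote, which is also a non-word character.

Just remove the \b assertions and it will work.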
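
For illustration, here is a minimal sketch of the pattern with the \b assertions removed. The helper name extract_between and the shortened example.com sample string are made up for this sketch and are not from the original post. It also adds \Q...\E so the tags are matched literally, and uses /s instead of /m so that "." can cross the line breaks in a slurped file. Both are my own choices rather than something the thread requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every substring found between $start and $end.
# \Q...\E quotes the tags so any regex metacharacters in them are literal.
# /s lets "." match newlines; /g in list context returns all captures.
sub extract_between {
    my ($text, $start, $end) = @_;
    return $text =~ /\Q$start\E(.*?)\Q$end\E/sg;
}

# Shortened stand-in for the real data (these URLs are invented).
my $sample = 'null},{<TAG1>http://example.com/a.JPG<TAG2>","alt":null},'
           . '{<TAG1>http://example.com/b.JPG<TAG2>"ew';

print "found : $_\n" for extract_between($sample, '<TAG1>', '<TAG2>');

# The original pattern with \b finds nothing in the same string,
# because "{" followed by "<" is not a word boundary.
my @none = $sample =~ /\b<TAG1>\b(.*?)\b<TAG2>\b/sg;
print scalar(@none), " matches with the word-boundary version\n";
```

For the real file, the slurped contents in $_ from the original script would be passed in place of $sample.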

Feherke.
http://feherke.github.com/

RE: extracting text between two tags in a very long string

(OP)
Perfect, thanks!

