Getting Text off of a Website

(OP)
Perl Monks!

I am very new to Perl and am trying to write a script that downloads my homework assignments from my teacher's website for a specific day. He puts our HW on his website, http://staweb.sta.cathedral.org/departments/math/m.... I would like the script, when given a date, to find the corresponding assignment and print it to a blank text file. I have all of the mechanics working except for copying the assignment.

I have been able to use LWP::Simple to fetch the page text, but I don't know how to make the script pick out the corresponding assignment, nor how to print it to a blank text file. I don't think this is very complicated, but I'm really bad at Perl, so any and all help would be appreciated!

RE: Getting Text off of a Website

Are you still stuck on this? What is your code so far?

Annihilannic
tgmlify - code syntax highlighting for your tek-tips posts

RE: Getting Text off of a Website

I looked at the web page's source.
This one looks like a real chore to pull the sections out of with regexes.

Do not worry about getting it into a file until you get it to work. print "$blah"; will let you debug without having to peek inside your new file.

This page is "unique" in a sense, since it follows a strict pattern.
One (of many) ways might be to read the web page line by line.
If a line matches <tr at the beginning, start concatenating lines into a variable ($cool .= $line) until a line matches </tr at the beginning. Then push $cool into an array, or skip straight to the next step below.

Then you can pull out (with a regex) the date section and the HW section.

If the date is the one you want, print the HW section into your file. Done.
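
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the sample HTML is made up (the real page's markup may differ), it assumes each <tr and </tr sits at the start of its own line, and the names collect_rows and find_assignment are hypothetical. In a real script the text would come from LWP::Simple's get() instead of the inline sample.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the downloaded page source.
my $html = <<'HTML';
<table>
<tr>
<td width="10%">9/14</td>
<td width="85%">Read section 1.2; problems 1-15 odd</td>
</tr>
<tr>
<td width="10%">9/15</td>
<td width="85%">Worksheet on <b>factoring</b></td>
</tr>
</table>
HTML

# Read the text line by line and collect each <tr ... </tr block.
sub collect_rows {
    my ($text) = @_;
    my @rows;
    my $cool   = '';
    my $in_row = 0;
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*<tr/i) {          # a row starts here
            $in_row = 1;
            $cool   = '';
        }
        $cool .= "$line\n" if $in_row;      # keep concatenating inside a row
        if ($in_row && $line =~ /^\s*<\/tr/i) {
            push @rows, $cool;              # the row is complete
            $in_row = 0;
        }
    }
    return @rows;
}

# Return the assignment text for a date, or nothing if no row matches.
sub find_assignment {
    my ($text, $wanted) = @_;
    for my $row (collect_rows($text)) {
        # Assumes the first cell holds the date and the second the homework.
        my @cells = $row =~ /<td[^>]*>(.*?)<\/td>/gis;
        next unless @cells >= 2;
        my ($date, $hw) = @cells[0, 1];
        for ($date, $hw) {
            s/<[^>]+>//g;       # drop any tags left inside the cell
            s/^\s+|\s+$//g;     # trim surrounding whitespace
        }
        return $hw if $date eq $wanted;
    }
    return;
}

my $hw = find_assignment($html, '9/15');
print defined $hw ? "$hw\n" : "No assignment found\n";
```

The date is compared as a plain string here, so it has to be typed in the same format the page uses.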

Look at:
perldoc perlrequick
perldoc perlretut
perldoc perlfaq6
perldoc perlre
perldoc perlrebackslash
perldoc perlrecharclass
perldoc perlreref

and
perldoc -f open
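
Once the print version works, the file-writing step that perldoc -f open describes might look something like this. It is a minimal sketch: the sub name save_assignment and the file name homework.txt are made up for illustration.

```perl
use strict;
use warnings;

# Write one assignment to a text file, replacing any previous contents.
# save_assignment and the file name used below are hypothetical.
sub save_assignment {
    my ($path, $text) = @_;
    # Three-argument open with a lexical filehandle; '>' truncates the file.
    open(my $fh, '>', $path) or die "Cannot open $path for writing: $!";
    print {$fh} "$text\n";
    close($fh) or die "Cannot close $path: $!";
    return;
}

save_assignment('homework.txt', 'Worksheet on factoring');
```

Using '>>' instead of '>' would append to the file rather than overwrite it.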

RE: Getting Text off of a Website

I would probably use HTML::TableExtract to break the HTML up before considering other methods, i.e. regexes, to extract the specific elements. An alternative, or a combination, would be to use HTML::TreeBuilder / HTML::Element, which have look_down and address methods. From the supplied web page I can immediately see common groups, e.g. each date's container cell has a width of 10% and each description's container cell has a width of 85%, and so on.

Chris
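
As a rough illustration of the HTML::TableExtract suggestion, something along these lines might work. It is a sketch under assumptions: the sample HTML is invented, the date and homework are assumed to be the first two cells of each row, and assignment_for is a hypothetical name. HTML::TableExtract is a CPAN module and has to be installed separately.

```perl
use strict;
use warnings;
use HTML::TableExtract;    # CPAN module, not part of core Perl

# Hypothetical sample standing in for the downloaded page source.
my $html = <<'HTML';
<table>
<tr><td width="10%">9/14</td><td width="85%">Read section 1.2</td></tr>
<tr><td width="10%">9/15</td><td width="85%">Worksheet on <b>factoring</b></td></tr>
</table>
HTML

# Return the homework text for a date, or nothing if no row matches.
sub assignment_for {
    my ($text, $wanted) = @_;
    my $te = HTML::TableExtract->new;    # no constraints: every table is considered
    $te->parse($text);
    $te->eof;
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            # Empty cells come back undefined, so default them to ''.
            my ($date, $hw) = map { defined $_ ? $_ : '' } @{$row}[0, 1];
            for ($date, $hw) { s/^\s+|\s+$//g }    # trim whitespace
            return $hw if $date eq $wanted;
        }
    }
    return;
}

my $hw = assignment_for($html, '9/15');
print defined $hw ? "$hw\n" : "No assignment found\n";
```

The module hands back the text of each cell rather than its markup, so the cells need no regex to strip tags.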
