How to extract content from multiple HTML pages on local drive?

Status
Not open for further replies.

robinsf

I have a large number of HTML pages sitting on my local drive that I created using a database. The database was recently corrupted and I'm looking for a way to rebuild it. Is there a way, using CF, to extract the data from the html pages and load the data to a database in a batch process? Below is a stripped down version of one of the html pages. I need to be able to gather player name, team, points and other league data and load it into the new database.

thanks,

***********************************************
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Player Data</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<link rel="stylesheet" href="../screen.css" type="text/css" />
</head>

<body>


<table id="playerdata" cellpadding="0" cellspacing="0">
<tr>
<td class="datalabel">Name:</td>
<td class="data">Roger Smith</td>
</tr>
<tr>
<td class="datalabel">Team:</td>
<td class="data"> Blue Badgers</td>
</tr>
<tr>
<td class="datalabel">Season Points:</td>
<td class="data">95</td>
</tr>
<tr>
<td class="datalabel">Other Leagues:</td>
<td class="data">Hawthorne; Greendale;</td>
</tr>
</table>

</body>
</html>
 
Well, if you want to use CF, you'd use cfdirectory to get a list of the files, loop through the query that cfdirectory creates, use cffile action="read" on the current file in each pass of the loop, and update the db with what you pull out of the file's contents. It's not exactly what CF was built for, but it *could* be done.
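
A minimal sketch of that loop, assuming the pages sit in a folder such as c:\players (a made-up path) and end in .html. The names pageDir, qPages and pageHtml are placeholders, not anything taken from the original pages.

```cfml
<!--- Hypothetical folder holding the saved pages; change it to your own path --->
<cfset pageDir = "c:\players">

<!--- List only the .html files in that folder --->
<cfdirectory action="list" directory="#pageDir#" filter="*.html" name="qPages">

<cfloop query="qPages">
    <!--- The type column is "File" or "Dir"; skip any subdirectories --->
    <cfif qPages.type EQ "File">
        <!--- Read the whole page into a variable --->
        <cffile action="read" file="#pageDir#\#qPages.name#" variable="pageHtml">

        <!--- pageHtml now holds the markup of one page, ready to be parsed --->
        <cfoutput>#qPages.name#: #Len(pageHtml)# characters<br></cfoutput>
    </cfif>
</cfloop>
```

The cfoutput line is only there to confirm each file was read; the parsing and the database update would take its place.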

A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools.
-Douglas Adams (1952-2001)
 
I've used cfdirectory and cffile to loop through the directory and read the files into a variable. How do I parse out only the data I want?

thanks,
 
This is where it gets fun..

You'll want to look up

Find()

Mid() - (standard syntax)

You'll want to Find the position where some text prefixing the first field occurs, then Find the position where the text that follows it occurs, and then use Mid to trap what sits in between.

The difficult part is sometimes finding text that always prefixes and always follows the text you want to trap, but that is never going to be IN your text.
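
Here is a rough sketch of the Find/Mid idea applied to the sample page from the question. The helper name getField is invented for illustration, and it assumes every value sits in a td class="data" cell right after its label cell, as in that sample.

```cfml
<!--- Hypothetical helper: returns the text of the data cell that follows a label cell --->
<cffunction name="getField" returntype="string" output="false">
    <cfargument name="html" type="string" required="true">
    <cfargument name="label" type="string" required="true">
    <cfset var openTag = '<td class="data">'>
    <cfset var closeTag = "</td>">
    <cfset var labelPos = Find(arguments.label & closeTag, arguments.html)>
    <cfset var startPos = 0>
    <cfset var endPos = 0>

    <!--- Label is not on the page, so there is nothing to trap --->
    <cfif labelPos EQ 0>
        <cfreturn "">
    </cfif>

    <!--- Find the first data cell after the label --->
    <cfset startPos = Find(openTag, arguments.html, labelPos)>
    <cfif startPos EQ 0>
        <cfreturn "">
    </cfif>
    <cfset startPos = startPos + Len(openTag)>

    <!--- Find the closing tag of that cell --->
    <cfset endPos = Find(closeTag, arguments.html, startPos)>
    <cfif endPos LTE startPos>
        <cfreturn "">
    </cfif>

    <!--- Trap the text between the opening and closing tags --->
    <cfreturn Trim(Mid(arguments.html, startPos, endPos - startPos))>
</cffunction>

<!--- Cut-down copy of the sample page from the question --->
<cfsavecontent variable="pageHtml">
<tr>
<td class="datalabel">Name:</td>
<td class="data">Roger Smith</td>
</tr>
<tr>
<td class="datalabel">Team:</td>
<td class="data"> Blue Badgers</td>
</tr>
</cfsavecontent>

<cfoutput>
#getField(pageHtml, "Name:")# / #getField(pageHtml, "Team:")#
</cfoutput>
```

Here the prefix is the label plus its closing tag, and the text that follows is the data cell's own closing tag. Neither can appear inside a plain-text value, which is the property described above.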

A complicated regular expression can do this too, but I've seen such expressions and they're very hard to write and still don't work perfectly. I had a similar situation and asked for help in a forum, and the regex they gave me got about 650 out of 680 correct (can't beat that).

If I still had that regex, I'd share it. But I don't think I do.
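
For pages as regular as the sample in the question, the expression may not need to be complicated. The sketch below is not the regex mentioned above, just an illustration using REFind with subexpressions returned. The pattern assumes the label cell is followed by nothing but whitespace before the data cell.

```cfml
<!--- Cut-down copy of the sample page from the question --->
<cfsavecontent variable="pageHtml">
<tr>
<td class="datalabel">Season Points:</td>
<td class="data">95</td>
</tr>
</cfsavecontent>

<!--- The parenthesised group captures everything up to the next "<" --->
<cfset pattern = 'Season Points:</td>[[:space:]]*<td class="data">([^<]*)</td>'>

<!--- With the fourth argument set to true, REFind returns a structure of pos and len arrays --->
<cfset match = REFind(pattern, pageHtml, 1, true)>

<cfset seasonPoints = "">
<!--- Element 1 describes the whole match, element 2 the first group --->
<cfif ArrayLen(match.pos) GTE 2 AND match.len[2] GT 0>
    <cfset seasonPoints = Trim(Mid(pageHtml, match.pos[2], match.len[2]))>
</cfif>

<cfoutput>Season points: #seasonPoints#</cfoutput>
```

If the label is missing, or the cell is empty, seasonPoints stays an empty string, so the batch can flag that page rather than load bad data.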

ALFII.com
---------------------
If this post answered or helped to answer your question, please reply with such so that forum members with a similar question will know to use this advice.
 
thread232-139290

I found it... and it's still very beautiful code. I'm not sure this is the exact post I was looking for, but maybe it will help.

Code:
<!--- Read Text File --->
<CFFILE FILE="c:\folders\dev.txt" ACTION="READ" VARIABLE="vDev">

<!--- Set the delimiter used to convert the text file to a list (assumes "|" never appears in the file itself) --->
<CFSET vDelim = "|">

<!--- Put the delimiter in front of every "<a name" --->
<CFSET vDev = ReplaceNoCase(VARIABLES.vDev, "<a name", VARIABLES.vDelim & "<a name", "ALL")>

<!--- Loop through the text file as if it were a list --->
<CFLOOP LIST="#VARIABLES.vDev#" INDEX="i" DELIMITERS="#VARIABLES.vDelim#">
    <!--- Get the book using regular expressions and put it into a variable called vBook --->
    <CFSCRIPT>
        vBook = ReReplace(i, "(.+)(--)([[:digit:]]?[^[:digit:]]*)([[:digit:]]+)(:)([[:digit:]]+)(.+)", "\3-\4-\6");
        vBook = Replace(VARIABLES.vBook, " ", "", "ALL") & ".vinc";
    </CFSCRIPT>
    <CFFILE FILE="c:\folders\#VARIABLES.vBook#" ACTION="WRITE" OUTPUT="#i#" ADDNEWLINE="No">
</CFLOOP>
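
The same loop-and-extract pattern can feed a database instead of writing files. A hedged sketch follows: the datasource leagueDSN and the tables players and player_leagues are invented for illustration, and the four cfset lines stand in for values already parsed out of one page.

```cfml
<!--- Hypothetical values standing in for what the parsing step produced --->
<cfset playerName = "Roger Smith">
<cfset teamName = "Blue Badgers">
<cfset seasonPoints = 95>
<cfset otherLeagues = "Hawthorne; Greendale;">

<!--- Assumed datasource and table names; cfqueryparam keeps the values safely quoted --->
<cfquery name="qInsertPlayer" datasource="leagueDSN">
    INSERT INTO players (player_name, team, season_points)
    VALUES (
        <cfqueryparam value="#playerName#" cfsqltype="cf_sql_varchar">,
        <cfqueryparam value="#teamName#" cfsqltype="cf_sql_varchar">,
        <cfqueryparam value="#seasonPoints#" cfsqltype="cf_sql_integer">
    )
</cfquery>

<!--- "Other Leagues" is a semicolon-delimited list, so loop over it as one --->
<cfloop list="#otherLeagues#" index="league" delimiters=";">
    <!--- Skip elements that are only whitespace --->
    <cfif Len(Trim(league))>
        <cfquery name="qInsertLeague" datasource="leagueDSN">
            INSERT INTO player_leagues (player_name, league)
            VALUES (
                <cfqueryparam value="#playerName#" cfsqltype="cf_sql_varchar">,
                <cfqueryparam value="#Trim(league)#" cfsqltype="cf_sql_varchar">
            )
        </cfquery>
    </cfif>
</cfloop>
```

Run inside the cfdirectory loop, this would load one player per page. Keying the league rows on player name is a simplification; a real schema would more likely use a numeric player ID.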

 
Pretty slick stuff. That'll come in handy too if you're trying to parse non-XML data coming from another website using cfhttp.

 
Holy "That's some kick a$$ code!", Batman! [shocked]

You get a star!!



Hope This Helps!

Ecobb
Beer Consumption Analyst

"My work is a game, a very serious game." - M.C. Escher
 
Yeah, and it's over TWO YEARS OLD. Wow.

Tliesh is the one who gave me this code. I wish he still browsed the forums, though if he did, I'm not sure any of us would stack up against him in coding ability. That guy was this good two years ago. Wow.

No slight intended, the skills here are respected.

 
none taken. there is always someone better!


 
You rock, webmigit! The code is exactly what I needed. This forum is great. I posted this question to Experts Exchange last week and couldn't get a solution. I post it here and I get a solution in two days.

Thanks for the help.
 
I only wish that was mine though... It is beautiful code.

tliesh deserves the credit.

 
well, tliesh isn't here so you'll have to receive the awards on tliesh's behalf. ;)
