×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Jobs

Webscrape vfp9
3

Webscrape vfp9

Webscrape vfp9

(OP)
* Want to capture the data using this so I can do some "Machine Learning" using R.
* would like to get guidance if I can, Lui
DECLARE INTEGER ShellExecute IN shell32.dll ;
INTEGER hndWin, ;
STRING cAction, ;
STRING cFileName, ;
STRING cParams, ;
STRING cDir, ;
INTEGER nShowWin
cFileName = "https://www.investing.com/commodities/real-time-fu..."
cAction = "OPEN"
ShellExecute(0,cAction,cFileName,"","",1)
*Can view the web page view with the above and
*Want to capture that table and scrape it to
*my database table called "investorprices",*FIELDS: id_pk, commodityname,commoditymonth,last,high,low,
*change,changepercent,time,RecordTs

* On the view page source are lines 1505 to 2018
*<tbody>
* <tr>
* <td class="flag"><span title="" class="ceFlags gold">&nbsp;</span></td>
* <td class="bold left plusIconTd noWrap elp"><a title="Gold Futures" href="/commodities/gold">Gold
* </a><span class="alertBellGrayPlus js-plus-icon genToolTip oneliner" data-tooltip="Create Alert"
* data-name="Gold Futures" data-id="8830"></span></td>
* <td class="left noWrap">Aug 19 </td>
* <td class="pid-8830-last">1,422.85</td>
* <td class="pid-8830-high" >1,430.45</td>
* <td class="pid-8830-low" >1,414.70</td>
* <td class="bold redFont pid-8830-pc" >-4.05</td>
* <td class="bold redFont pid-8830-pcp" >-0.28%</td>
* <td class="pid-8830-time" data-value="1563902321" >13:18:41</td>
* <td class="icon"><span class="greenClockIcon">&nbsp;</span></td>
* </tr>
* .
* . 35 different items with this type of data each separated by <tr>...<tr>
* .
*<tbody>

RE: Webscrape vfp9

Opening the URL in ShellExecute() will merely display the page in your default web browser. It doesn't give you programmatic access to the underlying code. To do that, you need to actually download the page into a variable.

There are several ways of doing that. One is to use the Internet Transfer control. Somehting like this:

CODE -->

oNet =  CREATEOBJECT("InetCtls.Inet.1")
lcPage = oNet.OpenURL("https://www.investing.com/commodities/real-time-futures") 

lcPage will now contain the page's underlying HTML. You can now use VFP's full reportoire of string-handling functions to extract the actual content that you want. For example to extract everything between two given tags, you might use STREXTRACT().

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: Webscrape vfp9

3
Luikj,

If the HTML page is prepared and transformed into an XML document, VFP's XMLTOCURSOR() function can be used to store the downloaded data and then processed as you want.

The all process can be done with a) some simple VFP functions to clean up the HTML and turn it into valid XML, and b) an XSL transformation that will rearrange the data so that XMLTOCURSOR() may be able to consume it.

CODE --> VFP

LOCAL HTTP AS WinHttp.WinHttpRequest

* read the page from the website
m.HTTP = CREATEOBJECT("WinHttp.WinHttpRequest.5.1")
m.HTTP.Open("Get", "https://www.investing.com/commodities/real-time-futures", .F.)
m.HTTP.Setrequestheader("Content-type", "text/html")
* signal "I'm a browser", otherwise the server will not allow
m.HTTP.Setrequestheader("User-Agent", "Mozilla/VFP")
m.HTTP.Send()

LOCAL HtmlPage AS String
LOCAL CrossRateTable AS String

* get the page in HTML
m.HtmlPage = m.HTTP.Responsetext
* isolate the body of the cross rate table
m.CrossRateTable = STREXTRACT(STREXTRACT(m.HtmlPage, ' id="cross_rate_1"', ""), "<tbody>", "</tbody>", 1, 4)

* remove non-break-spaces (XML does not know what this is)
m.CrossRateTable = STRTRAN(m.CrossRateTable, "&" + "nbsp;", " ")
* now we have an XML clean document

LOCAL XML AS MSXML2.DOMDocument60
LOCAL XSL AS MSXML2.DOMDocument60

LOCAL XSLCode AS String

* prepare the XML document processor and its transformer
m.XML = CREATEOBJECT("MSXML2.DOMDocument.6.0")
m.XSL = CREATEOBJECT("MSXML2.DOMDocument.6.0")

* the transsformation code
TEXT TO m.XSLCode NOSHOW
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:output method="xml"/>

  <xsl:template match="/">
    <xsl:element name="VFPData">
      <xsl:apply-templates select="tbody/tr"/>
    </xsl:element>    
  </xsl:template>

  <xsl:template match="tr">
    <xsl:element name="crossrate">
      <xsl:element name="commodity">
        <xsl:value-of select="td[2]/a"/>
      </xsl:element>
      <xsl:element name="month">
        <xsl:value-of select="td[3]"/>
      </xsl:element>
      <xsl:element name="last">
        <xsl:value-of select="translate(td[4], ',', '')"/>
      </xsl:element>
      <xsl:element name="high">
        <xsl:value-of select="translate(td[5], ',', '')"/>
      </xsl:element>
      <xsl:element name="low">
        <xsl:value-of select="translate(td[6], ',', '')"/>
      </xsl:element>
      <xsl:element name="change">
        <xsl:value-of select="translate(td[7], ',', '')"/>
      </xsl:element>
      <xsl:element name="change_percent">
        <xsl:value-of select="translate(td[8], ',%', '')"/>
      </xsl:element>
      <xsl:element name="time">
        <xsl:value-of select="td[9]"/>
      </xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
ENDTEXT

* load the source
m.XML.LoadXML(m.CrossRateTable)
* and the transformer
m.XSL.LoadXML(m.XSLCode)

* transform and put the result inside a cursor
XMLTOCURSOR(m.XML.TransformNode(m.XSL), "CrossRateTable")

BROWSE 

The result (as of today):

RE: Webscrape vfp9

António,

This example is worth more than a normal Star. Very good. And thank you for sharing.

Koen

RE: Webscrape vfp9

Thank you, Koen, those are very kind words.

RE: Webscrape vfp9


Another way, using webbrowser control ( this extracts all the tables ):


CODE -->

************************************************
* Extract tables from html
* Marco Plaza , 2019 ( https://github/nfTools )
* no error management check etc..
************************************************

curl =  "https://www.investing.com/commodities/real-time-futures"

owbf = Createobject("webbrowser")


With owbf.wbc

	.silent = .T.
	.Navigate(m.curl)

	Do While .readystate # 4 Or .busy
		Wait 'contacting server.. Esc to cancel' Window Nowait
		if inkey(0.2) = 27 and messagebox('cancel?',4) = 6
			cancel
		endif
	Enddo

	tc=.Document.getelementsbytagname('table')

Endwith

Messagebox('Found '+Transform(tc.Length)+' tables ',0)

ntable  = 0

For Each Table In tc

	Set Textmerge To Memvar Xmltable Noshow
	Set Textmerge Delimiters To "<%","%>"
	Set Textmerge On

\<table>

	crows=Table.getelementsbytagname('tr')

	For Each tablerow In crows
	\<row>
		ccells=tablerow.getelementsbytagname('td')
		coln = 0
		For Each cell In ccells
			coln=coln+1
			ccn=Transform(coln,'@l 99')
			\<<%"col"+m.ccn%>><%cell.InnerText%></<%"col"+m.ccn%>>
		Endfor
	\</row>
	Endfor

\</table>

	Set Textmerge Off
	Set Textmerge To

	ntable = m.ntable+1

	Xmltable = Strtran( m.xmltable,'&','&amp;')

	tname = 'Table_'+Transform(m.ntable)
	Xmltocursor( m.xmltable,m.tname,4)

	Browse Normal Font 'consolas',14

Endfor


********************************************

Define Class webbrowser As Form
	Add Object wbc As se2
Enddefine

Define Class se2 As OleControl
	OleClass ='shell.explorer.2'
Enddefine 

Marco Plaza
@nfoxProject
https://www.github.com/nftools

RE: Webscrape vfp9

Marco, at that stage you could make use of Tamar Granor's advice on how HTML tables are easily transferred to Excel. I just don't find the thread we once had. Must have been years ago and I think it was about the fastest way to export to excel as XLSX was new and/or COPY TO TYPE XL5 didn't suffice.

FAQ184-3005: Export (groups of) DBFs to Excel Workbook QUICKLY describes how to export DBF to HTML tables and then open them in Excel. You can skip the step to create the HTML table, as you come from there.

The step to get from Excel to DBF then remains, but if the data is just numeric, that should be easy.

So in summary: Once you are at HTML tag <table> you can do

CODE

CD d:\temp && wherever you want to save the separated HTML tables

LOCAL lcHTMLTable, lcHTMLFile

ntable  = 0
For Each Table In tc
  lcHTMLTable = Table.OuterHTML

Text to lcHTMLFile textmerge noshow 
<HTML>)
    <HEAD>
    <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=windows-1252'>
    <META NAME='Generator' CONTENT='VFP'>
    <TITLE>a</TITLE>
    </HEAD>
    <BODY><<lcHTMLTablel>></BODY>
</HTML>
EndText

ntable = ntable + 1 
StrToFile(lcHTMLFile,"table"+transform(ntable)+".html")
Endfor 

Then open the HTML files with Excel and grab your data from there.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: Webscrape vfp9


Hi Olaf, not sure why you see the need for that..
( the routine already gets each table data in dbf format )




Marco Plaza
@nfoxProject
https://www.github.com/nftools

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close