INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

Scrape data from web site using VBA

Scrape data from web site using VBA

(OP)
I am trying to loop through a list of NFL players and extract stats from a website. I originally was using the InternetExplorer.application object but it was waaaay too slow and often run into problems with the page loads. After searching online I think using MSXML2.XMLHTTP60 is a quicker, cleaner way to get the data but I am having trouble parsing the html code to find what I want.

I have verified that the html is being returned correctly (I printed it), but when I try to use getElementsByTagName it doesn't find the tags. I'm new to XMLHTTP60, but I followed the steps I found on multiple sites of assigning the .repsonseText to an HTMLBody object. I would assume that would allow me to use .getElementsByClassName and .getElementsByTagName like I would with InternetExplorer.Document. But even though I am using the same code from these working examples, I can't get the code to see the tags.

CODE --> VBA

Public Sub test()
    
    Dim IE As MSXML2.XMLHTTP60
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLBody As MSHTML.HTMLBody
    Dim www As String, txt As String
    Dim nme
    Dim objCollection As Object
    Dim n As Integer
   
    www = "http://www.pro-football-reference.com/search/search.fcgi?search="
    'this is an array of names read from Ecxcel
    nme = Cells(1, 1).CurrentRegion
    
    Set IE = New MSXML2.XMLHTTP60
    
    'Loop through all the names
    For x = 1 To UBound(nme)
    
        'search the website by name
        IE.Open "GET", www & nme(x, 1), False
        IE.send
        While IE.ReadyState <> 4
            DoEvents
        Wend

        Set HTMLDoc = New MSHTML.HTMLDocument
        Set HTMLBody = HTMLDoc.body
        HTMLBody.innerHTML = IE.responseText
        
        'The error occurs with the following line. No title tags are found (but there really is one) 
        'and so the obCollection is not set.  I have verified that the title tag exists as I printed it 
        'and saw it and using Split(IE.responseText,"<title>") correctly gives an array of two elements (i.e. one title tag).

        'extract the title tag to verify that the player page was returned
        Set objCollection = HTMLBody.getElementsByTagName("title")
        
        'NOTHING IS RETURNED FORM THE ABOVE LINE AND SO THE NEXT LINE ERRORS "Object variable not set"
        txt = objCollection(0).innerText        
        If txt Like "*" & nme(x, 1) & "*" Then
            'code for player found
        Else
            'code for player not found
        End If
        
    Next x
End Sub 

RE: Scrape data from web site using VBA

HI,

In the Excel Ribbon, Data > Get external data > From Web

Paste the URL into the Address box. The site will be accessed. Scroll around to find the area of data that interests you. Notice the little YELLOW boxes, and the associated instruction to "Click [box] next to the tables you want to select, then click import."

This will put the data referenced on your sheet, where you can search for specific data.

Use your macro recorder to generate code that can then be modified as necessary.

Skip,

glassesJust traded in my OLD subtlety...
for a NUance!tongue

RE: Scrape data from web site using VBA

(OP)

I appreciate the reply, unfortunately that doesn't help me.

When I do that there are no little yellow boxes to click. Secondly, that seems like it is opening up and loading the web page in a browser which is what I am trying to avoid due to my issues with page loading time errors and just plain taking too long. Lastly the table that I need to download alters by player and I won't know what it is called until the html is shown, so I need a dynamic way to search the page to find the specific table for each player.

I already have a way to extract the html quickly without loading the page to a browser. ANd it includes all of the data I need. I can even parse it using the Split function but that is messy and ineffecient. All I need is a way to parse the data using getElementsByTagName and getElementsByClass, which from everything I read should be easy, but I can't get it to work.

So the question I need answered is once I have retrieved the html code using IE As MSXML2.XMLHTTP60 object, what do I with that object be able to use getElementsByTagName and getElementsByClass? I thought is was this.

CODE --> VBA

Set HTMLDoc = New MSHTML.HTMLDocument
Set HTMLBody = HTMLDoc.body
HTMLBody.innerHTML = IE.responseText 


RE: Scrape data from web site using VBA

I've had this problem as well...

The thing is, the Title tag is in the Header of the document, and when you put the website in the HTMLBody element, you only get the BODY of the page....

I don't remember off hand how I handled that...

One way that might work, is to use the .createDocumentFromUrl method from MSHTML... that will put the entire webpage in the HTMLDoc element... then you can access the HTMLDoc.head.

Gcomyn

RE: Scrape data from web site using VBA

> opening up and loading the web page in a browser which is what I am trying to avoid

Except that's basically what your code does, since the XMLHTTP Object uses WinInet (i.e. Internet Explorer) for it's functionality (it just doesn't render it). SO maybe you have been suffering rendering [problems, as any page download problems should also affect the XMLHTTP object.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close