Tek-Tips is the largest IT community on the Internet today!


Html Extraction from WebBrowser Ctrl

Status
Not open for further replies.

Vec

IS-IT--Management
Jan 29, 2002
418
US
This code:
Code:
WebBrowser1.Document.body.innerHTML
Returns the body of the HTML doc. How do I get the whole doc, meta tags and all?
-------------------------------------------------------------------------
-------------------------------------------------------------------------

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Univer
 

Use Advanced Search. I have posted this a couple of times, as have others. Search for all of the words "webbrowser source", any date, in your forums, and you will find the answer about 4 or 5 results down.

Good Luck

 
And here's one way of doing it without a WebBrowser control (you'll need a reference to the Microsoft HTML Object Library):
[tt]
Dim DocumentFactory As HTMLDocument
Dim myHTMLDoc As HTMLDocument

Set DocumentFactory = New HTMLDocument

Set myHTMLDoc = DocumentFactory.createDocumentFromUrl(" "")
Do Until myHTMLDoc.readyState = "complete"
DoEvents
Loop
Debug.Print myHTMLDoc.documentElement.outerHTML
[/tt]
 

Well strongm, I was pointing Vec to your answer that comes up with the search, but that works also.

:)

 
It works almost perfectly, except it does not pull the content from the description and keywords meta tags. Anyone see the bug?

TextBox named: txtOutput set to multiline
TextBox named: txtHtmlCode set to multiline
Timer named: tmrWebBrowser, Enabled = True, Interval = 500
Command Button named: cmdStart
WebBrowser Component named: WebBrowser


'The Code

Code:
Private Sub Form_Load()
    WebBrowser.Navigate "http://www.audio-warehouse.com"
End Sub

Private Sub cmdStart_Click()
    Dim WebPageCode
    WebPageCode = WebBrowser.Document.firstChild.OuterHTML
    txtHtmlCode.Text = WebPageCode

    Dim strHTML As String, lPointer As Long, MetaEnd As Long
    Dim Start As Long, Finish As Long, strName As String
    Dim PageTitle As String, PageDescription As String, PageKeyWords As String

    strHTML = UCase$(txtHtmlCode.Text)
    lPointer = InStr(strHTML, "<TITLE>")
    If lPointer > 0 Then
        Start = lPointer + 7
        Finish = InStr(strHTML, "</TITLE>")
        If Finish > Start Then
            PageTitle = Mid$(txtHtmlCode.Text, Start, Finish - Start)
        End If
    End If
    lPointer = 1
    Do
        lPointer = InStr(lPointer, strHTML, "<META")
        If lPointer = 0 Then Exit Do
        MetaEnd = InStr(lPointer, strHTML, ">")

        lPointer = InStr(lPointer, strHTML, "NAME")
        lPointer = InStr(lPointer, strHTML, "=")

        Start = InStr(lPointer, strHTML, """") + 1
        Finish = InStr(Start, strHTML, """")

        If Finish > MetaEnd Or Finish < Start Then Exit Do
        strName = Mid$(strHTML, Start, Finish - Start)
        strName = Trim$(strName)

        lPointer = Finish
        lPointer = InStr(lPointer, strHTML, "CONTENT")
        lPointer = InStr(lPointer, strHTML, "=")

        Start = InStr(lPointer, strHTML, """") + 1
        Finish = InStr(Start, strHTML, """")

        If Finish > MetaEnd Or Finish < Start Then Exit Do
        Select Case strName
            Case "DESCRIPTION"
                PageDescription = Mid$(txtHtmlCode.Text, Start, Finish - Start)
            Case "KEYWORDS"
                PageKeyWords = Mid$(txtHtmlCode.Text, Start, Finish - Start)
        End Select

        If Len(PageDescription) > 0 And _
           Len(PageKeyWords) > 0 Then Exit Do
        lPointer = Finish
    Loop
    PageTitle = Trim$(PageTitle)
    PageDescription = Trim$(PageDescription)
    PageKeyWords = Trim$(PageKeyWords)

    txtOutput.Text = "<Title>" + PageTitle + "</Title>" & vbCrLf & _
        "<META NAME=" + Chr(34) + "Description" + Chr(34) + " CONTENT=" + Chr(34) + PageDescription + Chr(34) + ">" & vbCrLf & _
        "<META NAME=" + Chr(34) + "Keywords" + Chr(34) + " CONTENT=" + Chr(34) + PageKeyWords + Chr(34) + ">"
End Sub

Private Sub tmrWebBrowser_Timer()
    If WebBrowser.Busy = False Then
        cmdStart.Enabled = True
        tmrWebBrowser.Enabled = False
    End If
End Sub
 
Boy, we know from thread222-509531 that you aren't really interested in help files or MSDN, but it's beginning to look like you don't even read the replies you are given here!

Threads thread222-514080, thread222-513378 and thread222-512711, all responses to your queries on this subject, contain the info you need. The InStr solution is a red herring, because it was a solution for parsing plain text. But your source is actually an HTML document, so you don't need to go to those lengths. The following requires a reference to the Microsoft HTML Object Library:
[tt]
Option Explicit

Private Sub Command1_Click()
    txtHTMLcode.Text = vecFunction(" True) ' tek-tips will put semi-colons after the URL, so you need to remove them
    txtOutput.Text = vecFunction(" False)
End Sub

Private Function vecFunction(strURL As String, Optional Parse As Boolean = True) As String
    Dim DocumentFactory As HTMLDocument
    Dim myHTMLDoc As HTMLDocument

    Dim MetaTag As IHTMLElement
    Dim MetaTagCollection As IHTMLElementCollection

    ' This from THIS thread
    Set DocumentFactory = New HTMLDocument

    Set myHTMLDoc = DocumentFactory.createDocumentFromUrl(strURL, "")
    Do Until myHTMLDoc.readyState = "complete"
        DoEvents
    Loop

    ' From thread222-513378 and Thread222-512711
    Set MetaTagCollection = myHTMLDoc.documentElement.getElementsByTagName("title")
    If Parse Then
        vecFunction = vecFunction + MetaTagCollection(0).outerText
    Else
        vecFunction = vecFunction + MetaTagCollection(0).outerHTML
    End If

    Set MetaTagCollection = myHTMLDoc.documentElement.getElementsByTagName("meta")

    For Each MetaTag In MetaTagCollection
        If InStr(UCase("Description KeyWords"), UCase(MetaTag.Name)) And MetaTag.Name <> "" Then
            If Parse Then
                vecFunction = vecFunction + vbCrLf + vbCrLf + MetaTag.content
            Else
                vecFunction = vecFunction + vbCrLf + MetaTag.outerHTML
            End If
        End If
    Next

End Function
[/tt]
Of course, if you do want to do it with the WebBrowser control and InStr, you need to change

WebPageCode = WebBrowser.Document.firstChild.OuterHTML

to

WebPageCode = WebBrowser.Document.documentElement.outerHTML
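The InStr(UCase("Description KeyWords"), UCase(MetaTag.Name)) test in vecFunction is a substring check, so a meta tag whose name is, say, "words" or "key" would also pass it. If that matters, one option is an exact, case-insensitive name lookup. A minimal sketch, assuming the same reference to the Microsoft HTML Object Library; GetMetaContent is a hypothetical helper name, not an MSHTML method:

```vb
' Hypothetical helper (not part of MSHTML): returns the CONTENT attribute of the
' first <META> tag whose NAME equals MetaName (case-insensitive), or "" if none.
' Assumes a reference to the Microsoft HTML Object Library.
Private Function GetMetaContent(ByVal Doc As HTMLDocument, _
                                ByVal MetaName As String) As String
    Dim MetaTag As Object   ' late-bound, so .Name and .content are looked up at run time

    For Each MetaTag In Doc.documentElement.getElementsByTagName("meta")
        ' Appending "" turns a Null attribute value into an empty string before comparing
        If StrComp(MetaTag.Name & "", MetaName, vbTextCompare) = 0 Then
            GetMetaContent = MetaTag.content & ""
            Exit Function
        End If
    Next MetaTag
    ' Falling through leaves the function's default return value, ""
End Function
```

For example, once myHTMLDoc.readyState is "complete", GetMetaContent(myHTMLDoc, "Keywords") returns just the keywords string. vbTextCompare keeps the match case-insensitive (pages use "keywords", "Keywords" and "KEYWORDS" interchangeably), while StrComp requires the whole name to match rather than a fragment of it.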
 
I think we should ask this guy a couple of questions.
First, has any of the input so far been of any help?
Next, is the whole document going to be saved locally for you to work on? That is, do you want a way of capturing a web page and then walking through its various elements in order to develop your own web pages?
Last, if the only thing you want is to save a web page, try this:
Dim hElm As IHTMLElement
Dim htmltext As String
Dim fso As FileSystemObject
Dim file As Object

' Grab the <html> element, i.e. the whole document
Set hElm = brwWebBrowser.Document.All.tags("html").Item(0)
htmltext = hElm.outerHTML

' 2 = ForWriting; True = create the file if it does not exist
' File1 (a FileListBox) and File_Name are assumed to be defined elsewhere on the form
Set fso = CreateObject("Scripting.FileSystemObject")
Set file = fso.OpenTextFile(File1.Path & "\" & File_Name, 2, True)
file.Write htmltext
file.Close

Set file = Nothing
Set fso = Nothing

With the appropriate references (Microsoft HTML Object Library and Microsoft Scripting Runtime), you will be able to save a copy of the current WebBrowser page as a file on your local hard drive.

If this is helpful, then let us all know, as it gets frustrating trying to follow some of the questioning without any feedback.

NO FLAMING ALLOWED.....
NO HELP WILL FOLLOW.....

We all are trying our best to be the best we can be. Corny but true.

HTH --- Have a better day today than any other in your life.
 
Thanks guys,
In response to AntonV: I am not saving the whole doc. Basically, I am constructing a meta tag extraction app to pull the title, keywords and description. This is not an app for anyone else, for sale, for a client or for any type of distribution. It is purely for myself and the learning curve. I have already learned many different things from all of you and I thank you for that. The reason I need to use a WebBrowser control and not just Inet is that Inet gets forbidden pages a lot when using Google. Example: place an Inet control on a form and draw the HTML from the Inet control into a text box; in the Form_Load event, navigate to
Code:
www.google.com
No problem, right? Then change the navigate statement to
Code:
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=car+audio&btnG=Google+Search
This example searches for car audio. It pulls the page, right? Wrong! The HTML you are looking at is Google's Forbidden Access page. But you can push Google searches like this into a WebBrowser control.

So the big picture of what I am eventually trying to create is an app that will let you enter search words into one text box and a number of pages to open into another (opened one after another, controlled by the WebBrowser's Busy property), then read the meta info (just the title, keywords and description).
If I can get it to that point (which I will!), I can do many things with it, for example, create logs of the top sites' keywords.
So that sums up my "vision". I know you can easily buy commercial apps to do that, and in fact I own one, but I obviously want to learn.
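For the search-box part of that plan, the typed words have to become a URL of the same shape as the Google example above. A minimal sketch: BuildSearchUrl is a hypothetical helper, it keeps only the hl and q parameters from that example URL, it assumes single-byte (ANSI) text, and whether Google serves a results page for it is not guaranteed:

```vb
' Hypothetical helper: turns search words into a Google search URL.
' Spaces become "+", unreserved characters pass through, everything else
' is percent-encoded. Assumes single-byte ANSI text; DBCS characters
' (negative Asc values) are not handled.
Private Function BuildSearchUrl(ByVal SearchWords As String) As String
    Dim i As Long
    Dim ch As String
    Dim Encoded As String

    SearchWords = Trim$(SearchWords)
    For i = 1 To Len(SearchWords)
        ch = Mid$(SearchWords, i, 1)
        ' The letter ranges rely on the default Option Compare Binary
        Select Case ch
            Case "A" To "Z", "a" To "z", "0" To "9", "-", "_", ".", "~"
                Encoded = Encoded & ch                  ' safe as-is
            Case " "
                Encoded = Encoded & "+"                 ' form-style encoding of a space
            Case Else
                ' Two-digit hex code, e.g. "&" (Asc 38) becomes "%26"
                Encoded = Encoded & "%" & Right$("0" & Hex$(Asc(ch)), 2)
        End Select
    Next i

    BuildSearchUrl = "http://www.google.com/search?hl=en&q=" & Encoded
End Function
```

Usage would be along the lines of WebBrowser.Navigate BuildSearchUrl(txtSearchWords.Text), where txtSearchWords is a hypothetical text box; the timer that watches WebBrowser.Busy could then trigger the next page in the list.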

StrongM
Sorry if I ticked you off. I have been reading all of the posts, but often I follow a post or a reference to another thread and either can't get it to work at all or can't get it to work right. I hope my explanation above clears up what I am trying to do.
Be patient with me; I am not a pro programmer like you guys! In fact, a year ago I was impressed when I could press a command button and create a MsgBox. Thought I was ready for the big time :)
 
No, no, that's fine. As long as you're picking stuff up.

I just think that it may have been of benefit to all concerned to keep all those separate threads on the same subject in one single thread. That way there'd have been a better picture of what you were really trying to do and, hopefully, as a result, a set of more and more appropriate responses to help you solve the issue.



 
I will keep them together in the future; sometimes I tend to wander.
 