Hi, I have spent many hours on Tektips and other websites learning about regex. Thanks!
I have basically 2 types of data that I want to parse out of an html (txt) file.
String 1: <title>1965 Corvette for sale</title>
String 2: <meta name="Keywords" content="Corvette, 1965, Cool Car">
Expected Result 1: 1965 Corvette for sale
Expected Result 2: Corvette, 1965, Cool Car
Actual Result 1: <title>1965 Corvette for sale
Actual Result 2: <meta name="Keywords" content="Corvette, 1965, Cool Car
Regexp used on 1: <title>[^<]+(?=</title>)
Regexp used on 2: <meta name="Keywords" content="[^<]+(?=">)
So some strings have tags and some do not. As you can see, the beginning of the match always "saves" (it stays in the result).
PROBLEM: The only problem I am having is that I can't get rid of the beginning of the match (i.e. <title> above).
I have tried many variations on the regexp itself to no avail. Any help is much appreciated:
CODE:
Set fso = New FileSystemObject
Set tsMyFile = fso.OpenTextFile(PUBTxtInputFile, ForReading)
Do Until tsMyFile.AtEndOfStream
Set re = New RegExp
With New RegExp
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = "<title>[^<]+(?=</title>)"
For Each myMatch In .Execute(tsMyFile.ReadLine)
PUBScrapedText = myMatch.Value
Next
End With
DoEvents
Loop
'PUBScrapedText returns the output (i.e. <title>1965 Corvette for sale) that I save in a table in the db.
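One direction that might address this, offered as a sketch rather than a confirmed fix: the VBScript RegExp engine supports lookahead but not lookbehind, so the leading tag cannot be excluded from the match itself. A capture group around the wanted text, read through SubMatches(0) instead of .Value, should return only the inner text. ExtractFirstGroup and DemoExtract are hypothetical names made up for this illustration, and the second pattern uses [^"]+ instead of [^<]+ as an assumption about what the content attribute may hold.

```vba
' Sketch only: ExtractFirstGroup is a hypothetical helper, not part of the original code.
' Late binding is used, so no project reference to the RegExp library is required.
Function ExtractFirstGroup(ByVal sInput As String, ByVal sPattern As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False       ' only the first match on the line is needed
    re.IgnoreCase = True
    re.Pattern = sPattern

    Set matches = re.Execute(sInput)
    If matches.Count > 0 Then
        ' SubMatches(0) holds only the text captured by the first (...) group;
        ' .Value would still include the leading tag.
        ExtractFirstGroup = matches(0).SubMatches(0)
    Else
        ExtractFirstGroup = ""
    End If
End Function

Sub DemoExtract()
    ' Prints: 1965 Corvette for sale
    Debug.Print ExtractFirstGroup("<title>1965 Corvette for sale</title>", _
                                  "<title>([^<]+)</title>")
    ' Prints: Corvette, 1965, Cool Car
    Debug.Print ExtractFirstGroup("<meta name=""Keywords"" content=""Corvette, 1965, Cool Car"">", _
                                  "<meta name=""Keywords"" content=""([^""]+)"">")
End Sub
```

In the loop above, the equivalent change would be to wrap the wanted part of .Pattern in parentheses and assign myMatch.SubMatches(0) to PUBScrapedText instead of myMatch.Value.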