RegExp - Match 3 or more - does not work??


MakeItSo

Programmer
Hi guys,

I am stumped by an utterly simple RegEx! [3eyes]

This is what I am trying to do:
I am transforming a text file into XML for further editing.
In this process, I want to protect certain codes that all look roughly like this: 0B1234.
They may be all digits or digits and upper case characters mixed. They may be 4 characters long or 5 or 6 or more.

But they are all digits and upper case chars only, and all are followed by a space.
This is how I try to catch them:
Code:
RE.Pattern = "([0-9A-Z]{3;})( )"
Simple: match all combinations of digits and upper case characters that are at least 3 characters long, followed by a space.
Guess what: it doesn't work!
Why?
[ponder]

I have Regular Expression Laboratory to test such things and it too will say NAY.
I don't get it!

I can only do greedy or lazy matches but not "at least 3".

Thanks for any hint!

Cheers,
MakeItSo

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
>RE.Pattern = "([0-9A-Z]{3;})( )"
[tt]RE.Pattern = "([0-9A-Z]{3[highlight],[/highlight]})( )"[/tt]
 
Hi Tsuji,

nope, that's not it. It was the first thing I tried. I have a German OS, so the semicolon should be the correct delimiter; I tried the comma nonetheless, but to no avail.
:-(

 
That is hard to believe. I doubt it, but I won't argue.
 
>Guess what: it doesn't work!
What do you mean by "it doesn't work"?
 
I know it's weird. That's why I don't understand it.
I thought it might be a bug in my Tester tool, but even when single-stepping through my app and calling my regex function with a string like "March 2012", the above pattern should match 2012 and it DOESN'T!
Here's a larger piece of my code:
Code:
Function Variablize(What As String) As String
Dim RE As RegExp, MC As MatchCollection, MA As Match, Tag As String
Dim k As Integer

Set RE = New RegExp
With RE
    .Global = True
    .IgnoreCase = False
    .MultiLine = True
End With
[green]
'********
'Several other RegExes here, all work fine
'********
[/green][red]
RE.Pattern = "([0-9A-Z]{3;})( )"
Set MC = RE.Execute(What)
For Each MA In MC
    'Do my stuff
    'This block is never entered!!!
Next MA[/red]

Anything wrong with that??

 
What does it show?
[tt]
's = What
s = "adfasdf34AB asdfsadf"
RE.Pattern = "([0-9A-Z]{3[highlight],[/highlight]})( )"
If RE.Test(s) Then
    Set MC = RE.Execute(s)
    For Each MA In MC
        'Do my stuff
        'This block is never entered!!!
    Next MA
Else
    MsgBox "re.test(s)=false"
End If
[/tt]
 
Forget it. Now it works.
[3eyes]

What the... I ran it over the same frigging file at least 20 times yesterday, couldn't get it to work properly; with semicolon, with comma...
Now after I changed it back to comma once more, it worked!
[hairpull]

Never mind.
RE.Pattern = "([0-9A-Z]{3[highlight],[/highlight]})( )"
works fine now.

[banghead]

 
>so semicolon should be the correct delimiter

Really? I wasn't aware that RegEx syntax varied depending on the language. Can you tell me where I can find this documented, as it becomes important for any international software I might be writing.

>"March 2012"

Doesn't end in a space, so no match would be expected.

Suffice it to say that a) adding a space and b) using a comma, as I'd expect, rather than a semicolon makes the expression work (or it does for me).
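
Both points can be checked with a small sketch. It is written in Python rather than the VB used in this thread, on the assumption that the `{3,}` quantifier behaves the same way in both engines. The helper name `find_codes` and the constant names are made up for illustration. Python happens to read `{3;}` as literal text rather than as a quantifier; how the VBScript RegExp object treats it is not confirmed here.

```python
import re

# Comma form: "at least 3" characters from the class, then a single space.
COMMA_PATTERN = re.compile(r"([0-9A-Z]{3,})( )")

# Semicolon form from the original post. Python treats "{3;}" as literal
# characters, not as a quantifier (VBScript's handling may differ).
SEMICOLON_PATTERN = re.compile(r"([0-9A-Z]{3;})( )")


def find_codes(pattern, text):
    """Return the first capture group (the code) of every match in text."""
    return [m.group(1) for m in pattern.finditer(text)]


if __name__ == "__main__":
    # Sample strings quoted in the thread, with and without a trailing space.
    samples = ["March 2012", "March 2012 ", "adfasdf34AB asdfsadf"]
    for sample in samples:
        print(repr(sample),
              find_codes(COMMA_PATTERN, sample),
              find_codes(SEMICOLON_PATTERN, sample))
```

With the comma, a match is only produced when the code is followed by a space, so `March 2012` without a trailing space yields nothing, while the semicolon form finds nothing in any of the samples.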
 
Hi strongm,

the 2012 string contained a space after 2012 so that wasn't it.

Concerning comma vs. semicolon: that erroneous assumption of mine was a remnant of a thread about a Word placeholder search, where TonyJollans posted:
TonyJollans (Programmer)
23 Jul 08 5:16
MakeItSo,

I believe you are in Germany; if you have German settings on your system then the separator character will be, I think, a semi-colon instead of a comma:

.Text = "[0-9a-zA-Z&]{1;}_[0-9a-zA-Z&_]{1;}"

Enjoy,
Tony
Which solved my problems back then.
Naturally I assumed I needed a semicolon here...



 
That is still very doubtful. But again, I won't argue.
 
Hi tsuji,

it's not doubtful, it's Word! [tongue]

Although it looks like a regular expression, it obviously is one of a different flavour...

In Word, placeholder searches with regular expressions really behave that way!

 
It sounds like an idiot programmed that part of Word ... the comma is clearly not being used as a list separator and certainly shouldn't be affected by regional settings ... but I'd guess it has now become a "won't fix" feature.

Fortunately the proper RegExp library doesn't do anything as foolish.
 
Exactly! Imagine a function in a macro f(x,y) becomes f(x;y), if they push the idea thoroughly and consistently!
 
Micro$oft said:
About using list separators in regular expressions

The previous example uses the following argument to find either one-digit or two-digit dates: {1,2}. In this case, a comma separates the two values. However, remember that your regional settings in Microsoft Windows® control the list separator that you use. If your regional settings specify the use of semicolons as list separators, you must use them instead of commas.
That is exactly what I was referring to.
tsuji said:
It sounds like a sure recipe to create trouble and break cross-cultural use of Word documents!
It sure is! It is a pain in the neck if you want to distribute QA macros containing Regex searches in Word!

strongm said:
It sounds like an idiot programmed that part of Word
Not only THAT part... [rednose]
Ever taken a closer look at the RTF code of a Word document with fields, bookmarks and comments? Arbitrary line breaks, anyone? Formatting code or line breaks in the frigging middle of a field code?

Speaking of programming:
[image: ballmer_peak.png]


[tongue]

 