RegExp - Match 3 or more - does not work??

Status
Not open for further replies.

MakeItSo (Programmer)
Hi guys,

I am stumped by an utterly simple RegEx! [3eyes]

This is what I am trying to do:
I am transforming a text file into XML for further editing.
In this process, I want to protect certain codes that all look roughly like this: 0B1234.
They may be all digits, or digits and upper-case characters mixed. They may be 4, 5, 6 or more characters long.

But they are all digits and upper case chars only, and all are followed by a space.
This is how I try to catch them:
Code:
RE.Pattern = "([0-9A-Z]{3;})( )"
Simple: match all combinations of digits and upper-case characters that are at least 3 characters long.
Guess what: it doesn't work!
Why?
[ponder]

I have Regular Expression Laboratory to test such things and it too will say NAY.
I don't get it!

I can only do greedy or lazy matches but not "at least 3".

Thanks for any hint!

Cheers,
MakeItSo

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
>RE.Pattern = "([0-9A-Z]{3;})( )"
[tt]RE.Pattern = "([0-9A-Z]{3[highlight],[/highlight]})( )"[/tt]
 
Hi Tsuji,

nope, that's not it. First thing I tried. I've a German OS, so semicolon should be the correct delimiter, tried comma nonetheless but to no avail.
:-(

 
That is hard to believe. I doubt it, but I won't argue.
 
>Guess what: it doesn't work!
What do you mean by "it doesn't work"?
 
I know it's weird. That's why I don't understand it.
I thought it might be a bug in my Tester tool, but even when single-stepping through my app and calling my regex function with a string like "March 2012", the above pattern should match 2012 and it DOESN'T!
Here's a larger piece of my code:
Code:
Function Variablize(What As String) As String
Dim RE As RegExp, MC As MatchCollection, MA As Match, Tag As String
Dim k As Integer

Set RE = New RegExp
With RE
    .Global = True
    .IgnoreCase = False
    .MultiLine = True
End With
[green]
'********
'Several other RegExes here, all work fine
'********
[/green][red]
RE.Pattern = "([0-9A-Z]{3;})( )"
Set MC = RE.Execute(What)
For Each MA In MC
    'Do my stuff
    'This block is never entered!!!
Next MA[/red]

Anything wrong with that??

 
What does it show?
[tt]
's=What
s="adfasdf34AB asdfsadf"
RE.Pattern = "([0-9A-Z]{3[highlight],[/highlight]})( )"
if re.test(s) then
Set MC = RE.Execute(s)
For Each MA In MC
'Do my stuff
'This block is never entered!!!
Next MA
else
msgbox "re.test(s)=false"
end if
[/tt]
 
Forget it. Now it works.
[3eyes]

What the... I ran it over the same frigging file at least 20 times yesterday, couldn't get it to work properly; with semicolon, with comma...
Now after I changed it back to comma once more, it worked!
[hairpull]

Never mind.
RE.Pattern = "([0-9A-Z]{3[highlight],[/highlight]})( )"
works fine now.

[banghead]

 
>so semicolon should be the correct delimiter

Really? I wasn't aware that RegEx syntax varied depending on the language. Can you tell me where I can find this documented, as it becomes important for any international software I might be writing.

>"March 2012"

Doesn't end in a space, so no match would be expected.

Suffice it to say that a) adding a space and b) using a comma - as I'd expect - rather than a semicolon makes the expression work (or it does for me).
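
For anyone who wants to check both points quickly, here is a minimal VBA sketch. It assumes the VBScript RegExp library is available and uses late binding via CreateObject so no project reference is needed (the code earlier in the thread uses early binding). The name DemoAtLeastThree and the sample strings are made up for illustration and are not from the original file.

```vb
' Minimal sketch: the comma form {3,} means "three or more".
' DemoAtLeastThree and the sample strings are hypothetical.
Sub DemoAtLeastThree()
    Dim re As Object, m As Object
    Dim samples As Variant, s As Variant

    ' Late binding is an assumption made here for portability
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = False
    re.Pattern = "([0-9A-Z]{3,})( )"

    ' Same text with and without a trailing space, plus two more cases
    samples = Array("March 2012", "March 2012 ", "code 0B1234 here", "AB 12 ")

    For Each s In samples
        If re.Test(s) Then
            For Each m In re.Execute(s)
                ' SubMatches(0) is the first group, i.e. the code without the space
                Debug.Print """" & s & """ -> " & m.SubMatches(0)
            Next m
        Else
            Debug.Print """" & s & """ -> no match"
        End If
    Next s
End Sub
```

With this pattern, "March 2012" without a trailing space and "AB 12 " should report no match, while "March 2012 " and "code 0B1234 here" should yield 2012 and 0B1234. If the codes can also sit at the very end of a line, the trailing-space group would need rethinking, but that goes beyond what was asked here.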
 
Hi strongm,

the 2012 string contained a space after 2012 so that wasn't it.

Concerning comma vs. semicolon: that erroneous assumption of mine was a remnant of a thread about a Word placeholder search, where TonyJollans posted:
TonyJollans (Programmer)
23 Jul 08 5:16
MakeItSo,

I believe you are in Germany; if you have German settings on your system then the separator character will be, I think, a semi-colon instead of a comma:

.Text = "[0-9a-zA-Z&]{1;}_[0-9a-zA-Z&_]{1;}"

Enjoy,
Tony
Which solved my problems back then.
Naturally I assumed I needed a semicolon here...



 
That is still very doubtful. But, again I won't argue.
 
Hi tsuji,

it's not doubtful, it's Word! [tongue]

Although it looks like a regular expression, it obviously is one of a different flavour...

In Word, placeholder searches with regular expressions really behave that way!

 
Upon your word, I take note of it.
 
I can locate this page on the list separator setting.
I am not sure what people were thinking when they pushed localization down to that part of the functionality. It sounds like a sure recipe to create trouble and break cross-cultural use of word documents! But, that's not for me to say.
 
It sounds like an idiot programmed that part of Word ... the comma is clearly not being used as a list separator and certainly shouldn't be affected by regional settings ... but I'd guess it has now become a "won't fix" feature.

Fortunately the proper RegExp library doesn't do anything as foolish.
 
Exactly! Imagine a function call f(x,y) in a macro becoming f(x;y), if they pushed the idea through thoroughly and consistently!
 
Micro$oft said:
About using list separators in regular expressions

The previous example uses the following argument to find either one-digit or two-digit dates: {1,2}. In this case, a comma separates the two values. However, remember that your regional settings in Microsoft Windows® control the list separator that you use. If your regional settings specify the use of semicolons as list separators, you must use them instead of commas.
That is exactly what I was referring to.
tsuji said:
It sounds like a sure recipe to create trouble and break cross-cultural use of word documents!
It sure is! It is a pain in the neck if you want to distribute QA macros containing Regex searches in Word!
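
One way around the distribution problem might be to build the wildcard pattern at run time from the machine's own list separator instead of hard-coding "," or ";". The following is only a sketch for Word VBA, relying on the behaviour described in the Microsoft quote above. WildcardRange and FindCodes are made-up helper names, and I have not tried this on every locale.

```vb
' Sketch for Word VBA: use whatever list separator the current
' machine reports. WildcardRange and FindCodes are hypothetical names.
Function WildcardRange(minCount As Long, Optional maxCount As Long = -1) As String
    Dim sep As String
    ' Word reports the separator from the Windows regional settings
    sep = Application.International(wdListSeparator)
    If maxCount < 0 Then
        ' Open-ended: "at least minCount", e.g. {3,} or {3;}
        WildcardRange = "{" & minCount & sep & "}"
    Else
        ' Bounded: e.g. {1,2} or {1;2}
        WildcardRange = "{" & minCount & sep & maxCount & "}"
    End If
End Function

Sub FindCodes()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .MatchWildcards = True
        ' Three or more digits or upper-case letters
        .Text = "[0-9A-Z]" & WildcardRange(3)
        If .Execute Then Debug.Print "Found at least one candidate"
    End With
End Sub
```

This only concerns Word's wildcard search. The RegExp object discussed earlier in the thread takes the comma regardless of regional settings.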

strongm said:
It sounds like an idiot programmed that part of Word
Not only THAT part... [rednose]
Ever taken a closer look at the RTF code of a Word document with fields, bookmarks and comments? Arbitrary line breaks anyone? Formatting code or line breaks in the frigging middle of a field code?

Speaking of programming:
[image: ballmer_peak.png]

[tongue]

 