Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Rhinorhino on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Wishing for reg. exp. in VBA... 3

Status
Not open for further replies.

OsakaWebbie

Programmer
Joined
Feb 11, 2003
Messages
628
Location
JP
I need to do a little task that would be simple with "regular expressions" (those of you with a C or Perl background would know what I mean), but as far as I can tell, that functionality is not in VBA as such. If I'm wrong, please tell me, and that will solve my challenge.

The situation: I am processing Powerpoint slide titles for use elsewhere, and some of them end with "(1 of 3)" or something like that, and in that case I want to strip that off. But there might be something else in parentheses before it, so a simple InStr(1,str,"(") is risky. The format of the unwanted stuff will always be "(# of #)", making me yearn for R.E., but with my limited knowledge of VBA, the only way I can think of to do this is to use InStr to look for "(", then "of", then ")", and if found, somehow (I'm not sure how yet) check to see if the stuff in between is numeric. Is there an easier way?
 
Something like this ?
Public Function ridPages(ByVal myStr As String) As String
Dim newStr As String, s As Integer, e As Integer, tmpArr
newStr = myStr
e = InStrRev(myStr, ")")
If e > 0 Then
s = InStrRev(myStr, "(", e)
If s > 0 Then
tmpArr = Split(Mid(myStr, s, e - s + 1))
If UBound(tmpArr) = 2 And tmpArr(1) = "of" Then
newStr = Left(myStr, s - 1) & Mid(myStr, e + 1)
End If
End If
End If
ridPages = newStr
End Function

If you insist on regular expression, reference the Microsoft Scripting Runtime and play with the VBScript.RegExp object.

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
Wow, that was some quick coding! You must work with VBA a lot. You also used several handy functions I didn't know about - thanks for the code and the education. My version would have taken much longer and been more clumsy.

Changing variable names to protect the guilty, your code slid right into my routine. The only catch was that VBA is apparently not smart enough to stop executing tests in an If statement if the the first test fails and is followed by And - when I tested with a single word in parentheses I got a runtime error because tmpArr(1) didn't exist. But nesting the If's solved that, and I'm up and running. A star for you!
 
Hi, I'm back. I tried beefing up one If statement using IsNumeric() to check that what should be numbers are indeed numbers, but it stopped working (resolves as false even when the pattern is matched). The only change I made was that one If statement. I am probably using the function incorrectly, but I don't see how (I have the Japanese version of Office [karate], so the help files are not the easiest for me to read [hairpull2]). Can you spot my dumb mistake? (that's what I'm assuming it is...)

Here is the code snippet in question (the real thing doesn't line-wrap):

Code:
      If UBound(tmpArr) = 2 Then
        If IsNumeric(tmpArr(0)) And tmpArr(1) = "of" And IsNumeric(tmpArr(2)) Then
          ThisTitle = RTrim(Left(ThisTitle, s - 1))
        End If
      End If
 
OsakaWebbie,
Of course you can use regular expressions in VBA. Here is a function that works in Excel VBA which will remove (xx of yy) portions of a string, with arbitrary numbers of spaces permitted before and after the two numbers
Code:
Function SlideRemover(str As String) As String
Dim RgExp As Object
Set RgExp = CreateObject("VBScript.RegExp")
RgExp.Pattern = "\(\s*\d+\s*of\s*\d+\s*\)"
SlideRemover = RgExp.Replace(str, "")
End Function
Brad
 
OsakaWebbie,
PHV's code was grabbing the left parenthesis and right parenthesis, hence your failure to pass the IsNumeric test. Try this revision to that code:
Code:
Public Function ridPages(ByVal myStr As String) As String
Dim newStr As String, s As Integer, e As Integer, tmpArr
newStr = myStr
e = InStrRev(myStr, ")")
If e > 0 Then
   s = InStrRev(myStr, "(", e)
    If s > 0 Then
      tmpArr = Split(Mid(myStr, s + 1, e - s - 1))
      'MsgBox tmpArr(0) & " " & tmpArr(2)
      If IsNumeric(tmpArr(0)) And IsNumeric(tmpArr(2)) And _
        UBound(tmpArr) = 2 And tmpArr(1) = "of" Then
        newStr = Left(myStr, s - 1) & Mid(myStr, e + 1)
      End If
    End If
End If
ridPages = newStr
End Function
Brad
 
Wow, VBA does have hope for real programmers! [wink] (Please, nobody take that wrong - VBA has plenty of power in its own way - but my quip may betray the fact that I am from the old school of Unix programmers!) I tried searching the web for info about R.E. and VBA, and the only hits I got were talking about plugins for sale to do it, so I assumed the functionality wasn't there, but maybe those plugins were for old versions and R.E. was added later.

R.E. is a much cleaner way to do what I was trying to do (and will be handy in the future also). You also wrote with an update to PHV's code, but although you did spot the reason why IsNumeric wasn't working, I find it interesting that instead of just suggesting the necessary change to the parameters of Mid(), you rewrote the If structure to have the same problem as the original code - if the stuff in parentheses has less than 3 words you will get a run-time error due to the array index being out of bounds. But anyway, with the knowledge that R.E. exists in VBA, I have no need for longer ways of doing this kind of thing. The star is for your first post - you made my day with that news!
 
OsakaWebbie,
I believe that VBScript got added to the mix as a result of the browser wars with Netscape. RegExp.dll has to be installed on your machine. Typically this gets installed as part of VBScript 5.x in an Internet Explorer 5.x or greater installation. Regardless of how it got there, I've used regular expressions in Excel 97 through 2003.

When I was looking at PHV's code, I recognized that it was looking for (xx of yy) and in fact would fail if there was a space before xx or after yy. Rather than deal with this issue, I just assumed that your data was under control--and that RegExp was the better approach anyhow.

I'd be interested if you have improvements to suggest to the RegExp pattern. Regular expressions are relatively new to me, and an old Unix pro like you could probably teach me lots.
Brad
 
Your RegExp pattern was fine. I added a little bit to my version, but not because yours was lacking, rather because I could do more with RegExp to fine-tune my check. I included a check for white space at the beginning so that I wouldn't need to call RTrim() as I was doing previously, and at the end I added code to make sure that the parenthetical phrase was at the end of the string (excepting white space). So my final pattern was "\s*\(\s*\d+\s*of\s*\d+\s*\)\s*$". The dollar sign forces the match to be true only if at the end of the string (^ is used to force a match at the beginning of a string). One really can't make the pattern any more concise than it is - you did fine.

It's true that my data is somewhat under control, but other people will be adding slides to the master file that this macro will be used with, and some people are not as particular about white space as I am. I can't guarantee that there won't be extraneous spaces here and there, or even a missing space between the "of" and one of the numbers, so your idea of allowing zero or more spaces between stuff was exactly the right idea. In PHV's code, I hadn't noticed that the parentheses ended up included in the string being parsed into "words" (a subtlety you picked right up on), so I was assuming that any spaces surrounding the numbers would be removed by the Split() function. But frankly, I don't know how Split() works in VBA, so perhaps the spaces would stay anyway, and I don't know if IsNumeric(" 32 ") would pass or not - both functions were new to me, as I have only just begun to use VBA. And the possibility of a lack of space surrounding the "of" hadn't even occurred to me, but is a definite possibility as many people start to add slides. RegExp is more my style - I know exactly what will match and what won't, and it's all in one statement rather than a mess of nested If's.

As for fancy RegExp patterns, this current situation doesn't end up requiring a particularly cryptic one - it is comparatively "readable". My favorite example of a cryptic but useful pattern is one that was used as a teaching point in a book I have on PHP: "^.+@.+\\..+$", which is a somewhat wimpy attempt to check to see of the string is an email address (wimpy in that there are various strings that would pass the test but not be valid email addresses). A period matches any character except a newline, and "\\." matches a literal period (making a requirement of at least one period in the domain name). I can only assume that those characters work the same way in VBA's implementation of RegExp, as basic RegExp symbology is supposed to be language-independent. Good luck learning more - it may look cryptic [spineyes], but it's very powerful! [machinegun]
 
OsakaWebbie,
Thanks for the discussion on RegExp.

I like the "readability" of using \\. to indicate a literal period. I had been using \x2E for that purpose, which is too confusing. Unfortunately, when I tested your PHP e-mail pattern in VBA, it didn't work. I did find that [.] could be used, however.
Brad
 
Well, you know, just 'cause they say something is platform-independent or language-independent... To be honest, though, I have never actually tested that particular pattern - it was just in a book - but I assume it works in PHP (and probably Perl, grep, etc.)

My impression has been that Microsoft generally seems resistent to doing anything exactly the same way as Unix/Linux (or perhaps, the same way as anyone else at all) - they like to put their own twist on it. Here in Japan that is particularly apparent - although Unix and Mac each default to a certain character set for describing Japanese characters, they happily support all common ones and the settings can easily be changed, while in Windows you can only use one particular set, one that is not currently the favorite of anyone else. If, for example, I generate data for a web-based database on my Windows machine, I have to run it through a conversion routine (which is written in Perl, so must be run on the server) to convert the characters to the more universal character set that my database uses. I get the impression that Microsoft doesn't want to let the user have too much control over their own computer because they might mess it up or something. Okay, enough grumping... I suppose if I was that unhappy, I wouldn't be on this forum, since it is all about a Microsoft product! [wink]
 
OsakaWebbie,
One more thing I forgot to mention regarding the Split function--if you have more than one space in between the number and "of", Split creates parses it into extra words. Yet another source of error.

I should have used Application.Trim on the original string to eliminate all extra spaces. VBA Trim just removes spaces at the beginning and end of a string, but Application.Trim gets those plus the ones in the middle.
Brad
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top