How do I count the number of times a word or string appears in a file?

Status
Not open for further replies.

PPettit

IS-IT--Management
Sep 13, 2003
511
US
Our accounting system generates an XML file for each invoice that's created. These XML files have to be submitted to a company so that they can automatically input the data into their SAP system. This is where the problem occurs. The SAP system can accept up to 98 line items but our invoices often contain more than that. When we send a large invoice the SAP system rejects it and I have to split/edit the invoice so that the number of line items in each new file is less than 98. Submitting these files is a pain so I'd like to keep from having to send them more than once. Unfortunately for me, the bigshot coders on both ends don't want to bother with altering their respective programs just to make my life (and the lives of other submitters across the country) easier.

These are the steps I'm trying to follow:
Search all of the XML files (ex: 231208.xml, 231209.xml, etc.) in the invoice directory (L:\XML) and any subdirectories (the files are put into folders which relate to particular areas) then report how many times the tag <DixieElectric:LineItems> appears in each file. From this report I can determine if I need to edit any of the files before they ever get uploaded.

I'd appreciate it if someone could give me a clue as to the proper use of any needed filesystem objects and/or the best way to code a regular expression to accomplish what I want. I'm a total newb when it comes to VBScript.

Thanks in advance for any help you can give.
 
If not on a Win9x machine you can try something like this:
Code:
txt=Chr(34) & "<DixieElectric:LineItems>" & Chr(34)
cmd="FOR /R L:\XML %f IN (*.xml) DO FIND /C/I " _
 & txt & " " & Chr(34) & "%f" & Chr(34)
Set oSh=CreateObject("WScript.Shell")
Set oEx=oSh.Exec("%COMSPEC% /C " & cmd)
' Read StdOut while the command is still running; waiting for it to
' finish first can hang once the output pipe fills up
While Not oEx.StdOut.AtEndOfStream
 buf=oEx.StdOut.ReadLine
 i=InStrRev(buf,".XML:",-1,1)
 If i>0 Then
   ' FIND /C prints "<file>: <count>", so convert the tail to a number
   n=CLng(Trim(Mid(buf,i+5)))
   If n>97 Then WScript.Echo buf
 End If
Wend

Hope this helps
PH.
 
This will count how many occurrences are found in a string:

Dim x, n, cText, cFind
' cText string variable is only a temp work copy
cText="source text string with external testing exters..."
cFind="ext"
n=0
x=1
While Len(cText)>0
  x=InStr(cText,cFind) ' case must match
  'x=InStr(LCase(cText),LCase(cFind)) ' ignores case
  If x = 0 Then
    cText=""
  Else
    n=n+1
    ' continue one character past the start of this match
    cText=Mid(cText,x+1)
  End If
Wend

MsgBox "Matches: " & CStr(n), 64, "Total matches"

Additional reading:
thread329-724287
thread329-721298
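
For what it's worth, the same count can also be had without a loop by comparing string lengths before and after removing the search text. This is only a minimal sketch: the function name CountOccurrences and its parameters are made up for illustration, and unlike the loop above it counts non-overlapping matches only.

```vbscript
Option Explicit

' Hypothetical helper (not from the thread): counts non-overlapping
' occurrences of sFind in sText using the built-in Replace function.
Function CountOccurrences(sText, sFind, bIgnoreCase)
    Dim compareMode
    CountOccurrences = 0
    If Len(sFind) = 0 Then Exit Function   ' avoid dividing by zero

    If bIgnoreCase Then
        compareMode = vbTextCompare
    Else
        compareMode = vbBinaryCompare
    End If

    ' Removing every match shortens the string by Len(sFind) per match
    CountOccurrences = (Len(sText) - _
        Len(Replace(sText, sFind, "", 1, -1, compareMode))) \ Len(sFind)
End Function

' Same sample text as the loop version; echoes 3
WScript.Echo CountOccurrences("source text string with external testing exters...", "ext", False)
```

For a file, the text passed in would be whatever ReadAll returned for that file.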
 
Not that it is any of my business, but if that were my situation I would explain to my boss that my team was spending a lot of time each month fooling around with invoices that had more than 98 line items. He could then explain to the coders' boss that the small amount of time needed to change the line item counter in the invoice program, which is most likely a few lines of code, would greatly enhance the productivity of his workers. In a real-world situation this works, or heads roll!
Enjoy, Bob
 
Thanks for the examples, guys. I had trouble getting either of them to work correctly and I'm not sure why at this time. However, I did make a breakthrough on my own and figured out how to search a single directory and do a count for each file. Unfortunately, I didn't remember to bring a copy to work today. I'm having some trouble getting it to output a usable list and could use some help on it. Currently, I've got it generating a message box that displays the count and the filename but I want it to generate a text file instead. The way it displays the info on each line is messed up as well. I haven't had the chance to fix either of those, yet.

I'll try to post what I have this weekend.


ShadowFox333:
It's a complicated situation but I'll try to keep it short.

My end: Mostly inaccessible consultant that I can't convince the company to get rid of. User-unfriendly accounting package using a relatively obscure database and front end.

Other end: One of the largest oil companies in the world. Like most huge companies, it's hard to get them to do much of anything useful in a short amount of time.
 
In case anyone is interested, this is what I have so far. I still need to work out a way to send the results to a text file, make it search a folder and its subfolders, and add input boxes in order to specify individual files/folders.


Option Explicit

Const ForReading = 1, ForWriting = 2, ForAppending = 8
Const LINETEXT = "<DixieElectric:LineItems>"
Dim buf, colMatches, f, fold, fso, lineCount, Match, re, strg, ts, xmldir

Set fso = CreateObject("Scripting.FileSystemObject")
Set xmldir = fso.GetFolder("L:\XML\5200\")

For Each f In xmldir.Files
    ' read the whole file into memory
    Set ts = fso.OpenTextFile(f, ForReading)
    strg = ts.ReadAll
    ts.Close

    ' find every occurrence of the tag
    Set re = New RegExp
    re.Global = True
    re.Pattern = LINETEXT
    Set colMatches = re.Execute(strg)
    lineCount = 0

    For Each Match In colMatches
        lineCount = lineCount + 1
    Next

    MsgBox "Found " & lineCount & " matches in " & f.Name & vbCrLf
Next
 
The Matches collection has a Count property, so you could change:
[tt]
Set colMatches = re.Execute(strg)
lineCount = 0

For Each Match In colMatches
lineCount = lineCount + 1
Next

MsgBox "Found " & lineCount & " matches in " & f.name & vbCrLf
Next
[/tt]
to
[tt]
MsgBox "Found " & re.Execute(strg).Count & " matches in " & f.name & vbCrLf
Next
[/tt]


 
For anyone who might still be interested, this is what I ended up with. It took about 10 minutes to go through 156MB worth of files (13,644 files / 587 directories). It needs a few more cosmetic and functionality tweaks but overall I'm pleased with its performance. My thanks go out to everyone who helped, even if it was just from posting the code for your own project. For me, examples are the best way to make sense out of most things, especially when it comes to programming.

The code in its current form:
[tt]'require all variables to be defined before use
Option Explicit

Const ForReading = 1, ForWriting = 2, ForAppending = 8
Const conLineText = "<DixieElectric:LineItems>"
Dim colFile, colSubDir, colFiles, fso, rptEntry, rptFile
Dim strFileName, re, xmlDir, xmlFile, xmlFolder, xmlLines


'build file/directory list and get the report file ready for editing
Set fso = CreateObject("Scripting.FileSystemObject")
Set xmlDir = fso.GetFolder("L:\")
Set colSubDir = xmlDir.SubFolders
Set rptFile = fso.OpenTextFile("L:\Report.txt", ForWriting, True)

'the search pattern never changes, so build the RegExp once
Set re = New RegExp
re.Global = True
re.Pattern = conLineText


'build a list of files by cycling through one subdirectory after the next and
'write just the subdirectory name at the beginning of each group of
'entries on the report
For Each xmlFolder In colSubDir
    Set colFiles = xmlFolder.Files
    rptFile.WriteLine(xmlFolder.Name)

    'search for the desired string in one file after the next
    For Each xmlFile In colFiles
        Set colFile = fso.OpenTextFile(xmlFile, ForReading)
        strFileName = colFile.ReadAll   'holds the file contents, despite the name
        colFile.Close
        xmlLines = re.Execute(strFileName).Count

        'only report if the string appears more than 80 times within the same file
        'write results to the report for each file matching the criteria
        If xmlLines > 80 Then
            rptFile.Write(xmlLines & " line items in " & xmlFile.Name)
            rptFile.WriteLine(vbTab & "Created: " & xmlFile.DateCreated)
        End If
    Next    'find the next file in the same directory

    'add a blank line between each group of entries
    rptFile.WriteLine
Next    'find the next folder in the parent directory

'close the report file and notify when finished
rptFile.Close
MsgBox "Search Completed"


This is what the output looks like:
5738

574

5742
93 line items in 218436.xml Created: 02/10/2003 4:34:33 PM

5746
82 line items in 230528.xml Created: 12/01/2003 11:11:29 AM
134 line items in 230630.xml Created: 12/02/2003 4:38:13 PM
136 line items in 231090.xml Created: 12/12/2003 4:17:37 PM

5747
118 line items in 229335.xml Created: 10/29/2003 11:25:08 AM
83 line items in 229666.xml Created: 11/06/2003 9:08:22 AM
81 line items in 231928.xml Created: 01/07/2004 3:43:01 PM

5754

5770[/tt]
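
The script above only looks one level below the starting folder, while the original question asked for the invoice directory and any subdirectories. One way to cover deeper nesting is a recursive procedure. This is only a rough sketch under a few assumptions: the Sub name ScanFolder, the conThreshold constant and the choice to write full paths to the report are mine, not part of the posted script, and it has not been run against the real L:\XML tree.

```vbscript
Option Explicit

Const ForReading = 1, ForWriting = 2
Const conLineText = "<DixieElectric:LineItems>"
Const conThreshold = 80            ' assumed: same cut-off as the script above

Dim fso, re, rptFile
Set fso = CreateObject("Scripting.FileSystemObject")

' The pattern never changes, so one RegExp object is enough
Set re = New RegExp
re.Global = True
re.Pattern = conLineText

Set rptFile = fso.OpenTextFile("L:\Report.txt", ForWriting, True)
ScanFolder fso.GetFolder("L:\XML")
rptFile.Close
MsgBox "Search Completed"

' Hypothetical helper: reports on the XML files in one folder,
' then calls itself for each subfolder, however deep they go
Sub ScanFolder(folder)
    Dim file, subFolder, stream, count

    For Each file In folder.Files
        ' skip non-XML files and empty files (ReadAll fails on an empty file)
        If LCase(fso.GetExtensionName(file.Name)) = "xml" And file.Size > 0 Then
            Set stream = fso.OpenTextFile(file.Path, ForReading)
            count = re.Execute(stream.ReadAll).Count
            stream.Close

            If count > conThreshold Then
                rptFile.WriteLine count & " line items in " & file.Path
            End If
        End If
    Next

    For Each subFolder In folder.SubFolders
        ScanFolder subFolder
    Next
End Sub
```

Writing the full path instead of just the file name keeps the report readable without the per-folder headings, since files from different depths end up in one flat list.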
 