Purge illegal characters. 1

Status
Not open for further replies.

cmn2

Programmer
Mar 6, 2003
73
US
I could use some help please. I wrote this sub that is supposed to replace illegal characters with a space character. The input file is an XML document that was generated on a Mac. It seems to work if there is a single illegal character, but fails to find the second of two when there are two back-to-back illegal characters. Does anyone have experience with how best to clean a file of illegal characters?
Thank you for your help.

'requires Imports System.IO at the top of the file
Private Sub purgeIllegalCharacters(ByVal filepath As String)

    'create a streamreader for the document that needs to be cleaned
    Dim sr As StreamReader = New StreamReader(filepath)

    'start at the first character
    sr.BaseStream.Position = 0

    'load all of the data into a single string
    Dim strData As String = sr.ReadToEnd

    'load the string into a stringbuilder
    Dim strBldr As New System.Text.StringBuilder(strData)

    'number of characters in the stringbuilder (document)
    Dim intLength As Int32
    intLength = strBldr.Length()

    'loop through each character
    Dim intIllegals As Int16 = 0
    Dim i As Int32
    Dim intAscCode As Int16
    Dim char1 As Char
    Dim char2 As Char

    For i = 0 To intLength - 1

        'get the ascii code of the current character
        intAscCode = Asc(strBldr.Chars(i))

        'all nonprintables are ascii code 31 or less
        If intAscCode < 32 Then

            'keep a running total of the illegal characters encountered
            intIllegals = intIllegals + 1

            char1 = Chr(intAscCode)

            'put a space in its place
            char2 = Chr(32)

            strBldr.Replace(char1, char2)
        End If

    Next i

    If intIllegals > 0 Then
        MsgBox(intIllegals.ToString & " illegal character(s) were found and cleaned.")
    Else
        MsgBox("No illegal characters were found.")
    End If

    sr.Close()

    'create a new file of the same name, this will overwrite the old one
    Dim f As FileInfo = New FileInfo(filepath)

    'create a stream writer for the new file
    Dim strmWriter As StreamWriter = f.CreateText

    'write the stringbuilder with the clean data to the new file
    strmWriter.Write(strBldr.ToString)

    'close the streamwriter, this will also flush it
    strmWriter.Close()

End Sub
 
My apologies, I worded this wrong. Instead of illegal characters, I meant non-printable characters.
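
A likely explanation for the missed second character, assuming the two back-to-back characters are the same character: StringBuilder.Replace(Char, Char) swaps every occurrence in the whole builder, not just the one at position i. The first hit therefore clears all copies at once, and the counter only ever sees one of them. The sketch below overwrites the character at index i instead. The module name, the function name ReplaceControlChars and its ByRef count parameter are made up for illustration and are not part of the original sub.

```vbnet
Imports System.Text

Module PurgeSketch
    ' Hypothetical helper (not from the original post): replaces each
    ' character below code 32 with a space, one position at a time.
    Public Function ReplaceControlChars(ByVal input As String, ByRef count As Integer) As String
        Dim sb As New StringBuilder(input)
        count = 0
        For i As Integer = 0 To sb.Length - 1
            If AscW(sb.Chars(i)) < 32 Then
                ' Overwrite only this position, so every hit is counted
                sb.Chars(i) = " "c
                count += 1
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

Like the original sub, this also blanks tabs and line breaks, since they are below code 32 as well.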
 
I used to work with a couple of band printers (only 64 chars available) and would run a routine that matched every char in a file against the set of letters on the printer.

Only keep what will print. The list of non-printable chars is just as large as the list of printable ones (at least in my case it was).

P.S. Be sure you keep the end-of-line char.
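
The keep-only-what-prints idea could be sketched roughly like this. The allowed set used here (codes 32 through 126 plus carriage return and line feed) is an assumption picked for illustration, not the band printer's actual 64-character set, and KeepPrintable is a hypothetical name.

```vbnet
Imports System.Text

Module WhitelistSketch
    ' Hypothetical helper: keeps only characters on an allow-list and
    ' drops everything else. The allow-list is an assumption, not the
    ' printer's real character set.
    Public Function KeepPrintable(ByVal input As String) As String
        Dim sb As New StringBuilder(input.Length)
        For Each c As Char In input
            Dim code As Integer = AscW(c)
            ' Always keep the end-of-line characters
            Dim isEndOfLine As Boolean = (c = ControlChars.Cr OrElse c = ControlChars.Lf)
            If isEndOfLine OrElse (code >= 32 AndAlso code <= 126) Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

Note that this version drops unwanted characters rather than replacing them with spaces, so the text gets shorter.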

>124


if it is to be it's up to me
 
Hello there,

My regular computer is down so I can't test this but try using a regular expression. Here is some sample code. Let me know if it doesn't work. Good Luck!
Code:
Imports System.Text.RegularExpressions
Code:
' Create illegal character string
Dim badChars As String = String.Empty
For index As Integer = 0 To 31
    ' Use CharW instead of Char to avoid boxing
    badChars &= CharW(index)
Next

' Build RegEx pattern - square brackets say match any one of them
Dim pattern As String = "[" & badChars & "]"

' Are there any illegal characters
If RegEx.IsMatch(strData, pattern) Then
    ' Count them
    intIllegals = RegEx.Matches(strData, pattern).Count
    ' Convert them to spaces
    strData = RegEx.Replace(strData, pattern, " ")
End If






Have a great day!

j2consulting@yahoo.com
 
Sorry, it should be ChrW, NOT CharW.

Here is some test code to show what is happening. It creates 2 strings: the first holds each actual character followed by 2 spaces, and the second shows each ASCII code value followed by 1 space. So, for example, A B C D and 65 66 67 68. It then cleanses the data and rebuilds the second string by walking through the first string and taking the ASCII code value of every third character. The various results are shown in a MsgBox.
Code:
Imports System.Text.RegularExpressions

Module Module1
   Public Sub Main()
      ' Characters separated by 2 spaces
      Dim text0 As String = String.Empty
      ' ASCII codes separated by 1 space
      Dim text1 As String = String.Empty

      ' Create illegal character string
      Dim badChars As String = String.Empty
      For index As Integer = 0 To 31
         ' ChrW maps the code point directly; Chr goes through the current code page
         badChars &= ChrW(index)
         ' Combine bad1 + A + bad2 + B, etc
         text0 &= ChrW(index) & "  " & ChrW(index + 65) & "  "
         text1 &= index.ToString("00 ") & (index + 65).ToString("00 ")
      Next

      ' Build RegEx pattern - square brackets say match any one of them
      Dim pattern As String = "[" & badChars & "]"

      ' Are there any illegal characters
      If Regex.IsMatch(text0, pattern) Then
         MsgBox("Before: ~" & text1 & "~" & vbNewLine & _
                "Before: ~" & text0 & "~")

         ' Count them
         MsgBox("Bad Chars: " & _
            Regex.Matches(text0, pattern).Count.ToString)

         ' Convert them to spaces
         text0 = Regex.Replace(text0, pattern, " ")

         text1 = String.Empty
         For index As Integer = 0 To 190 Step 3
             text1 &= Asc(text0.Substring(index, 1)).ToString("00 ")
         Next
         MsgBox("After: ~" & text1 & "~" & vbNewLine & _
                "After: ~" & text0 & "~")
      End If
   End Sub
End Module



Have a great day!

j2consulting@yahoo.com
 
Thanks for all your efforts. I'm running tight on time, but will try out your code and will post back.
Thanks again. I really appreciate it.
 
Haven't had a chance to test this one, which is more concise:
Code:
' Are there any illegal characters - hex 01 thru 1F (decimal 1 thru 31)
Dim pattern As String = "[\x01-\x1F]"
If RegEx.IsMatch(strData, pattern) Then
    ' Count them
    intIllegals = RegEx.Matches(strData, pattern).Count
    ' Convert them to spaces
    strData = RegEx.Replace(strData, pattern, " ")
End If
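
Since the input is an XML document, one refinement may be worth considering: the range above also matches tab (\x09), line feed (\x0A) and carriage return (\x0D), which XML 1.0 allows, and it skips \x00, which XML 1.0 does not allow. Here is a sketch that leaves those three alone, assuming XML 1.0 rules apply to the file. CleanForXml and the module name are hypothetical.

```vbnet
Imports System.Text.RegularExpressions

Module XmlCleanSketch
    ' Control characters XML 1.0 does not allow: everything below
    ' hex 20 except tab (09), line feed (0A) and carriage return (0D).
    Private Const Pattern As String = "[\x00-\x08\x0B\x0C\x0E-\x1F]"

    ' Hypothetical helper: returns the cleaned text and reports the
    ' number of replacements through the ByRef parameter.
    Public Function CleanForXml(ByVal input As String, ByRef count As Integer) As String
        count = Regex.Matches(input, Pattern).Count
        Return Regex.Replace(input, Pattern, " ")
    End Function
End Module
```

Keeping the line breaks also covers the earlier advice in this thread about not losing the end-of-line char.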

Have a great day!

j2consulting@yahoo.com
 
This is just what I need. Thanks for your help!
 