Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations bkrike on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Trimming Data From a CSV File.

Status
Not open for further replies.

rl7182

MIS
Jan 17, 2005
3
US
I have a csv file that contains 23,000+ products that I have to Import into our catalog file. This csv file has fields that have excessive space inside the fields. Is there a fairly simple script or toll I can use to clean this data up before I import it ? Need to remove blank spaces.
 
What platform is it on now?

If you can use Excel or Access on the data, use the built-in wizard to export it.

__________________________________________
Try forum1391 for lively discussions
 
The File comes to me in a CSV format. I tried using the wizard, but I see no options to remove blank text within the fileds.
 


Import and THEN remove spaces.

Skip,

[glasses] [red]Be advised:[/red] Researchers have found another Descartes trueism, "Cogito ergo spud."
"I think; therefore, I YAM!
[tongue]
 
You could always try a short script.

Reformat.vbs
Code:
Option Explicit

'Working directory.
Const WorkPath = "Data"

'ADO constants.
Const adOpenForwardOnly = 0
Const adOpenStatic = 3
Const adLockReadOnly = 1
Const adLockOptimistic = 3
Const adCmdText = 1

Dim FSO, NewTS, JetConn, OrigRS, NewRS, F, LastF

Function DeBlank(ByVal TextField)
  Dim Blank

  TextField = Trim(TextField)
  Blank = 1
  Do While Blank <> 0
    Blank = InStr(Blank, TextField, " ")
    If Blank <> 0 Then
      Do While Mid(TextField, Blank + 1, 1) = " "
        TextField = _
            Left(TextField, Blank) _
          & Mid(TextField, Blank + 2)
      Loop
      Blank = Blank + 1
    End If
  Loop
  DeBlank = TextField
End Function

'Create empty new file via FSO.
Set FSO = CreateObject("Scripting.FileSystemObject")
Set NewTS = FSO.CreateTextFile(WorkPath & "\new.csv", True)
NewTS.Close
Set NewTS = Nothing
Set FSO = Nothing

'Connect to Jet.
Set JetConn = CreateObject("ADODB.Connection")
JetConn.Open  "Provider=Microsoft.Jet.OLEDB.4.0;" _
            & "Extended Properties='Text';" _
            & "Data Source='" & WorkPath & "';"

'Open original CSV file.
Set OrigRS = CreateObject("ADODB.Recordset")
OrigRS.Open "SELECT * FROM orig.csv", _
            JetConn, _
            adOpenForwardOnly, _
            adLockReadOnly, _
            adCmdText

'Open new CSV file.
Set NewRS = CreateObject("ADODB.Recordset")
NewRS.Open "SELECT * FROM new.csv", _
           JetConn, _
           adOpenStatic, _
           adLockOptimistic, _
           adCmdText

'Process data.
LastF = OrigRS.Fields.Count - 1
OrigRS.MoveFirst
Do Until OrigRS.EOF
  NewRS.AddNew
  For F = 0 To LastF
    NewRS(F).Value = Deblank(OrigRS(F).Value)
  Next
  NewRS.Update
  OrigRS.MoveNext
Loop

'Clean up.
NewRS.Close
Set NewRS = Nothing
OrigRS.Close
Set OrigRS = Nothing
JetConn.Close
Set JetConn = Nothing
MsgBox "Done"
The FSO is used above because Jet can't create a new text table directly, using something like a SQL DML CREATE statement. But Jet can insert into an existing text file, at least at the end.

To make this work, you have to set up a Schema.ini for Jet to go by. Here I am assuming a simple directory structure:
[tt]
<some directory>
|
+-- Reformat.vbs
|
+--<Data>
|
+-- Schema.ini
|
+-- orig.csv
|
+-- new.csv[/tt]

I just let it all be relative to the current directory of the script. You could always define an explicit path to some other directory in the constant [tt]WorkPath[/tt].

Schema.ini must be in the same directory as the data files for this to work. It describes the type of delimiters, the fields in the files, data types of the fields, etc. In this case we can treat them all as text fields (Char).

Schema.ini
Code:
[orig.csv]
ColNameHeader=False
Format=CSVDelimited
TextDelimiter="
MaxScanRows=1
Col1=ProdName    Char
Col2=Weight      Char
Col3=Source      Char
[new.csv]
ColNameHeader=False
Format=CSVDelimited
TextDelimiter="
MaxScanRows=1
Col1=ProdName    Char
Col2=Weight      Char
Col3=Source      Char
Sample data before and after:

orig.csv
Code:
"  Widget   (small)","0.3  Kg "," Frugmeir  Gzuwerk"
"Wobblekrank  (left) ","1.4 Kg "," Alarius"
" Klapfen  Hanger "," 2.1       Kg","  Curie  LLC "

new.csv
Code:
"Widget (small)","0.3 Kg","Frugmeir Gzuwerk"
"Wobblekrank (left)","1.4 Kg","Alarius"
"Klapfen Hanger","2.1 Kg","Curie LLC"
The quote as text delimiter works well for cases where you might have commas in the data. Perhaps in your case you don't have or want these though. Just use a different Schema.ini.

Schema.ini
Code:
[orig.csv]
ColNameHeader=False
Format=CSVDelimited
TextDelimiter=[COLOR=red]none[/color]
MaxScanRows=1
Col1=ProdName    Char
Col2=Weight      Char
Col3=Source      Char
[new.csv]
ColNameHeader=False
Format=CSVDelimited
TextDelimiter=[COLOR=red]none[/color]
MaxScanRows=1
Col1=ProdName    Char
Col2=Weight      Char
Col3=Source      Char
The results are similar, but without the quotes:

orig.csv
Code:
  Widget   (small),0.3  Kg , Frugmeir  Gzuwerk
Wobblekrank  (left) ,1.4 Kg , Alarius
 Klapfen  Hanger , 2.1       Kg,  Curie  LLC

new.csv
Code:
Widget (small),0.3 Kg,Frugmeir Gzuwerk
Wobblekrank (left),1.4 Kg,Alarius
Klapfen Hanger,2.1 Kg,Curie LLC
You can even use quotes on the original file and no quotes on the output file or vice versa.


This approach may "grind" for awhile on a large file, but it should get the job done. Your data may not be as "scrambled" looking as the samples I threw together. You probably have a fairly regular pattern of excess spaces.

If your needs are more exotic than this you can work on the DeBlank() function yourself to achieve what you need. If all you require is Trim(), LTrim(), or RTrim() just use those instead of writing a DeBlank().
 
I did import, but how do you suggest I THEN remove the spaces ?
 
What application did you import into?

Skip,

[glasses] [red]Be advised:[/red] Researchers have found another Descartes trueism, "Cogito ergo spud."
"I think; therefore, I YAM!
[tongue]
 
>I did import,

How?

>but how do you suggest I THEN remove the spaces ?

And the spaces are still in?!

Did you do a fixed column import, or did you use some delimiter?

__________________________________________
Try forum1391 for lively discussions
 
A simple solution might be:

Open original .csv in any word processor and do a search and replace. Search for:
" "
("<space><space>")​
and replace with
" "
("<space>")​

Do this until you get a "no match found error". At that point you will have no double spaces in your text strings.

To save time, you can use "<space><space><space><space>" etc. for your first few iterations...
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top