×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Jobs

Working With Spaces In Strings

Working With Spaces In Strings

Working With Spaces In Strings

(OP)
I have just started working with records that have excess spaces within titles and names and I am trying to find the best way to "clean them up."

I cannot trim out all the spaces as I need them between the words.

Examples:
- "A AA Self Storage" should read "AAA Self Storage"
- "A A A Carolina" and "A AA Carolina" should read "AAA Carolina"
- "A A Al-Anon" should read "AA Al-Anon"
- "BB T Bank" should read "BBT Bank"
etc.

Trying to find a better way than to have a long CASE .. ENDCASE statement.

Any suggestions?

Thanks.

"..if you give a man a fish he is hungry again in an hour. If you teach him to catch a fish you do him a good turn."
-- Anne Isabella Thackeray Ritchie.

RE: Working With Spaces In Strings

Hi,

There is no clear pattern between how "A AA Self Storage" and "A A A Carolina" should be trimmed but you might have a look at the STRTRAN() function

? STRTRAN(A AA Self, " ","",1,1) && yields "AAA Self"
? STRTRAN(A A A Carolina, " ","",1,2) && yields "AAA Carolina"

hth

MarK

RE: Working With Spaces In Strings

Not quite sure what your business rule is. Are you saying that if a "word" contains a single letter, it should be concatenated with the previous (or next?) word? (Your last example [BB T Bank] is ambiguous in that respect.)

Difficult to give a final answer without more information, but your solution would look something like this:

CODE -->

lcIn = < your input string >
lcOut = ""
FOR lnI = 1 TO GETWORDCOUNT(lcIn)
  lcWord = GETWORDNUM(lcIn, lnI)
  lcOut = lcOut + lcWord + IIF(LEN(lcWord) = 1, " ", "")
ENDFOR 
< your output string = lcOut 

This will need tweaking, but it should give you the general idea.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: Working With Spaces In Strings

(OP)
I am trying to avoid having a long list of STRTRAN()'s for each entry and try to come up with something more general. That, or have some sort of "spell check" table.

That FOR .. ENDFOR loop looks promising. But a problem would be when a title is like "A New Little Store". I do not want to take out the space between "A" and "New".

Much thanks for the help.

"..if you give a man a fish he is hungry again in an hour. If you teach him to catch a fish you do him a good turn."
-- Anne Isabella Thackeray Ritchie.

RE: Working With Spaces In Strings

Jason,

As I said before, you need to specify your rules. The examples you gave are useful but they don't cover every case. And they are not consistent. For example, you say you don't want a space between "A" and "New", but you do want one between "A" and "AA".

For the moment, forget about the code. Just set out in English what your rules are. That's an essential first step in solving this kind of problem. (If you can't express the rules in English, you are unlikely to be able to program them.)

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: Working With Spaces In Strings

Jason,

You may consider using regular expressions to do the job of pattern matching, and let VFP handle the cleaning up.

This is a start that takes into account your examples and desired processed results.

CODE --> VFP

LOCAL RegExpr AS VBScript.RegExp
LOCAL Matches		&& objects created by RegExpr
LOCAL Match

m.RegExpr = CREATEOBJECT("VBScript.RegExp")
m.RegExpr.Global = .T.
m.RegExpr.Ignorecase = .F.
m.RegExpr.Pattern = "^([A-Z\s]+\s)*"

CREATE CURSOR Strings (InputData Varchar(200))

INSERT INTO Strings VALUES ("A AA Self Storage")
INSERT INTO Strings VALUES ("A A A Carolina")
INSERT INTO Strings VALUES ("A AA Carolina")
INSERT INTO Strings VALUES ("A A Al-Anon")
INSERT INTO Strings VALUES ("BB T Bank")

SCAN

	m.Matches = m.RegExpr.Execute(Strings.InputData)

	IF !ISNULL(m.Matches) AND m.Matches.Count > 0

		&& a match was found, so fetch the first one
		m.Match = m.Matches.item(0)
		&& remove the spaces from the found match and replace it in the source string
		? "[" + Strings.InputData + "]", "->", "[" + STRTRAN(Strings.InputData, m.Match.Value, CHRTRAN(m.Match.Value, " ", "") + " ") + "]"

	ELSE

		? Strings.InputData

	ENDIF

ENDSCAN 

RE: Working With Spaces In Strings

(OP)
Mike,

Yes, working on that.

Atlopes,

Very interesting. Much thanks. Will study that.

Jason...

"..if you give a man a fish he is hungry again in an hour. If you teach him to catch a fish you do him a good turn."
-- Anne Isabella Thackeray Ritchie.

RE: Working With Spaces In Strings

Jason

Just a few more points.

The two main aspects of using a regular expression approach are the pattern definition and what to do with matches found in a given string. The pattern I proposed seems to satisfy your requirements, but that may need some additional tweaking in case there your requirements get more complex than you enunciated.

The STRTRAN in my code should attempt to transform only the first occurrence of the match, so the parameter that would set that is missing from the snippet.

RE: Working With Spaces In Strings

You set wrong values against correct ones and the correct values have the nature to be shorter here,

So to find a list of matches with or without additional spaces you could match names by removing spaces to find matches and to put all of them in the same group, then replace them with the topmost shortest name in the group.

CODE

Create Cursor crsNames (vName v(50))

Insert into crsNames values ("A AA Self Storage")
Insert into crsNames values ("AAA Self Storage")
Insert into crsNames values ("A A A Carolina") 
Insert into crsNames values ("A AA Carolina") 
Insert into crsNames values ("AAA Carolina")
Insert into crsNames values ("A A Al-Anon")
Insert into crsNames values ("AA Al-Anon")
Insert into crsNames values ("BB T Bank")
Insert into crsNames values ("BBT Bank")

Select Padr(Chrtran(vName," ",""),240) as Grouping, Count(*) as GroupCount, Min(Len(Alltrim(vName))) as LengthOfShortestSpelling ;
from crsNames ;
group by 1;
Having Count(*)>1 ;
into Cursor crsGroups

Select Padr(Chrtran(vName," ",""),240) as Grouping, vName as CorrectSpelling ;
From crsNames ;
Inner Join crsGroups on Padr(Chrtran(vName," ",""),240) = crsGroups.Grouping ;
 And Len(Alltrim(vName)) = LengthOfShortestSpelling ;
into Cursor crsCorrectSpellings


Update crsNames set vName = CorrectSpelling From crsNames;
inner join crsCorrectSpellings on Padr(Chrtran(crsNAmes.vName," ",""),240)==Grouping 

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: Working With Spaces In Strings

This isn't tested for performance on a large dataset.

The expression Padr(Chrtran(crsNAmes.vName," ",""),240) appears repeatedly, so┬┤it may be a good idea to add a grouping column to the names list and compute the field "Grouping" just once for all names, then "inherit" it in the further cursors for processing. To not alter the existing table that might also be done by a 1:1 related temp table or cursor.

The other thing to note is that the assumption you do find the correct spelling among the values is not necessarily true. So a manual step before making the final name updates would be to examine the crsCirrectspellings for actually having correct spellings. All names producing the same "Grouping" value be removing all spaces.

One more detail: Why 240? 240 bytes is the maximum length for values to index and that's even just 120 characters in any other collation than MACHINE, so you might shorten that to 120, too. It should be sufficient to group names without adding too many names into one group and thereby merging differing names into one. Another reason to look into the crsCorrectSpellings. Besides, the shorter this can be, the more effective an index will be. I don't index in code here, Rushmore will add temporary indexes if it can. It's actually a good idea to index the Grouping field so Rushmore optimization doesn't need to.

I leave that an open point, as you have to make some decisions anyway and may already have solved it with the other advice you got.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close