Regular expressions again :(

Status
Not open for further replies.

timgerr

IS-IT--Management
Jan 22, 2004
364
US
Hey all, I have a list of names (in the thousands) that I need to clean up. Some of the names contain characters like % * # and others contain numbers like 123. I need to strip these out, but keep things like apostrophes and dashes. Here are some examples:

Bad
Code:
gallagher#4454
thomas(acee)
parson[jnogood]
Good
Code:
O'Tool
Mary-kate

I want to do something like $name =~ s/ / / but I am not sure what the regular expression would be... any ideas?

Thanks,
timgerr

-How important does a person have to be before they are considered assassinated instead of just murdered?
-Need more cow bell!!!

 
Maybe...

Code:
my @orig = (...long list of names...);
my @new = ();

foreach (@orig) {
   # Skip any name containing something other than letters, apostrophes, dashes, or spaces
   next if /[^A-Za-z'\- ]/;
   push (@new, $_);   # keep names that are already clean
}
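
Note that the loop above drops bad names from the list rather than repairing them. If the goal is to strip the unwanted characters instead, a substitution with the same character class might be a starting point. This is only a sketch: the sample list is taken from the examples in the question, and @clean is a made-up variable name.

```perl
use strict;
use warnings;

# Sample data copied from the question; the real list would be much longer.
my @orig = ("gallagher#4454", "thomas(acee)", "parson[jnogood]", "O'Tool", "Mary-kate");

my @clean;   # hypothetical name for the repaired list
foreach my $name (@orig) {
    # Work on a copy so the original list is left untouched,
    # and delete every character that is not a letter, apostrophe, dash, or space.
    (my $copy = $name) =~ s/[^A-Za-z'\- ]//g;
    push @clean, $copy;
}

print "$_\n" for @clean;
```

This turns gallagher#4454 into gallagher, but thomas(acee) becomes thomasacee, because the letters inside the brackets are legal characters. Whether bracketed text should be removed entirely is a decision about the data that a single character class cannot make.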

-------------
Cuvou.com | My personal homepage
Project Fearless | My web blog
 
I suggest that you break this up into multiple steps. Names are complicated constructs, and trying to fix these in a single regex is not the way to go.

Instead, use a list of regexes both to fix bad names and to verify that a name is correct. Build up these lists gradually until every name that has been fixed matches your success conditions.

That is how I would do this type of project anyway.
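
A rough sketch of that multi-step idea, using the examples from the original question, could look like the following. The rules in @fixes, the $valid pattern, and the clean_name helper are all assumptions made for illustration (for instance, that bracketed text and anything after a # is junk); the real rules would have to come from looking at the actual data.

```perl
use strict;
use warnings;

# Hypothetical fix-up rules, applied in order; every match is deleted.
my @fixes = (
    qr/\(.*?\)|\[.*?\]/,    # bracketed annotations such as (acee) or [jnogood]
    qr/#.*$/,               # a '#' and everything after it
    qr/[^A-Za-z'\- ]/,      # any other character that is not a letter, ', - or space
    qr/^\s+|\s+$/,          # leading and trailing whitespace
);

# Hypothetical success condition: runs of letters joined by single ', - or space.
my $valid = qr/^[A-Za-z]+(?:['\- ][A-Za-z]+)*$/;

# Apply every fix-up rule to a copy of the name and return the result.
sub clean_name {
    my ($name) = @_;
    $name =~ s/$_//g for @fixes;
    return $name;
}

my (@good, @review);
for my $raw ("gallagher#4454", "thomas(acee)", "parson[jnogood]", "O'Tool", "Mary-kate", "%%%") {
    my $name = clean_name($raw);
    if ($name =~ $valid) {
        push @good, $name;      # fixed name passes verification
    } else {
        push @review, $raw;     # keep the raw value for manual inspection
    }
}

print "good:   @good\n";
print "review: @review\n";
```

With this sample input the first five names end up in @good as gallagher, thomas, parson, O'Tool and Mary-kate, while %%% lands in @review because nothing is left after cleaning. Anything that shows up in @review suggests either a new fix-up rule or a looser success condition.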

- Miller
 
I'd use Lingua::EN::NameParse, which handles most of the problems you are likely to encounter.

Yours,

fish

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life was going to be spent in finding mistakes in my own programs."
--Maurice Wilkes, 1949
 