Case sensitivity for special names

Status
Not open for further replies.

MaxGeek

Technical User
Jan 8, 2004
52
US
I've been using a script to format names like "john.doe" as "John.Doe" so that they validate properly against a case-sensitive server. I ran into a problem, though: a person named "Ronald.McDonald". My script can only make it "Ronald.Mcdonald". Can someone help me make it so that if the first two characters of the last name are "mc", it capitalizes the first letter and the third letter of the last name?

Here is the script I've been using:

$email = "ronald.mcdonald";

#Make email all lowercase
$email = lc($email);
#Make email in to correct format ie John.Doe
$email =~ tr/A-Z/a-z/;
$email =~ s/\b(.)/\u$1/g;


My main problem is: how can I make the script detect "mc" only in the first two characters of the last name, not anywhere in the last name?


Thanks,
Max Geek
 
I would do something like this...

Code:
$email = "ronald.mcdonald";

#Make email all lowercase
$email = lc($email);
#Make email in to correct format ie John.Doe
($firstname, $lastname) = split(/\./, $email, 2);
$firstname = ucfirst($firstname);
if (substr($lastname, 0, 2)eq "mc")  {
  $lastname = ucfirst(substr($lastname, 0, 2)).ucfirst(substr($lastname, 3));
}
else  {
  $lastname = ucfirst($lastname);
}
$email = $firstname."\.".$lastname;

I'm pretty new to Perl, so there may be a more efficient way to do this. I haven't tested it...

----------------------------
SnaveBelac - Adventurer
----------------------------
 
Code:
my $email="ronald.mcdonald";

$email = lc($email);
if ($email=~/\w+\.mc\w+/)
{
  $email=~s/(\w)(\w*\.)mc(\w+)/ucfirst$1$2Mcucfirst$3/;
} else
{
  $email=~s/(\w+\.(\w+)/ucfirst$1\.ucfirst$2/;
}
And again, I haven't tested it.
 
I was a bit bored and tested the first two solutions. The first one cuts the first D in McDonald; the second one works great if you use \u instead of ucfirst. But, as always, here's another way.

Code:
my $email = lc "ronald.mcdonald";
$email =~ /(\w+)\.(mc)?(\w+)/;
$email = "\u$1.\u$2\u$3" || "\u$1.\u$2";
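For reference, the missing D in the first solution appears to come from substr($lastname, 3), which skips the character at offset 2. Below is a minimal sketch of that split/substr approach with the offset changed to 2. The sub name format_name is hypothetical (it is not from the thread), and it assumes input of the form "first.last".

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): turns "first.last" into
# "First.Last", also capitalizing the letter that follows a leading "mc".
sub format_name {
    my ($email) = @_;
    my ($first, $last) = split(/\./, lc($email), 2);
    $first = '' unless defined $first;
    $last  = '' unless defined $last;

    if (substr($last, 0, 2) eq 'mc') {
        # Offset 2 (not 3) keeps the first letter after "mc".
        $last = 'Mc' . ucfirst(substr($last, 2));
    }
    else {
        $last = ucfirst($last);
    }

    return ucfirst($first) unless length $last;
    return ucfirst($first) . '.' . $last;
}

print format_name('ronald.mcdonald'), "\n";   # Ronald.McDonald
print format_name('john.doe'), "\n";          # John.Doe
```

Because split is given a limit of 2, any further dots stay inside the last-name part untouched.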
 
I had to do this on a database of over 8 million names once. As well as McDonald, don't forget MacDonald. And watch out for 'Machine Tools Ltd.', which will come out as 'MacHine'. Names like "de Bont" and "de'Ath", or anything beginning with van, von, da, or de, are just as bad.
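One way to keep the "MacHine" problem in check is to consult an exception list before applying the Mc/Mac rule. This is only a sketch: the sub name format_surname and the contents of %plain are made up for illustration, and a real list would have to come from your own data. It does not attempt the van/von/da/de cases.

```perl
use strict;
use warnings;

# Hypothetical exception list: words that begin with "mac" or "mc" but should
# only get an ordinary leading capital. The entries are illustrative guesses.
my %plain = map { $_ => 1 } qw(machine mace mack macy);

# Hypothetical helper: capitalizes one surname, treating a leading
# "mc" or "mac" as a prefix unless the name is in the exception list.
sub format_surname {
    my ($name) = @_;
    $name = lc $name;
    return ucfirst $name if $plain{$name};
    # $1 is the prefix ("mc" or "mac"), $2 is the rest of the name.
    return "\u$1\u$2" if $name =~ /^(ma?c)([a-z].*)$/;
    return ucfirst $name;
}

print format_surname($_), "\n" for qw(mcdonald macdonald machine doe);
# McDonald
# MacDonald
# Machine
# Doe
```

The exception list wins over the prefix rule, so every false positive you find can be fixed by adding one entry rather than by changing the regex.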

You can never get them all correct using an algorithm, as fixing one problem usually causes others. You just have to settle for the one that gives you the fewest wrong results. A good place to start is to read the whole file and do a cardinality check (i.e. count how many times each name occurs). You can use this to determine how many wrong results you are likely to get.
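The cardinality check could look something like the sketch below. It counts names held in an in-memory list for brevity; reading them from a file is left out, and the sample names and the sub name count_names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: counts how often each name occurs, ignoring case.
# Returns a reference to a hash of lowercased name => count.
sub count_names {
    my (@names) = @_;
    my %count;
    $count{ lc $_ }++ for @names;
    return \%count;
}

# Made-up sample data standing in for the real file of names.
my @sample = qw(McDonald macdonald Machine mcdonald Doe);
my $count  = count_names(@sample);

# List names from most to least frequent, ties in alphabetical order.
for my $name (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    printf "%-12s %d\n", $name, $count->{$name};
}
```

Scanning the resulting list for names that start with "mc" or "mac" gives a rough idea of how many records a given rule would touch, and how many of those are likely to be false positives.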
 
Hmmmm be lazy....

s/Mcd/McD/;
s/Macd/MacD/;


stevexff - as an aside, I had a Dr once called Dr de'Ath - want to take a guess at his nickname?

Mike

"Deliver me from the bane of civilised life; teddy bear envy."


 
Thank you everyone for your help. I think covering Mc and Mac should be good enough for my situation.
 