I’m looking for ideas on analyzing addresses to determine whether they are the same, so that I can group individuals who share an address and assign them a single “household id”. As you can imagine, address data comes with many variations (Drive vs. Dr, and so on), which makes this interesting.
I was thinking that breaking the address into parts and then comparing them might be the way to go: street number, street name, city, state, and zip, while omitting parts such as drive, blvd, rd, etc. Basically I’d be splitting the address string on spaces. There are still problems with this, though, because of formatting differences such as multi-word street names and prefixes like North, N, etc.
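To make the idea concrete, here is a minimal Python sketch of the split-and-compare approach. Everything in it is an assumption for illustration: the function names `household_key` and `assign_household_ids` are made up, the lookup tables are deliberately incomplete, and it assumes the street line and zip are already in separate fields. It does not handle apartment or unit numbers.

```python
import re

# Hypothetical, incomplete lookup tables; a real list would be much longer.
STREET_TYPES = {"drive", "dr", "boulevard", "blvd", "road", "rd",
                "street", "st", "avenue", "ave", "lane", "ln", "court", "ct"}
DIRECTIONS = {"north": "n", "south": "s", "east": "e", "west": "w"}


def household_key(street, zip_code):
    """Reduce a street line and zip to a comparable (number, name, zip5) tuple."""
    # Lowercase, strip punctuation, then split on whitespace.
    tokens = re.sub(r"[^a-z0-9\s]", "", street.lower()).split()
    if not tokens:
        return ("", "", zip_code.strip()[:5])
    number, rest = tokens[0], tokens[1:]
    # Collapse spelled-out directions to their one-letter form.
    rest = [DIRECTIONS.get(t, t) for t in rest]
    # Drop only a trailing street-type word, so "St James St" keeps "st james".
    if len(rest) > 1 and rest[-1] in STREET_TYPES:
        rest = rest[:-1]
    # Multi-word street names survive because the remaining tokens are rejoined.
    return (number, " ".join(rest), zip_code.strip()[:5])


def assign_household_ids(records):
    """records: iterable of (name, street, zip). Returns a list of (name, household_id)."""
    ids = {}
    result = []
    for name, street, zip_code in records:
        key = household_key(street, zip_code)
        # First time a key is seen it gets the next sequential id.
        household_id = ids.setdefault(key, len(ids) + 1)
        result.append((name, household_id))
    return result
```

With this, "123 North Main Street" and "123 N. Main St" reduce to the same key, so both people would land in the same household. Exact matching on a normalized key will still miss misspellings and transposed digits.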
If this sounds like a good method, what would be a reliable way to accomplish this?
Thanks,
Brad