FractalWalk
Technical User
In Excel, I am trying to look at a column of text data and find strings that are similar to one another. The idea is to let a user choose to overwrite one piece of data with the other so that they match exactly. For example, "123 Main St." and "123 main street" need to be standardized to the same value.
So what I am looking for is suggestions on the logic. So far I'm comparing string lengths and setting a maximum length difference. I'm also requiring that the first 2 characters match (not case sensitive). I'm thinking about calculating the number of characters that match between the two strings and setting a minimum percentage threshold.
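A minimal sketch of those three checks, written in Python for illustration rather than in Excel VBA. The function name `is_fuzzy_match` and the default thresholds (`max_len_diff=6`, `min_ratio=0.7`) are assumptions, not settled values. "Characters that match" is interpreted here as the count of characters the two strings share regardless of position, divided by the length of the longer string; other interpretations are possible.

```python
from collections import Counter


def is_fuzzy_match(a, b, max_len_diff=6, min_ratio=0.7):
    """Return True if a and b pass all three heuristic checks.

    The thresholds are illustrative placeholders and would need tuning
    against real data.
    """
    # Normalize so the comparison is not case sensitive.
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()

    # Check 1: the lengths must not differ by more than max_len_diff.
    if abs(len(a_norm) - len(b_norm)) > max_len_diff:
        return False

    # Check 2: the first two characters must match.
    if a_norm[:2] != b_norm[:2]:
        return False

    # Check 3: share of characters in common, ignoring position.
    # Counter & Counter keeps the minimum count of each character.
    longer = max(len(a_norm), len(b_norm))
    if longer == 0:
        return False  # nothing to compare
    shared = sum((Counter(a_norm) & Counter(b_norm)).values())
    return shared / longer >= min_ratio


print(is_fuzzy_match("123 Main St.", "123 main street"))  # True (11 of 15 shared)
print(is_fuzzy_match("123 Main St.", "45 Oak Ave"))       # False (first two differ)
```

Because the third check ignores character order, anagrams score as perfect matches; the first-two-characters and length checks are what keep that from producing too many false prompts.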
Any other ideas? This does not have to be an exact science, in that it's not critical that every possible match is found. Rather, this is just something that prompts a user to make a change when a fuzzy match is found. I'm looking to knock out 80% of the work in an automated manner.