
Multiple Replace and Collect

Status
Not open for further replies.

iggitme

IS-IT--Management
Sep 13, 2006
47
US
Hi All.

I'm using $field{'phrase'}, which contains a form input, for example "auto dealers new". A well-working ispell routine checks the spelling of the entry. What I'm having a devil of a time figuring out is how to substitute the corrected words for the misspelled words and retain all of the corrected variations.

Example: the entry "aurto dealers nerw" (sloppy fingers)
will find auto and new as correct replacements for the misspelled words, but it will also find other words from a custom dictionary collection that might be intended. I need to respond with

auto dealers new
audio dealers new

etc., where each variation offers the fully corrected phrase with the misspelled words replaced.

This is what is being used. It houses the 'term', which is the incorrect word, and the corresponding 'replace' word. It can be quite long, with variations:

# Each line of $spellingcorrected has the form "term|replace"
my @corrected = split /\n/, $spellingcorrected;
for my $cor (@corrected) {
    my ($term, $replace) = split /\|/, $cor;
    # ... use $term and $replace here ...
}
Having assigned $field{'phrase'} to $phrase,
this ($phrase =~ s/$term/$replace/) works to make one replacement on the original, but once a word has been replaced, a different alternative for that word can no longer be substituted.

Please... I'm seeking expertise well past my meager mind.

Any help will be greatly appreciated.

Thank you all.
 
I am a bit confused, do you just need to use the "g" modifier in your regexp?

Code:
$phrase =~ s/$term/$replace/g;
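For illustration, a minimal sketch of what the "g" modifier does. The sample phrase is made up, and the \b anchors and \Q...\E quoting are additions that were not in the original post: they keep the term from matching inside a longer word and from being read as regex syntax.

```perl
use strict;
use warnings;

my $phrase  = 'aurto dealers nerw aurto';   # made-up input with a repeated typo
my $term    = 'aurto';
my $replace = 'auto';

# Substitute on a copy so the original phrase stays available.
# /g replaces every occurrence, not just the first one.
(my $fixed = $phrase) =~ s/\b\Q$term\E\b/$replace/g;

print "$fixed\n";
```

Note that /g only repeats the same replacement; it does not by itself produce one phrase per alternative.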

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Thank you for the reply! Yes. The data is preformatted, all lowercase, both fields, comma separated. I've been eagerly watching the site. Some might call that a disorder. Hmmmm... Sometimes I read this place in depth, just for the fun of it. But alas, now I have an 'issue'. Thank you for taking a look at it ;-)
 
Did that fix your problem, then?

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
No, I'm sorry, I thought that was commentary. My bad. Actually, since there can be upwards of 20 different alternatives for each misspelled word, a query that has three words can have 20/20/20 variations (and there is no limit on the number of words in the query; in fact, the more the better for results). I can get the first alternative word to be replaced, and then the first alternative of each other word, but I have not been able to conjure up a scheme to get all the possible variations listed. Since the first replacement takes the badly spelled word out of the field, it can't be replaced by another alternative. I've tried chaining them, where each previous word is replaced by the next word, but that runs into problems in trying to get the other two words to replace out the same way.

In most instances there would be only one misspelled word, and that is pretty easy to handle, but since queries can be rather long, the chances of more than one are much higher, even just from typographical errors.
Example:

the query: [thisd eror wronfg]
returns,
thisd|theist thirst this third thirds thuds Thad Thais thirsty thud thus those theism Th's hist Thia's theed these thud's they'd Thad's|
eror|er or er-or error err Eros Egor|
wronfg|wrong wrung wronger rang rung runoff range rangy NFC RFC|

(This is from ispell. A second dictionary is run after these are found, which trims the list to 'known' words.)
So the response should then be,

this error wrong
these error wrong
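As a rough sketch, the ispell-style lines above could be read into a hash of alternatives and trimmed to known words like this. The %known hash is a hypothetical stand-in for the custom dictionary, and the sample data is a shortened copy of the output shown above. Splitting on whitespace also breaks up two-word suggestions such as "er or", which is assumed to be acceptable here.

```perl
use strict;
use warnings;

# Checker output in the "term|alt alt alt|" layout (shortened sample).
my $spellingcorrected = join "\n",
    'thisd|theist thirst this third these|',
    'eror|er or er-or error err|',
    'wronfg|wrong wrung wronger|';

# Hypothetical stand-in for the custom dictionary of 'known' words.
my %known = map { $_ => 1 } qw(this these error wrong);

my %alternatives;   # misspelled term => array ref of known replacements
for my $line (split /\n/, $spellingcorrected) {
    my ($term, $list) = split /\|/, $line;
    next unless defined $list;
    # Keep only the suggestions that appear in the known-word list.
    $alternatives{$term} = [ grep { $known{$_} } split ' ', $list ];
}

for my $term (sort keys %alternatives) {
    print "$term: @{ $alternatives{$term} }\n";
}
```

Keeping the alternatives per term, instead of substituting straight away, leaves the original phrase intact for building each variation later.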

other terms have far more alternatives and far more custom dictionary 'known' words... others, far less...

Using a single $phrase =~ s/$term/$replace/g; stops the term from being replaced again, since it has already been replaced.

There has got to be some simple and concise algorithm that can keep the first word updated, update the second word, then update the third word, and do that for however many words there may be, so the returns are suggestions based on known words.

But I'm at a loss to find it. You helped me out greatly (and I am most grateful for it too) a while back with a complicated process that is working fantastically in this same application. I was really hoping you would be the one to find this plea.

thanks



 
I don't want to complicate your life, but I think that your way of checking for replacements is not very efficient. I can't even figure out how you can fill in your list, as anything can be a variation of anything. What I would try is to compare words on something like: all words where only one letter is different, all words containing the same letters plus one, all words containing the same letters minus one, etc. (but preferably stop here?). However, I have no prepackaged solution for this, just preparing your work for tomorrow...[smile]
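One possible reading of the comparison described above, sketched as a "within one edit" test: one letter different, one letter added, or one letter removed. The sub name within_one_edit and the word list are made up for this example, and this is only an interpretation of the idea, not a prepackaged solution.

```perl
use strict;
use warnings;

# True when $x and $y are equal or differ by exactly one substitution,
# one inserted letter, or one deleted letter.
sub within_one_edit {
    my ($x, $y) = @_;
    return 1 if $x eq $y;
    ($x, $y) = ($y, $x) if length($x) > length($y);   # make $x the shorter
    my ($lx, $ly) = (length($x), length($y));
    return 0 if $ly - $lx > 1;

    # Skip the common prefix.
    my $i = 0;
    $i++ while $i < $lx && substr($x, $i, 1) eq substr($y, $i, 1);

    # Same length: drop one letter from both. Otherwise: drop one from the longer.
    my $rest_x = $lx == $ly ? substr($x, $i + 1) : substr($x, $i);
    my $rest_y = substr($y, $i + 1);
    return $rest_x eq $rest_y ? 1 : 0;
}

my @dictionary = qw(this these error wrong auto audio new);   # hypothetical list
my @near = grep { within_one_edit('eror', $_) } @dictionary;
print "@near\n";
```

With this test, "audio" would not be offered for "aurto" because it is two edits away, so a stricter rule like this trades recall for a shorter list.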
Concerning your question, I'm not sure I understand your problem: you simply seem to need to keep the original $phrase unchanged and use a copy of it for the replacements, which you would then perhaps assign to an array before displaying them.
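A minimal sketch of that copy-and-collect idea for a single misspelled word. The alternatives in @replacements are assumed for the example, and the \b and \Q...\E in the pattern are additions to avoid partial-word matches.

```perl
use strict;
use warnings;

my $phrase       = 'aurto dealers new';
my $term         = 'aurto';
my @replacements = qw(auto audio);   # assumed alternatives for $term

my @suggestions;
for my $replace (@replacements) {
    my $copy = $phrase;                        # fresh copy; $phrase is never modified
    $copy =~ s/\b\Q$term\E\b/$replace/g;
    push @suggestions, $copy;
}

print "$_\n" for @suggestions;
```

Because every substitution starts from the untouched original, the misspelled word is still there to be replaced by the next alternative.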
However if you want to display all the possible variations of alternatives, you must consider that a phrase with, say, 5 errors and an average of 5 alternatives for each misspelled word, would generate a list of 3125 possible corrections! This is perhaps an extreme condition, but you should define a rule for limiting the proposed corrections to a fair number.
The solution to your problem for generating all the possible combinations is via a recursive sub (see faq219-6747 for an example, but yours are not permutations, so it's a slightly different formulation).
Please consider the above, provide some more explanations and more complete examples and come back if you want to go on with a recursive approach.
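For reference, a sketch of such a recursive sub under stated assumptions. It is not the code from the FAQ mentioned above. The @slots layout (one list of choices per word, a single choice for a correctly spelled word), the sub name expand, and the $limit cap are all made up for this example. The number of phrases is the product of the list sizes, which is how 5 errors with 5 alternatives each reach 5**5 = 3125.

```perl
use strict;
use warnings;

# One slot per word of the query, in order. Choices taken from the
# example in this thread.
my @slots = ( [qw(this these)], [qw(error)], [qw(wrong)] );
my $limit = 50;    # assumed cap on the number of suggestions
my @phrases;

# Pick one choice for slot $index, then recurse into the next slot.
sub expand {
    my ($index, @chosen) = @_;
    return if @phrases >= $limit;          # stop once the cap is reached
    if ($index == @slots) {                # every slot filled: record the phrase
        push @phrases, join ' ', @chosen;
        return;
    }
    expand($index + 1, @chosen, $_) for @{ $slots[$index] };
}

expand(0);
print "$_\n" for @phrases;
```

The cap is the simplest limiting rule; ordering each slot so the most likely choice comes first would make the truncated list more useful.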

Franco
 
You don't say how you run the words past iSpell. Maybe you could split each phrase up into individual words and process them one at a time, replacing each 'not found' word with the first option from iSpell as you go? This would limit the number of permutations. After all, when Google says 'did you mean nerdy typographical correction routines', it only gives you one option to choose from...
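A small sketch of that word-at-a-time approach. The %alternatives lookup is hypothetical and stands in for whatever the spell checker returns, with the best suggestion assumed to come first.

```perl
use strict;
use warnings;

# Hypothetical lookup: misspelled word => suggestions, best first.
my %alternatives = (
    aurto => [qw(auto audio)],
    nerw  => [qw(new)],
);

my $phrase = 'aurto dealers nerw';

# Replace each 'not found' word with its first suggestion; keep other words.
my @words = map {
    my $alts = $alternatives{$_};
    $alts && @$alts ? $alts->[0] : $_;
} split ' ', $phrase;

my $best = join ' ', @words;
print "$best\n";
```

This yields a single "did you mean" phrase, so the list never grows with the number of misspelled words.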

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
 
If there is a simple solution to this question, I can't think of it. How many words do you expect to be in $phrase most of the time? How many misspelled words at any one time? The list of combinations grows exponentially with the number of misspelled words, which is what Franco also noted. For even a few words there can be quite a long list of possible solutions. You're going to have to narrow this down somehow on the input side.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 