suggestions for a spellchecker


(OP)
I've created a little spellchecker.
The script works this way:

After reading each line of the text, it makes corrections
by comparing the text with a dictionary.

When it finds a word that doesn't exist in the dictionary, it generates
one or more suggested corrections and pushes the word into an array.

Here is my problem:

I would like to let the user choose the correct word
among the suggested words. Something like this:

We found the word "wlak" in your text, which isn't correct.
The suggested possibilities are:
1. walk
2. work

Type the number associated with the word, or 0 if you can't find the correct word.

Then I would like to replace the misspelled word with the chosen one in the original text (creating a new .txt).

How can I do this?

RE: suggestions for a spellchecker

(OP)
LoL

I know it's impossible to help without the code. Here is my code.

CODE --> perl

use diagnostics;
use warnings;

my ($file_dictionary, $file_text, $word, $line, $line1, $alph, $elt, $w, $i);
my ($transposition, $removal, $addition, $replacement, $letter1, $letter2);
my (@word, @altered_word, @filedictionary, @filetext, @dictionary, @addition, @replacement, @transposition, @removal);


$file_dictionary = "lexique.txt";
$file_text = "texte.txt";

#I create an array for the dictionary
open (L, "<", $file_dictionary) or die "Cannot open $file_dictionary: $!";
while (defined( $line1 = <L>)) {
	chomp($line1);
	@filedictionary = split (/\s/, $line1);
	push (@dictionary, @filedictionary);
}

#I create an array for the text: every token not found in the dictionary goes into @word
open (T, "<", $file_text) or die "Cannot open $file_text: $!";
while (defined( $line = <T>)) {
	chomp($line);
	@filetext = split (/(\s|\pP)/, $line);
	for ($i = 0; $i < @filetext; $i++) {
		if (!grep(/^$filetext[$i]$/, @dictionary)) {
			push (@word, $filetext[$i]);
		}
	}
}

#then I create an array of letters for each unknown word
foreach $w (@word) {
	@altered_word = split (//, $w);

	#I create an array for the dictionary (again, for every word)
	open (L, "<", $file_dictionary) or die "Cannot open $file_dictionary: $!";
	while (defined( $line1 = <L>)) {
		chomp($line1);
		@filedictionary = split (/\s/, $line1);
		push (@dictionary, @filedictionary);
	}

	#first operation (transposition) --> "palrer" will be "parler"
	for (my $i = 0; $i < $#altered_word; $i++) {
		@transposition = @altered_word;
		$letter1 = $transposition[$i];
		$letter2 = $transposition[$i+1];
		$transposition[$i] = $letter2;
		$transposition[$i+1] = $letter1;

		$transposition = join "", @transposition;
		if (grep(/^$transposition$/, @dictionary)) {
			print "post transposition : $transposition\n";
		}
	}

	foreach $elt (0 .. $#altered_word) {
		#second operation (removal) --> parller will be parler
		@removal = @altered_word;
		splice(@removal, $elt, 1);
		$removal = join "", @removal;
		if (grep(/^$removal$/, @dictionary)) {
			print "post removal : $removal\n";
		}

		foreach $alph ('a' .. 'z') {
			#third operation (addition) --> parer will be parler
			@addition = @altered_word;
			splice(@addition, $elt, 0, $alph);
			$addition = join "", @addition;
			if (grep(/^$addition$/, @dictionary)) {
				print "post addition : $addition\n";
			}

			#last operation (replacement) --> mancer will be manger
			@replacement = @altered_word;
			splice(@replacement, $elt, 1, $alph);
			$replacement = join "", @replacement;
			if (grep(/^$replacement$/, @dictionary)) {
				print "post replacement : $replacement\n";
			}
		}
	}
}


TEXT
French Dictionary

RE: suggestions for a spellchecker

So I see that you essentially compare the words in the text with the words in the dictionary in the following steps:
-the word as is
-the word with each character swapped with its neighbor
-the word with each single character deleted
-the word with one character added at every possible place (but parer changed to parler is a bad example, as parer exists in French!)
-the word with each single character replaced by another one.
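The steps listed above can be sketched in a compact form. This is only an illustration, not the original script: the names edits1 and suggestions and the three-word toy dictionary are assumptions made for the example, and only the letters a to z are tried for additions and replacements.

```perl
use strict;
use warnings;

# Return every string that is one edit away from $word: adjacent
# transpositions, single-letter deletions, additions and replacements.
# (edits1 is a hypothetical helper name; only a..z is tried.)
sub edits1 {
    my ($word) = @_;
    my %seen;
    for my $i (0 .. length $word) {
        my $head = substr $word, 0, $i;
        my $tail = substr $word, $i;
        # swap the two letters starting at position $i
        $seen{ $head . substr($tail, 1, 1) . substr($tail, 0, 1) . substr($tail, 2) } = 1
            if length $tail > 1;
        # delete the letter at position $i
        $seen{ $head . substr($tail, 1) } = 1 if length $tail > 0;
        for my $c ('a' .. 'z') {
            # add $c before position $i (including at the very end)
            $seen{ $head . $c . $tail } = 1;
            # replace the letter at position $i with $c
            $seen{ $head . $c . substr($tail, 1) } = 1 if length $tail > 0;
        }
    }
    delete $seen{$word};    # the word itself is not a suggestion
    my @candidates = sort keys %seen;
    return @candidates;
}

# Keep only the candidates that exist in the dictionary (a hash reference).
sub suggestions {
    my ($word, $dict) = @_;
    return grep { exists $dict->{$_} } edits1($word);
}

# Toy dictionary, assumed for the example only
my %dict = map { $_ => 1 } qw(parler manger parer);
print join(", ", suggestions("palrer", \%dict)), "\n";    # prints "parer, parler"
```

Collecting the candidates in a hash removes duplicates, so a word reachable through two different edits is suggested only once.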
A few notes on your code:
-the dictionary is created twice
-you should close the files you open once you have finished reading them
-grep(/^$filetext[$i]$/, @dictionary) is better written (faster) as $filetext[$i] ~~ @dictionary
-this code

CODE -->

while (defined( $line1 = <L>)) {
  chomp($line1);
  @filedictionary = split (/\s/, $line1);
  push (@dictionary, @filedictionary);
} 
can be written more compactly as

CODE -->

while (<L>) {
  push @dictionary, split;
} 
This will be a little faster, but there is an important difference: split without arguments splits on runs of whitespace and skips leading whitespace, so it does not create empty entries when several spaces occur in a row (this is likely what you want). In your code this would be written as split(' ', $line1), which is close to split(/\s+/, $line1).
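As an alternative for the lookup mentioned in the notes above: the ~~ operator has been flagged as experimental since Perl 5.18, so a hash is a common substitute that also avoids scanning the whole list for every word. A minimal sketch, assuming the dictionary fits in memory; the variable $lexique is a stand-in for the contents of lexique.txt so that the example runs on its own.

```perl
use strict;
use warnings;

# Stand-in for the contents of lexique.txt (hypothetical data)
my $lexique = "parler  manger\n  parer\n";

my %dictionary;
# Reading from an in-memory string here; with a real file this would be
# open my $fh, '<', 'lexique.txt' or die ...
open my $fh, '<', \$lexique or die "Cannot open dictionary: $!";
while (<$fh>) {
    # split without arguments works on $_, splits on runs of whitespace
    # and skips leading whitespace, so no empty entries are stored
    $dictionary{$_} = 1 for split;
}
close $fh;

# Exact lookup: no regex, no pass over the whole dictionary
for my $word (qw(parler palrer)) {
    print "$word: ", (exists $dictionary{$word} ? "known" : "unknown"), "\n";
}
```

The hash is built once and can then be reused for the text words and for every generated candidate.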
In essence I think that what you are trying to do is a gigantic task (not like going to the moon, but...), unless of course this is just a divertissement.
Concerning your question, I think a possible strategy would be to retain the punctuation (but what about guillemets, apostrophes and ...?) in your text array (possibly using split /\b/, though this will also retain the blanks) and then skip those elements during the analysis. At the end the text is rebuilt with a join of the text array.
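The strategy above might look roughly like this. It is a sketch under several assumptions: the helper names tokenize, choose and correct_line, the toy dictionary and the fixed suggestion list are invented for the example, the text is assumed to be plain ASCII (accented letters would need the input to be decoded first), and the user's answer is read from an in-memory handle so that the example runs unattended. In a real script the handle would be \*STDIN.

```perl
use strict;
use warnings;

# Split a line into tokens and keep the separators (capturing parentheses),
# so that join '' gives back the original line.
sub tokenize {
    my ($line) = @_;
    return grep { length } split /(\s+|\p{P})/, $line;
}

# Show the suggestions and read the user's choice from the handle $in.
# Returns the chosen word, or the original word for 0 or an invalid answer.
sub choose {
    my ($word, $in, @suggestions) = @_;
    print qq{We found the word "$word" in your text, which isn't correct.\n};
    print "The suggested possibilities are:\n";
    printf "%d. %s\n", $_ + 1, $suggestions[$_] for 0 .. $#suggestions;
    print "Type the number of the correct word, or 0 to keep the original: ";
    my $answer = <$in>;
    return $word unless defined $answer && $answer =~ /^\s*(\d+)\s*$/;
    return $1 >= 1 && $1 <= @suggestions ? $suggestions[$1 - 1] : $word;
}

# Toy data, assumed for the example only
my %dict    = map { $_ => 1 } qw(i walk work to);
my %suggest = (wlak => [qw(walk work)]);

sub correct_line {
    my ($line, $in) = @_;
    my @tokens = tokenize($line);
    for my $t (@tokens) {                 # $t aliases the array element
        next unless $t =~ /^\w+$/;        # skip blanks and punctuation
        next if exists $dict{lc $t};      # known word
        my $s = $suggest{lc $t} or next;  # nothing to suggest
        $t = choose($t, $in, @$s);        # replace in place
    }
    return join '', @tokens;
}

my $answers = "1\n";                      # simulated user input
open my $in, '<', \$answers or die "Cannot open input: $!";
print "\n", correct_line("I wlak to work.", $in), "\n";
```

To create the new .txt, each corrected line can be printed to a second filehandle opened for writing instead of to the screen.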
Good luck

http://www.xcalcs.com : Online engineering calculations
http://www.megamag.it : Magnetic brakes for fun rides
http://www.levitans.com : Air bearing pads
