soundex for arabic language

(OP)
Hi all
I need a Soundex algorithm that supports the Arabic language. All I found is a PHP class, but I don't have enough PHP experience to translate it into VFP. Any help will be appreciated. Here is the PHP class:

<?php
// ----------------------------------------------------------------------
// Copyright (C) 2006 by Khaled Al-Shamaa.
// http://www.al-shamaa.com/
// ----------------------------------------------------------------------
// LICENSE

// This program is open source product; you can redistribute it and/or
// modify it under the terms of the GNU General Public License (GPL)
// as published by the Free Software Foundation; either version 2
// of the License, or (at your option) any later version.

// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.

// To read the license please visit http://www.gnu.org/copyleft/gpl.html
// ----------------------------------------------------------------------
// Class Name: Arabic Soundex
// Filename: ASoundex.class.php
// Original Author(s): Khaled Al-Sham'aa <khaled.alshamaa@gmail.com>
// Purpose: Arabic soundex algorithm takes Arabic word as an input
// and produces a character string which identifies a set words
// that are (roughly) phonetically alike.
// ----------------------------------------------------------------------

class ASoundex {
var $asoundexCode = array('/ا|و|ي|ع|ح|ه/',
'/ب|ف/',
'/خ|ج|ز|س|ص|ظ|ق|ك|غ|ش/',
'/ت|ث|د|ذ|ض|ط|ة/',
'/ل/',
'/م|ن/',
'/ر/'
);

var $aphonixCode = array('/ا|و|ي|ع|ح|ه/',
'/ب/',
'/خ|ج|ص|ظ|ق|ك|غ|ش/',
'/ت|ث|د|ذ|ض|ط|ة/',
'/ل/',
'/م|ن/',
'/ر/',
'/ف/',
'/ز|س/'
);

var $transliteration = array('ا' => 'A',
'ب' => 'B',
'ت' => 'T',
'ث' => 'T',
'ج' => 'J',
'ح' => 'H',
'خ' => 'K',
'د' => 'D',
'ذ' => 'Z',
'ر' => 'R',
'ز' => 'Z',
'س' => 'S',
'ش' => 'S',
'ص' => 'S',
'ض' => 'D',
'ط' => 'T',
'ظ' => 'Z',
'ع' => 'A',
'غ' => 'G',
'ف' => 'F',
'ق' => 'Q',
'ك' => 'K',
'ل' => 'L',
'م' => 'M',
'ن' => 'N',
'ه' => 'H',
'و' => 'W',
'ي' => 'Y'
);
var $len;
var $lang;
var $code;

// PHP 8 no longer treats a method named after the class as the constructor.
function __construct($len=4, $lang='en', $code='soundex'){
$this->len = $len;
$this->lang = $lang;
$this->code = $code;
}

/**
* @return String : the calculated soundex/phonix numeric code
* @param String : the word that we want to encode it
* [soundex|phonix] : define mapping code to be used in this converting
* @desc mapCode : methode to create soundex/phonix numric code for a given word
* @author Khaled Al-Shamaa
*/
function mapCode($word){
$encodedWord = $word;

if($this->code == 'phonix'){ $map = $this->aphonixCode; }else{ $map = $this->asoundexCode; }

foreach($map as $code=>$condition){
$encodedWord = preg_replace($condition, $code, $encodedWord);
}
$encodedWord = preg_replace('/\D/', '0', $encodedWord);

return $encodedWord;
}

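function trimRep($word){
// Split the digit string into single characters, skipping empty pieces.
$chars = preg_split('//', $word, -1, PREG_SPLIT_NO_EMPTY);
// Initialise both variables so the first comparison and append are defined.
$lastChar = null;
$cleanWord = '';

foreach($chars as $char){
if($char !== $lastChar){ $cleanWord .= $char; }
$lastChar = $char;
}

return $cleanWord;
}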

function soundex($word){
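// Assumes $word is UTF-8: the /u flag splits on characters rather than bytes,
// so the first Arabic letter is not cut in half.
list($dump, $soundex, $rest) = preg_split('//u',$word,3);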

if($this->lang == 'en'){ $soundex = $this->transliteration[$soundex]; }

$encodedRest = $this->mapCode($rest);
$cleanEncodedRest = $this->trimRep($encodedRest);

$soundex .= $cleanEncodedRest;

$soundex = preg_replace('/0/', '', $soundex);

$totalLen = strlen($soundex);
if($totalLen > $this->len){
$soundex = substr($soundex, 0, $this->len);
}else{
$soundex .= str_repeat('0', $this->len - $totalLen);
}

return $soundex;
}
}
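
For readers who do not know PHP, the logic of the class above can be sketched in a few lines of Python. This is only an illustrative port, not the original author's code: the function names are made up here, only the `soundex` mapping (not the `phonix` one) is covered, and it assumes Unicode strings rather than a single-byte code page.

```python
import re

# Letter groups copied from the PHP $asoundexCode array; the list index is the digit.
ASOUNDEX_GROUPS = ["اويعحه", "بف", "خجزسصظقكغش", "تثدذضطة", "ل", "من", "ر"]

# First-letter transliteration, copied from the PHP $transliteration array.
ARABIC = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
LATIN = "ABTTJHKDZRZSSSDTZAGFQKLMNHWY"
TRANSLITERATION = dict(zip(ARABIC, LATIN))


def map_code(rest):
    """Replace each letter with its group digit; unmapped characters become 0."""
    digits = []
    for ch in rest:
        code = next((i for i, group in enumerate(ASOUNDEX_GROUPS) if ch in group), 0)
        digits.append(str(code))
    return "".join(digits)


def trim_rep(code):
    """Collapse runs of the same digit into a single digit."""
    return re.sub(r"(.)\1+", r"\1", code)


def arabic_soundex(word, length=4):
    """First letter (transliterated) plus digits, zeros dropped, padded with 0."""
    if not word:
        return ""
    first = TRANSLITERATION.get(word[0], word[0])
    digits = trim_rep(map_code(word[1:])).replace("0", "")
    return (first + digits).ljust(length, "0")[:length]


print(arabic_soundex("محمد"))  # M530
```

The steps are the same as in the PHP: keep the first letter, turn the rest into digits, collapse repeats, drop the zeros, then pad or cut to the requested length.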

thank you
yahya

RE: soundex for arabic language

It would be asking a lot of members of this forum to translate the above PHP function into VFP for you. It might be better if you try to understand how Soundex works (and it's really not complicated), then try to write your own function in VFP.

That said, have you tried using VFP's built-in SOUNDEX() function with Arabic text? It probably won't work, but it should be the first thing to try.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: soundex for arabic language

(OP)
Hi Mr. Lewis
I have used SOUNDEX() with English names with no problem, but with no success in Arabic. I tried to build my own Arabic Soundex program based on some reading about the subject, but in some cases the results were not logical. The PHP program above is said to be more precise, but its logic is not documented well enough to build a new one in VFP. That's why I am asking for it to be translated to VFP, as I have no knowledge of PHP.
thank you

yahya

RE: soundex for arabic language

Yahiadal,
This may be much more difficult than you expect. I note that the code alone contains both double-byte and single-byte values. Unless your VFP application is written for a double-byte character set, getting VFP to display the values after creation is incredibly complicated. I just tried to paste one of the arrays into a VFP code window, and I get only "?" = 'S', for example. All Arabic characters are lost. So this is very complicated.
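
The "?" effect can be reproduced outside VFP. The following Python sketch is only an illustration (the helper name `to_codepage` is made up): it encodes one Arabic letter for two Windows ANSI code pages and shows that a code page without Arabic letters can only offer a replacement character.

```python
def to_codepage(text, codepage):
    """Encode text for a single-byte code page; unsupported characters become '?'."""
    return text.encode(codepage, errors="replace")


seen = "س"  # the letter mapped to 'S' in the PHP table

print(len(seen.encode("utf-8")))          # 2: two bytes in UTF-8
print(len(to_codepage(seen, "cp1256")))   # 1: one byte in the Arabic code page
print(to_codepage(seen, "cp1252"))        # b'?': no such letter in Western European
```

So the letters survive only if the text is converted to code page 1256 and the editor interprets it as 1256.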

Best Regards,
Scott
ATS, CDCE, CTIA, CTDC

"Everything should be made as simple as possible, and no simpler."

RE: soundex for arabic language

(OP)
You are right, Mr. Scott. I'll try to replace the ?s with the corresponding Arabic letters in VFP and repost the PHP code.
Thank you
yahya

RE: soundex for arabic language

Yahya,
It's unlikely that will be successful. The issue is far more complicated, and relates to the fact that VFP cannot handle Unicode in the same way (if I recall correctly).

Best Regards,
Scott

RE: soundex for arabic language

(OP)
I am very sorry. The attached file contains some letters that were not translated well. I will revise it and repost.
yahya

RE: soundex for arabic language

Quote:

The attached file contains some letters not well translated. I will revise it and repost.

That's not the point. The point that Scott was making is that VFP does not handle double-byte characters in the same way as in your PHP code. He mentioned the problem of seeing question marks instead of Arabic characters just as an example of the sort of complication he is warning against. Fixing that in the file that you posted won't change the underlying problem.

That said, I think it should be possible to do this, but it's likely to be more difficult than simply converting PHP syntax to VFP.

Have you tried looking for an external control, such as an ActiveX control or a web service, that can handle Arabic Soundex?

Mike

RE: soundex for arabic language

This doesn't work out, as Western Windows versions don't have the Arabic codepage 1256 installed. You have to understand that such support is quite impossible for non-Arabic developers.

I can only give you one major idea implemented here: the $transliteration array. In VFP you can do character translations like these all in just one CHRTRAN:

CODE

lcAccentsText = 'Café'
lcLatinOnly = CHRTRAN(lcAccentsText,'áàéèíìó','aaeeiio')
? lcLatinOnly && will print Cafe

If you do this, the accents are "removed", and you can do a similar thing in your case with single Arabic letters (sorry, I don't know what to call these glyphs), so they translate to Latin letters. I have no idea which character VFP will take for which when you mix right-to-left and left-to-right text, but CHRTRAN replaces each character of the first parameter that occurs in the second parameter with the character at the same position in the third parameter. It sounds more complicated than it is: CHRTRAN is like a single call to a series of single-letter STRTRAN replacements, where the second and third parameters are the original and replacement character sets.
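
For readers without VFP at hand, the CHRTRAN behaviour described here can be sketched in Python. This is an approximation written for illustration, not VFP's implementation; it assumes that characters of the second parameter without a counterpart in the third are deleted, and that the first occurrence wins if a character is listed twice.

```python
def chrtran(text, search, replacement):
    """Rough Python equivalent of VFP's CHRTRAN(text, search, replacement)."""
    table = {}
    for i, ch in enumerate(search):
        if ord(ch) in table:
            continue  # assumption: the first occurrence of a character wins
        # Characters beyond the end of `replacement` map to None, i.e. are deleted.
        table[ord(ch)] = replacement[i] if i < len(replacement) else None
    return text.translate(table)


print(chrtran("Café", "áàéèíìó", "aaeeiio"))  # Cafe
```

The same call with a string of Arabic letters as the second parameter and their Latin counterparts as the third would give the transliteration in one step.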

Once you have Latin letters (I assume A-Z have the same ASC() codes in codepage 1256), you might take that as the representation, or take the SOUNDEX() of those Latin letters, though what you get from the translation is not really English.

Bye, Olaf.

RE: soundex for arabic language

(OP)
In fact, this PHP code is the only one I could find. I have translated the Arabic letters so VFP can see them correctly. If I can translate the logic of the PHP to VFP, then maybe it will work.
yahya

RE: soundex for arabic language

(OP)
Thank you, Mr. Olaf. I understand the complexity of the topic. I will try to study PHP syntax so I can understand the logic of the PHP program.

yahya

RE: soundex for arabic language

>I had translated the arabic letters so vfp can see them correctly.

Well, this is surely what you see, but as I said, non-Arabic Windows versions don't have the ANSI character sets VFP needs to display that as Arabic letters. We have Unicode and UTF-8 and can see the Arabic letters here in HTML, but that does not translate into VFP's ANSI charsets, only on your (most probably Arabic) Windows version. If you modify the PRG, your command window will surely echo "...php_soundex.prg as 1256", but on European or American Windows this just causes the error "Code page number is invalid". When we edit it as 1252 (for example), it will not look Arabic. So there is no real way to help you. Installing such a codepage is not a single thing you can do; you would need to install the whole Arabic language, and after switching to it, I couldn't operate Windows anymore.

Bye, Olaf.

RE: soundex for arabic language

(OP)
You are absolutely right, Mr. Olaf.
I found a program written in C# with the same language-encoding problem, but I was able to translate it to VFP by retyping the Arabic letters in the code in place of the ?s that appeared. The program worked fine, but as its author said, it is a beta and needs much refinement.
As I said, if I can understand the logic of the PHP program, then the translation could be done by manually retyping the Arabic letters in it.
thank you
yahya

RE: soundex for arabic language

Quote:

As I said, if I can understand the PHP program logic, then the translation could be done by manually rewriting the Arabic letters in it.

I'm not convinced that's the correct approach, but if you want to try it, it won't be too difficult for you to learn enough PHP to do the translation yourself. The syntax is not so very different from many other languages, including VFP.

Keep in mind these points:

1. Variable names start with $.

2. = is used to assign a value to a variable, while == is used in conditions to test for equality.

3. .= (dot equals) appends to a string, much as += does for strings in other languages. So $a .= 'x' is the same as $a = $a . 'x'.

4. { and } are used to delimit blocks of code in control structures such as if and for.

5. A single dot is used to concatenate strings.

6. Conditions (following if, for, etc.) are enclosed in parentheses.

7. All statements are terminated by a semi-colon.

Obviously there's a lot more to it than that, but the above should give you a start in translating from PHP to VFP. There are plenty of references and tutorials available on line if you get stuck.

Mike

RE: soundex for arabic language

(OP)
Thank you, Mr. Lewis. It's not bad to learn PHP; I will do it and let you know of any progress on this subject.

yahya

RE: soundex for arabic language

And don't forget that there is a PHP forum here on Tek Tips where you can get answers to specific questions.

Mike

RE: soundex for arabic language

(OP)
That's good, Mr. Mike. Thank you.
yahya

RE: soundex for arabic language

(OP)
Hi
This version of Soundex for Arabic names is inspired by the arsoundex PHP version.
It translates the Arabic string into a phonetically equivalent English string using a mapping table.
It then applies the VFP SOUNDEX() function to the resulting string to get the result.
Any remarks or improvements are appreciated.
To use this program, you must be able to run VFP in Windows with Arabic support.

Yahya

RE: soundex for arabic language

Thanks for that feedback. I think we have some other Arabic members to whom that will be helpful. You might consider posting this as a FAQ. Just click on FAQ (you'll find it in the head section), scroll all the way down the FAQs page to the "Write A FAQ" form, and post a new FAQ, maybe in the category "String Commands" or "Useful Functions & Procedures".

Put up the code in code tags this way: [code]your code here[/code] - not as attachment. I think attachments are only kept a few months and then are removed. Besides, normal threads will get closed and can get no comments after that, could only be referenced by their thread id. FAQs allow sending comments to you, which may also end up as a business proposal for making an integration of that into a software, for example.

Posting a ZIP might also be helpful; then think about using cloud drives. That will even let you keep control and enable you to update the code. The same goes for the FAQ text, though: you can edit your FAQ after posting. Posted code is also easier to trust; ZIPs might contain any malware/spyware/ransomware, not only because you might put that in, but because anyone hacking a cloud drive might add it to public downloads.

Most important, perhaps: FAQs stay in sight, while threads go down over time. The forum search of course helps in finding both old posts and FAQs, but this qualifies for a FAQ even though it was seldom asked. I always just think about whether something is not often found and good to find if you have the same problem, without being too obscure and special to qualify as a FAQ, and this does.

Bye, Olaf.

RE: soundex for arabic language

Let me add my thanks. It's always good when someone here solves a problem and shares the solution with others.

Mike

RE: soundex for arabic language

(OP)
Thank you, Mr. Olaf and Mr. Mike, and everyone who assisted or inspired the proposed solution.

Yahya

RE: soundex for arabic language

(OP)
Done. I have added this article to the FAQ section.

Yahya

RE: soundex for arabic language

If you look into your mail, you'll see a recommendation to announce the FAQ in a new thread. You can do so by posting faq 184-7907 (without a space between faq and the faq id), which automatically expands to a link to FAQ184-7907: Arabic Soundex program.

You chose a OneDrive download. That's fine; though I said it might not be trusted as much as simply posted code, it is easier to download and can be checked for viruses etc. Also, since you posted a PRG and not a ZIP, it can be inspected with the VFP editor before execution. A good choice, and a compromise with the advantages of both.

By the way: ZIPs also are not executed, but can exploit buffer overflows of well known zip software like winzip, winrar or 7zip to run malware even just upon inspection of a zip archive. This is not just theory, see a case of last year at http://securityaffairs.co/wordpress/47242/hacking/...

A good idea is to offer a SHA1 or MD5 checksum of the file, so any future downloader can first check no hacker made changes to your original upload by calculating the same checksum of the downloaded file before opening it any other way. A double effort to change both the onedrive file and the posted checksum in the tek-tips FAQ would be needed to make a changed file appear as original upload, then.
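
Computing such a checksum needs no special tool. As a sketch in Python (the function name `file_checksum` and the file name in the usage note are only examples), using the standard `hashlib` module:

```python
import hashlib


def file_checksum(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)  # e.g. "sha1" or "md5"
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A downloader would call something like `file_checksum("arsoundex.prg")` on the downloaded file and compare the result, character by character, with the value posted in the FAQ.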

Bye, Olaf.

RE: soundex for arabic language

(OP)
Hi Mr. Olaf
If I post the code directly, the Arabic letters will not show correctly. That's why I chose to upload the PRG file as it is, so the Arabic will show correctly in Windows with Arabic support.
Thank you
Yahya

RE: soundex for arabic language

Ah, yes, I forgot that you'd have a codepage translation through posting. I would guess that if a user has Arabic Windows, copy & paste would still work; your posting here also has the Arabic letters, so that part of the copy & paste into the forum works.

Anyway, it is always easier to have a separate file than to copy code out of a post. The only thing you don't get right away is an overview of how the code works.

Bye, Olaf.

RE: soundex for arabic language

(OP)
Hi mr. Olaf
The supplied program, when run, gives demo results of applying its function to sample names.
The program generates a table in which every Arabic letter has its phonetic English equivalent. When we call the function with an Arabic name, the name is translated to its English phonetic equivalent, on which we call the standard VFP SOUNDEX() function to get the Soundex code.
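
The idea can be sketched in Python for readers without VFP. Two assumptions: the mapping below is borrowed from the $transliteration array of the PHP class and only stands in for the real table, and the `soundex` function implements the commonly described Soundex rules, which may differ in details from VFP's SOUNDEX().

```python
# Stand-in mapping (from the PHP $transliteration array), not the real table.
ARABIC = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
LATIN = "ABTTJHKDZRZSSSDTZAGFQKLMNHWY"
AR2EN = dict(zip(ARABIC, LATIN))

SOUNDEX_DIGITS = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_DIGITS[letter] = digit


def soundex(name):
    """Classic 4-character Soundex of a Latin-letter name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_DIGITS.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_DIGITS.get(c, "")
        if digit and digit != previous:
            code += digit
        if c not in "HW":  # H and W do not separate equal codes
            previous = digit
    return (code + "000")[:4]


def arabic_name_soundex(name):
    """Transliterate letter by letter, then apply Soundex to the result."""
    latin = "".join(AR2EN.get(ch, "") for ch in name)
    return soundex(latin)


print(arabic_name_soundex("احمد"))  # A530
```

Swapping in a different mapping table is all it takes to try the same approach with another alphabet.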
By the way, I use Oracle VirtualBox to install different versions of Windows with different language support for testing purposes.
Yahya

RE: soundex for arabic language

Virtual Box is a good thing, I could do the same, as I still have lots of license keys due to having been MVP for a few years. Anyway I couldn't operate on any other Windows versions than German and English Windows. And those languages English and German can easily be combined in one Windows.

Bye, Olaf.

RE: soundex for arabic language

(OP)
I too work on English Windows (Win 10) with Arabic language support installed:
- In Control Panel, Language, Add a language, choose Arabic (Lebanon).
Now, because VFP is not Unicode-aware, you also have to:
- In Control Panel, Region, Administrative, Language for non-Unicode programs, Change system locale, choose Arabic (Lebanon).
After that, VFP will be able to see and use Arabic letters.

By the way, my supplied program can be adapted to other languages by changing the character mapping defined in the file ar2en.dbf.
yahya

RE: soundex for arabic language

The only thing hindering me is that after setting the locale to Arabic, I would have a hard time setting it back to German or English.

By the way, character mappings are also done for collations; see also SYS(15), which, from its description, works quite like CHRTRAN. I never used it, though, and it is recommended to use collations instead.

And while at SYS functions I see SYS(2300) could enable setting arabic codepage without changing the whole system locale.

CODE

SYS(2300,1256,1)
MODIFY COMMAND ...php_soundex.prg as 1256 
I tried, and unfortunately it still gives the "Code page number is invalid" error.

Anyway, for anything related to internationalization, always see Steven Black: http://stevenblack.com/intlasia/ Most steps mentioned there for the most problematic Asian languages also apply to other foreign-language settings.

Bye, Olaf.

RE: soundex for arabic language

(OP)
Hi Mr. Olaf

SYS(2300,1256,1)
MODIFY COMMAND ...php_soundex.prg as 1256
worked fine on my system configured as mentioned previously.
I'll look at Steven Black's pages.
Thank you
yahya

RE: soundex for arabic language

Well, your system doesn't need that SYS call anyway. It already supports codepage 1256 out of the box, and that may even be your default codepage in VFP.
You don't need any of these hints; they are for people using a Windows other than Arabic.

Bye, Olaf.

RE: soundex for arabic language

(OP)
I am not using Arabic Windows. I use English Windows, same as you, but with the Arabic language added and the system locale for non-Unicode programs set to Arabic, as I mentioned in a previous post.
You can try it in a virtual machine if you want.
As a test, I removed Arabic for non-Unicode programs and set it back to English, rebooted, and voilà, codepage 1256 was no longer available. I switched back to Arabic for non-Unicode programs, restarted, and codepage 1256 came back.
thank you
yahya

RE: soundex for arabic language

I am using German Windows. In principle it should work, but I'm not yet eager to find out. I trust your code works, and the hints are intended for those needing Arabic support; I think they will have enough information by now.

Thank you.

Bye, Olaf.

RE: soundex for arabic language

(OP)
Thank you very much
yahya
