Earlier we developed a spell checker and here we'll look at the soundex algorithm to see if it might be useful in offering corrections for misspelled words. In my email program a misspelt word is underlined with a wavy red line, and it was only fairly recently that I discovered that a right-click would show alternative possibilities.

In all fairness, soundex was not made for this purpose. It was used for census tracking and later for genealogy. It is simple enough to calculate the soundex code for any word by hand. With proper names it is surprisingly on target.

The program soundex.py will convert words on the command line to their coded form. This coded form is always four characters long: it begins with the first letter of the word, followed by 3 digits. Here are some examples of proper names for people.

$ python soundex.py christopher christian

$ python soundex.py allen alan alain alun

The soundex algorithm is based on 4 ideas.

The first idea is knowing what to ignore. Unless they start a word, the vowels A-E-I-O-U along with the vowel-like or sometimes silent letters W-H-Y are just ignored.

Convert any repeated consonants to just a single letter. So "mm" becomes "m", "ll" becomes "l", etc.

Assign each remaining consonant (the vowels are gone) to one of 6 groups. The groups consist of the following letters: group 1 is B, F, P, V; group 2 is C, G, J, K, Q, S, X, Z; group 3 is D, T; group 4 is L; group 5 is M, N; and group 6 is R. The letters in each group have somewhat similar sounds. Sometimes (d, t) the tongue action is the same and the difference is whether the vocal cords are activated. The "th" combination actually ends up in group 3 since the h is dropped. The letter c conveniently is in group 2, so whether it has a k sound or an s (or even z) sound it is properly handled. "ph" nicely becomes the equivalent of "f".

Keep the resulting codes to a fixed length. Only the first 4 letters kept are used, and if there are fewer the code is padded to a length of 4 using '0'.

With all of that, all of the examples above should make perfect sense. So the code is S432 and is formed from the letters sltz. The c is not used because it's in the same group (2) as s. The h is not used because it's in the W-H-Y group.

It's important to remember that soundex was designed only for the English language, with its rather quirky way of spelling words and names. Even so, it can often fail. For example, 'gh' is sometimes pronounced as an 'f' and sometimes it is silent.
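The ideas behind soundex as described in this post (ignore vowels and W-H-Y, collapse repeats, map consonants to six groups, fix the length at four) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the soundex.py mentioned in the post: the function name `soundex`, the `GROUPS` table layout and the choice to collapse neighbours that share a group (rather than only identical letters) are assumptions made for illustration. Archival Soundex rules also differ in some details, such as how letters separated by a vowel are treated.

```python
# Assumed group table: the usual six Soundex consonant groups.
GROUPS = {
    "bfpv": "1",
    "cgjkqsxz": "2",
    "dt": "3",
    "l": "4",
    "mn": "5",
    "r": "6",
}
# Flatten to a per-letter lookup, e.g. CODE["s"] == "2".
CODE = {letter: digit for letters, digit in GROUPS.items() for letter in letters}


def soundex(word):
    """Return a 4-character code: first letter plus 3 digits (sketch only)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]

    # Idea 1: after the first letter, drop vowels and W-H-Y
    # (anything without a group digit is ignored).
    kept = [ch for ch in letters[1:] if ch in CODE]

    # Ideas 2 and 3: map each consonant to its group digit and skip a digit
    # that repeats its predecessor, including the first letter's own group.
    digits = []
    previous = CODE.get(first)  # None when the word starts with a vowel or W-H-Y
    for ch in kept:
        digit = CODE[ch]
        if digit != previous:
            digits.append(digit)
        previous = digit

    # Idea 4: pad with '0' and cut to a fixed length of 4.
    return (first.upper() + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for name in ["christopher", "christian", "allen", "alan", "alain", "alun"]:
        print(name, soundex(name))
```

With this sketch, christopher and christian both come out as C623, and allen, alan, alain and alun all come out as A450. The letters sltz give S432, with a c or h after the s making no difference.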