
SOUNDEX

Returns the Soundex code for an alphabetic string.

Synopsis

SOUNDEX(string)

Arguments

string An expression that resolves to an alphabetic string.

Description

The SOUNDEX function is used to group and sort near-equivalents of alphabetic strings, such as variant spellings of a name. The Soundex algorithm takes an alphabetic string of any length, such as a name or an English word or phrase, and returns a four-character equivalence code. This code consists of the first recognized letter of the string (which may not be the first character), followed by three digits, each between 0 and 6 (inclusive). The three digits represent up to three distinct consonant sounds that follow the initial letter. Adjacent letters that share a Soundex number (such as “mm” or “mn”) are coded only once.

For example, “Fred” is represented as F630, because F is the first character, 6 is assigned to the letter sound “R”, 3 is assigned to the letter sounds “D” or “T”, and 0 indicates that there are no more consonant sounds in the string. Note that vowels and unvoiced letters (A, E, I, O, U, H, W, Y) are not assigned a number. Ann, Anne, Anna, Ana, and Annie are all represented by A500. Anita, Anida, Annette, and Ann T. are all represented by A530. Anton, Anthony, and Antoinette are all represented by A535.

Caché MVBasic uses the same Soundex algorithm as the United States Census Bureau; this is not the same algorithm used by other MultiValue implementations. Therefore, all files using Soundex should be regenerated when moving them to Caché MultiValue. The MVBasic Soundex numeric codes for English consonants are as follows: 1=B,F,P,V; 2=C,G,J,K,Q,S,X,Z; 3=D,T; 4=L; 5=M,N; 6=R.

The Soundex algorithm is not case-sensitive; every Soundex code begins with the first recognized letter in uppercase, regardless of its case in the input string. All non-alphabetic characters are ignored, including numbers, punctuation characters, and blank spaces. Soundex does not recognize accented letters or non-Latin letters. For example, “Ü-boat” returns B300, exactly the same as “Boat”. If SOUNDEX cannot recognize at least one letter in string, it returns 0000 (four zeros). If string is the null string, SOUNDEX returns the null string.
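
The rules described above can be sketched outside MVBasic. The following Python function is an illustration only, not the Caché implementation. The name soundex_sketch is hypothetical. The text above does not say how H and W behave between two consonants that share a number, so the sketch assumes the common Census-style convention that H and W do not separate them, while vowels and Y do.

```python
# Illustrative sketch only; not the Cache MVBasic implementation.
# The code table is taken from the Description above. Letters that are not
# listed (A, E, I, O, U, H, W, Y) are not assigned a number.
CODES = {}
for digit, group in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                     ("4", "L"), ("5", "MN"), ("6", "R")):
    for letter in group:
        CODES[letter] = digit


def soundex_sketch(text):
    """Return a four-character Soundex-style code (hypothetical helper)."""
    if text == "":
        return ""  # null string in, null string out

    # Keep only unaccented Latin letters, converted to uppercase.
    letters = [ch.upper() for ch in text if ch.isascii() and ch.isalpha()]
    if not letters:
        return "0000"  # no recognizable letter in the input

    first = letters[0]
    digits = []
    # Start from the first letter's own number so that an immediately
    # repeated sound (as in "MMMM") adds nothing.
    previous = CODES.get(first)
    for ch in letters[1:]:
        code = CODES.get(ch)
        if code is None:
            # Assumption: H and W do not separate equal codes; vowels and Y do.
            if ch not in "HW":
                previous = None
            continue
        if code != previous:
            digits.append(code)
        previous = code

    # Pad with zeros, then keep the letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


for name in ("Fred", "Annie", "Ann T.", "Anthony", "Ü-boat"):
    print(name, soundex_sketch(name))
```

Under these assumptions the sketch reproduces the codes quoted in this section, for example F630 for “Fred” and B300 for “Ü-boat”. Inputs that depend on the H and W assumption may not match what SOUNDEX returns.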

Examples

The following examples use the SOUNDEX function to return equivalence codes. Note how the Soundex code is established by the initial letter and the next three significant consonants:

PRINT SOUNDEX("M");           ! Returns M000
PRINT SOUNDEX("MMMM");        ! Returns M000
PRINT SOUNDEX("Mc");          ! Returns M200
PRINT SOUNDEX("Mac");         ! Returns M200
PRINT SOUNDEX("McD");         ! Returns M230
PRINT SOUNDEX("McT");         ! Returns M230
PRINT SOUNDEX("McDuff");      ! Returns M231
PRINT SOUNDEX("McDufflebag"); ! Returns M231

