Imperfect Data

The ability to match names and addresses has evolved as the processing power of computers has grown. Faster processors allow more calculations across multiple variables when judging the accuracy of a match. The ILS algorithm was designed to produce accurate matches between target and source data even when the person’s name and/or address is entered incorrectly.

The United States Government has dealt with lists containing millions of people throughout its history, but it’s only been in the last few decades that it has relied on computers to organize and retrieve information from these huge databases.

So how did the U.S. Government work with all this information before computers? The Franklin D. Roosevelt Administration’s Works Progress Administration (WPA) was responsible for beginning the Soundex project for the decennial census. Because many states did not have uniform systems for registering births, the Soundex indexes were originally prepared to assist the Census Bureau in finding records for people who needed official proof of age.

The Soundex is a coded surname index (using the first letter of the last name and three digits) based on the way a name sounds rather than the way it’s spelled. Surnames that sound the same but are spelled differently – such as Smith and Smyth – have the same code and are filed together. This system was developed to make it easier to find a particular name even though it may have been spelled (or misspelled, as was more often the case) a variety of ways.
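
The coding rule described above can be sketched in a few lines of Python. This is only an illustration, not the Census Bureau's or ILS's actual implementation: the function name soundex is hypothetical, and the letter-to-digit table and the treatment of H and W are assumptions taken from the commonly published American Soundex rules, which the text above does not spell out.

```python
# Assumed letter groups from the commonly published American Soundex rules.
CODES = {}
for digit, group in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(surname: str) -> str:
    """Return the first letter of the surname followed by three digits."""
    letters = [ch for ch in surname.upper() if ch.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    code = first
    # The first letter is kept as a letter, but its digit still counts
    # when collapsing adjacent letters from the same group.
    prev = CODES.get(first, "")

    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not break a run
        digit = CODES.get(ch, "")  # vowels and Y have no digit
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets the run, so a repeat is coded again

    # Pad with zeros or truncate so the code is always four characters.
    return (code + "000")[:4]


if __name__ == "__main__":
    for name in ("Smith", "Smyth"):
        print(name, soundex(name))
```

Under these assumptions Smith and Smyth receive the same code, so the two spellings would be filed together.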

The success of Soundex relies on the assumption that the first letter of the last name is correct. But what if the correct first letter of the last name isn’t there? In the example below, Barak H. Obama is incorrectly spelled as Barak H. Bama in the Target File. The Soundex code for Obama is O150; the Soundex code for Bama is B500. Because the two codes differ, Soundex files the records apart and the match is missed. Luckily, the ILS matching algorithm is much, much smarter than Soundex.