What is Geocoding?

October 23, 2021

One of the most magical things I've come across in my research into mapping and routing tools is something called a geocoder. I've seen a few definitions of what it is and several different implementations of it, but I wanted to take the time to sit down and explain it à la the Feynman Technique.

This post will attempt to teach what a geocoder is, provide some of the intuition behind how some of them might work, and then explain why these important tools matter and how anyone with a little knowledge of their hometown can improve them.

What is a Geocoder?

A geocoder is a tool that takes the partial or full name of a place and returns a list of real-world locations that are connected to that name in some way.

Without realizing it, we use them every day. Delivery drivers need to drop off items at addresses they've never seen before; they typically type the full address into a navigation app, which then directs them to the street, house, and sometimes apartment number in the real world that belongs to that address.

This might seem relatively easy to do. After all, as long as someone keeps a list of the exact address for every person and business in America, it sounds like the job can be done by searching through a very long list of addresses. However, the process is complicated by several things, which the following example illustrates.
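
To make the "very long list" idea concrete, here is a minimal sketch of a geocoder that is nothing more than an exact-match lookup table. The table contents, the approximate coordinates, and the function name naive_geocode are all assumptions made up for illustration; no real geocoder is claimed to work this way.

```python
# A toy lookup table mapping full addresses to (latitude, longitude).
# The coordinates are rough, illustrative values, not authoritative data.
ADDRESS_TABLE = {
    "350 5th ave, new york, ny 10118": (40.7484, -73.9857),
    "340 e liberty st, ann arbor, mi 48104": (42.2795, -83.7445),
}


def naive_geocode(query):
    """Return the coordinate for an exact address match, or None.

    The query is lowercased and runs of whitespace are collapsed,
    but otherwise it must match a table key character for character.
    """
    key = " ".join(query.lower().split())
    return ADDRESS_TABLE.get(key)


print(naive_geocode("350 5th Ave, New York, NY 10118"))  # found
print(naive_geocode("Empire State"))                     # None: not a full address
```

The second call is the interesting one: a partial name like "Empire State" finds nothing, which is exactly the kind of query a real geocoder has to handle.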

Example: Empire State

To get a better sense of how complex a geocoder needs to be, consider the following examples. First, let's see what happens when I type "Empire State" into Apple Maps' search bar. Note that this is not a full address, but the geocoder still does something reasonable.

This screenshot is from searching the words "empire state" on October 17th.

The geocoder does more than just find all places that contain the words "empire state". It orders the results by how relevant it thinks each one is. Interestingly enough, it chooses to put the Empire State Building at the top of the list (probably because it is the one most often searched for).

In another search containing a famous name (Broadway), the Google Maps geocoder chooses not to prioritize the famous Broadway in New York City, but instead prioritizes Broadway Street (a street that is close to me in Ann Arbor).

This screenshot is from searching the word "broadway" on October 17th.

So, somehow the Google Maps geocoder is doing something smart to find addresses which are some combination of useful and close to where the query was created. How is this possible?
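
One way to picture "useful and close" is a score that mixes how popular a place is with how near it is to the person searching. The sketch below is only a guess at the general idea: the candidate list, the popularity numbers, the weights, and the names haversine_km, score, and rank are all hypothetical, and nothing here is known to be what Google Maps or Apple Maps actually does.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Made-up candidates for the query "broadway"; popularity is an invented 0..1 value
# and the coordinates are approximate.
CANDIDATES = [
    {"name": "Broadway, New York, NY", "coord": (40.7590, -73.9845), "popularity": 0.95},
    {"name": "Broadway St, Ann Arbor, MI", "coord": (42.2900, -83.7350), "popularity": 0.10},
]


def score(candidate, user_coord, popularity_weight=0.4, distance_scale_km=50.0):
    """Blend popularity with a proximity term that decays with distance."""
    distance = haversine_km(candidate["coord"], user_coord)
    proximity = 1.0 / (1.0 + distance / distance_scale_km)
    return popularity_weight * candidate["popularity"] + (1 - popularity_weight) * proximity


def rank(candidates, user_coord):
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c, user_coord), reverse=True)


ann_arbor = (42.2808, -83.7430)
los_angeles = (34.0522, -118.2437)
print(rank(CANDIDATES, ann_arbor)[0]["name"])    # the nearby Ann Arbor street wins
print(rank(CANDIDATES, los_angeles)[0]["name"])  # far from both, the famous one wins
```

With these invented weights, a searcher in Ann Arbor sees the local street first, while a searcher far from both places sees the famous Broadway first, which mirrors the behavior in the screenshots.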

How does a Geocoder do this?

"How" a geocoder does this is part of the "secret sauce" that makes each navigation app unique.

Search providers like Google Maps or Apple Maps likely have access to your search history or other demographic data about you (for example, your billing address), which can help improve their results. They can improve results not only by using data about you to make better guesses, but also by reducing the amount of data they need to search through. (Think about searching through the billions of addresses that exist in the world any time you want to find something; there has to be a better and faster way to do the search!) A short list of such techniques is provided below, courtesy of this interesting Stack Overflow discussion:

  1. Instead of using a large lookup table that matches every possible address to a specific GPS coordinate, some geocoders use interpolation to estimate the GPS location. That is, if the geocoder knows the locations of house 3700 and house 3800 on your block, as well as the curvature of the road, then it can estimate that address 3750 is halfway between them.
  2. For many geocoders, you can pass in the parts of an address out of order and they will typically still work. Somehow, the algorithm makes sense of an arbitrary sequence of words (for example, "liberty 340") and can determine that I actually want 340 East Liberty Street in Ann Arbor. Cool!
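
Both ideas in the list above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the interpolation treats the road between the two known houses as a straight line (so it ignores curvature), the address list contains made-up entries apart from the Liberty Street example, and the function names interpolate_address, tokens, and match_tokens are hypothetical.

```python
def interpolate_address(number, low, high):
    """Estimate (lat, lon) for a house number between two known houses.

    low and high are (house_number, (lat, lon)) pairs on the same block.
    Assumes a straight road segment and evenly spaced house numbers.
    """
    (n0, (lat0, lon0)), (n1, (lat1, lon1)) = low, high
    if n0 == n1:
        raise ValueError("the two known houses need different numbers")
    t = (number - n0) / (n1 - n0)  # fraction of the way from low to high
    if not 0.0 <= t <= 1.0:
        raise ValueError("house number is outside the known range")
    return (lat0 + t * (lat1 - lat0), lon0 + t * (lon1 - lon0))


# Illustrative address list; only the Liberty Street entry comes from the text above.
KNOWN_ADDRESSES = [
    "340 East Liberty Street, Ann Arbor",
    "340 East Huron Street, Ann Arbor",
    "200 South State Street, Ann Arbor",
]


def tokens(text):
    """Split text into a set of lowercase words, ignoring commas and word order."""
    return set(text.lower().replace(",", " ").split())


def match_tokens(query, addresses):
    """Return every address that contains all of the query's words, in any order."""
    wanted = tokens(query)
    return [address for address in addresses if wanted <= tokens(address)]


# House 3750 lands halfway between houses 3700 and 3800.
print(interpolate_address(3750, (3700, (42.0, -83.0)), (3800, (42.002, -83.004))))
print(match_tokens("liberty 340", KNOWN_ADDRESSES))
```

Comparing sets of words is what makes "liberty 340" and "340 liberty" equivalent here. A real geocoder would presumably also need to handle abbreviations, misspellings, and ranking among several matches, none of which this sketch attempts.
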

The differences between geocoders may also come from the fact that they are always changing and improving. As new homes are built, new addresses minted, and postcodes changed, these systems must be updated to keep up.

Why does this matter?

Geocoders are tools that are used every day and seem to work invisibly. Until they don't.

When one fails, it can mean that guests have trouble reaching our Airbnbs, that surveys do not adequately cover the country, or much worse.

Luckily, the failure of a geocoder can be fixed by one local citizen (like you)! You can submit fixes through projects like OpenStreetMap. With their tools, anyone can notify the world about changes to the map (e.g., "there is a new bike lane in my neighborhood") or explain some new geocoding information (e.g., "our new favorite restaurant was built at this GPS coordinate with this address"). This broad collection of data can be used to build better geocoders as well as help your neighbors. You can be part of the magic that makes it possible to deliver packages to newly built homes, for friendly émigrés to explore their cities, and much more.

References

  1. Video on the Feynman Technique from Thomas Frank
  2. VGIN Composite Geocoding Service Document
  3. How does the Google geocoder work?
  4. How to Think About Postcodes and Geocoding
  5. Airbnb cannot find my address (Forum Post)
  6. Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling