GeoCoding Issues with OpenStreetMap/Nominatim


I think what you're looking for is address verification. Google, Nominatim, and others only perform address approximation, which is good for finding addresses when you aren't sure what they are, but the results are only a best guess.

I helped develop LiveAddress, an API which verifies and geocodes addresses according to stringent CASS™ requirements. I ran your sample address through Google, Nominatim, and the LiveAddress API, and these are the results:

  • Google found the address despite the typo in "Livingston" but could not guarantee its validity, saying, "Address is approximate." Then again, it says that for just about every address you try.

  • Nominatim does not find it because of the typo. Perhaps a drawback to using Nominatim is that it doesn't try to compensate for typos, verify the accuracy or completeness of addresses, etc. Fixing the typo returned some information but it was anyone's guess what had to be fixed, and why the query failed anyway.

  • LiveAddress doesn't recognize the address as entered because of the typo. Missing the "s" in "Livingston" is dramatic because there are streets named "Livington," leaving the query ambiguous, and the results were too much of a mismatch to return according to CASS™ specs. Changing the name to a different typo, "Livingstn," however, produced a valid result (Nominatim didn't accept that typo either):

... for some reason I have to break out of my bullet points for code to render properly:

[
    {
        "input_index": 0,
        "candidate_index": 0,
        "delivery_line_1": "182 Livingston Ave",
        "last_line": "Albany NY 12210-2512",
        "delivery_point_barcode": "122102512824",
        "components": {
            "primary_number": "182",
            "street_name": "Livingston",
            "street_suffix": "Ave",
            "city_name": "Albany",
            "state_abbreviation": "NY",
            "zipcode": "12210",
            "plus4_code": "2512",
            "delivery_point": "82",
            "delivery_point_check_digit": "4"
        },
        "metadata": {
            "record_type": "S",
            "county_fips": "36001",
            "county_name": "Albany",
            "carrier_route": "C011",
            "congressional_district": "21",
            "rdi": "Residential",
            "latitude": 42.66033,
            "longitude": -73.75285,
            "precision": "Zip9"
        },
        "analysis": {
            "dpv_match_code": "Y",
            "dpv_footnotes": "AABB",
            "dpv_cmra": "N",
            "dpv_vacant": "N",
            "active": "Y",
            "ews_match": false,
            "footnotes": "M#"
        }
    }
]

The analysis footnote "M#" indicates a match was achieved by fixing the spelling of the street name. The resulting DPV footnotes "AABB" indicate that the entire address matched a street + city/state on the national ZIP+4 file. Also note the Zip9 precision, which is (currently) the most precise level of geocoding: accurate to block level or closer.
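As a rough illustration of how a client might read those fields, here is a minimal Python sketch. It works on a trimmed copy of the response above; `summarize_candidate` is a hypothetical helper, not part of any LiveAddress client library, and treating `dpv_match_code == "Y"` as "verified" is an assumption about how you would want to interpret that flag.

```python
import json

# Trimmed copy of the response shown above; only the fields used here are kept.
SAMPLE_RESPONSE = """
[
  {
    "delivery_line_1": "182 Livingston Ave",
    "last_line": "Albany NY 12210-2512",
    "metadata": {"latitude": 42.66033, "longitude": -73.75285, "precision": "Zip9"},
    "analysis": {"dpv_match_code": "Y", "dpv_footnotes": "AABB", "footnotes": "M#"}
  }
]
"""


def summarize_candidate(candidate):
    """Hypothetical helper: pull out the fields discussed in the text."""
    analysis = candidate.get("analysis", {})
    metadata = candidate.get("metadata", {})
    return {
        "address": f'{candidate["delivery_line_1"]}, {candidate["last_line"]}',
        # Assumption: "Y" is treated as a confirmed match.
        "verified": analysis.get("dpv_match_code") == "Y",
        # "M#" is the footnote for a corrected street-name spelling.
        "street_spelling_fixed": "M#" in analysis.get("footnotes", ""),
        "coordinates": (metadata.get("latitude"), metadata.get("longitude")),
        "precision": metadata.get("precision"),
    }


candidates = json.loads(SAMPLE_RESPONSE)
# An empty list would mean no candidate address came back at all.
summary = summarize_candidate(candidates[0]) if candidates else None
print(summary)
```

The coordinates in `metadata` are what the original question was after; the `analysis` fields tell you how much to trust them.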

So, in answer to your questions:

  1. That depends. Are your customers entering an address on a website form? If so, tell them right away, before they continue, that the address isn't valid. SmartyStreets has a jQuery plugin which verifies addresses on website forms (just copy and paste). When an address is typed, it is automatically verified. If it is wrong, the plugin slides up a notification asking the user if they'd like to fix it. Sometimes the address is ambiguous and a few valid results come back (try "100, new york, ny"); in that case the plugin shows a few suggestions and the user can pick one. The form does not submit until the user gets a valid address or says "Use mine anyway; I guarantee it's right." If the address is correct, the plugin puts the standardized results in the address fields and displays a green notice: "Address verified!"

  2. I think I discussed this above. Your query is fine; it seems to be a shortcoming in Nominatim.

  3. As suggested, you could try LiveAddress. Try it with a large set of your addresses to get a better idea (comparing from one address alone is, I'll admit, a weak indication) — but so far it seems like, for your needs, LiveAddress is somewhere between Google Maps and Nominatim.
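
The form behaviour described in point 1 boils down to a small decision on the number of candidates returned. The sketch below is only an illustration of that logic in Python; the function name and the action labels are made up here and do not come from the plugin.

```python
def next_form_action(candidates, user_insists=False):
    """Decide what an address form should do with the verification results.

    `candidates` is the list of verified addresses returned for the input;
    `user_insists` is True when the user chose "use mine anyway".
    All names and labels here are hypothetical.
    """
    if user_insists:
        return "submit_as_entered"
    if not candidates:
        # Nothing verified: ask the user to correct the address.
        return "ask_user_to_fix"
    if len(candidates) == 1:
        # One match: fill in the standardized address and allow submission.
        return "fill_standardized_and_submit"
    # Ambiguous input: let the user pick one of the suggestions.
    return "show_suggestions"


print(next_form_action([]))
print(next_form_action([{"delivery_line_1": "182 Livingston Ave"}]))
print(next_form_action([{"delivery_line_1": "A"}, {"delivery_line_1": "B"}]))
```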


Answer to question in comments

I ran out of room in the comments.

Q:

here is another address causing us issues "7580 E Big Cannon Drive,Anaheim Hills,Anaheim Hills,California,92808,US" even "7580 E Big Cannon Drive,California,92808,US" didn't seem to work with your site.

A:

I did some research on the USPS site and some other service providers as well. None returned any valid results or suggestions. But I found out what the issues are with the address as you submitted it:

  • Misspelled street name. No biggie; LiveAddress corrected this to Big Canyon.

  • Bad primary number. There's not much hope here if the primary number is incorrect. There's generally no way for a computer or human to infer what you really meant. In these cases, the address will fail verification and the user must supply something valid to go on. I found a valid primary number at 7584.

  • Master-planned community, not city/county. "Anaheim Hills" is the name of a master-planned community. Google found it in its business listings, but that has nothing to do with the address.

  • "Anaheim Hills" twice. It's confusing the parser. Unfortunately, with extra unnecessary information (esp. in a single-line address), it's nearly impossible to tell what part of it is dubious. That second "Anaheim Hills" has to go, but the first one can stay and it will be fine.

  • Country information. Most of the services I tried your address on got confused with the country in front and put it in the "Company/Firm Name" field. We deal with US addresses, so you can omit the country. It'll reduce the size of your request too.

LiveAddress was actually able to verify the address in these forms, both as a single-line address and split into components:

7584 E Big Cannon Drive anaheim hills ca 92808
7584 bg cannon 92808
7584 big cannon ave aneheim hills ca

The most significant help was finding a valid primary number. In the case that no valid addresses come back, you should alert the user and suggest fixing the primary number and making sure the city/state (if given) align with the zip code ('cause if those two are fighting, it's also impossible to tell what you meant).
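
Two of the problems listed above (the repeated "Anaheim Hills" and the trailing country) can be cleaned up before the address is ever sent to a verifier. Here is a minimal Python sketch, assuming a comma-separated single-line address; `clean_single_line` and `COUNTRY_TOKENS` are names invented for this example, and the function does nothing about a wrong primary number or a misspelled street.

```python
# Assumption: these are the only country spellings worth stripping for US-only data.
COUNTRY_TOKENS = {"us", "usa", "united states"}


def clean_single_line(address):
    """Drop country components and consecutive duplicate components."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    cleaned = []
    for part in parts:
        if part.lower() in COUNTRY_TOKENS:
            continue  # country is not needed for US addresses
        if cleaned and cleaned[-1].lower() == part.lower():
            continue  # same component repeated, e.g. "Anaheim Hills" twice
        cleaned.append(part)
    return ", ".join(cleaned)


raw = "7580 E Big Cannon Drive,Anaheim Hills,Anaheim Hills,California,92808,US"
print(clean_single_line(raw))
# 7580 E Big Cannon Drive, Anaheim Hills, California, 92808
```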

Dale K
Author by

Dale K

Full Stack Developer, .NET, SQL Server HTML/CSS/JavaScript/jQuery/MVC/AngularJS/Angular.

Updated on June 04, 2022

Comments

  • Dale K
    Dale K about 2 years

    I have a website which needs to obtain the Latitude and Longitude for the address entered by the customer.

    Google/Bing/Yahoo are too expensive for us so we went with OpenStreetMap/Nominatim.

    Unfortunately, while it worked OK during testing, it's failing to find about 50% of the addresses entered, which is a big issue.

    There are 3 things I am interested in knowing:

    1. What is the best way to deal with the situation where the customer really does enter an incorrect address - send them an email and ask them to correct it? Use segments of the address until something is found?

    2. What is the best way to handle the situation where the address is fine but I can't find it with OpenStreetMap? Or am I doing something wrong with my query to Nominatim?

    3. Does anyone know of a free/cheap alternative if OpenStreetMap isn't up to the task? I know it's an open source collaboration and therefore not complete, but I thought it did have pretty good coverage, and that it would return a nearby location if it didn't have the exact location - maybe it does and maybe I'm using it wrong.

    Here is an example:

    182 livington ave,albany,New York,12210,US

    Google maps finds that easily. Nominatim finds nothing: http://nominatim.openstreetmap.org/search?format=xml&addressdetails=0&q=182%20livington%20ave,albany,New%20York,12210,US

  • Dale K
    Dale K almost 12 years
    Thanks Matt, I was actually just in the process of emailing your support address, here is another address causing us issues "7580 E Big Cannon Drive,Anaheim Hills,Anaheim Hills,California,92808,US" even "7580 E Big Cannon Drive,California,92808,US" didn't seem to work with your site.
  • Matt
    Matt almost 12 years
    @DaleBurrell Good find. I ran out of room in this comment, so I expanded my post to include an answer to your question.
  • Dale K
    Dale K almost 12 years
    Thank you again Matt, I imagine there must be a window when a new property is built where its address isn't fully registered with all relevant authorities? I will leave the question open a bit longer to see if others have opinions about how to best handle this situation. And I will certainly keep your product in mind after all the excellent help you've provided.
  • Matt
    Matt almost 12 years
    @DaleBurrell Which browser are you using that the [favicon](smartystreets.com/favicon.ico) doesn't appear? (Thanks for the report!) Anyway, there is sometimes a window during which new properties aren't fully registered, but those would be flagged by EWS, and this address isn't. Technically, LiveAddress recognized the name "Anaheim Hills" but the national ZIP+4 file has that name "flagged" as an unofficial and non-preferred Post Office name. In other words, it's a common spelling but is not recognized by the USPS, so the name was corrected to just Anaheim. This data came back in the footnotes of the request.
  • Matt
    Matt almost 12 years
    @DaleBurrell Sure; hope you find something that works out. I'd be interested to know what you finally end up implementing.
  • Dale K
    Dale K almost 12 years
    OK, what I have done in the short term is continue to use Nominatim: attempt to geocode the full address; if that fails, attempt to geocode just the street address and the postcode; and if that fails, attempt to geocode the postcode only. If they all fail, it emails me. That's the short-term solution. I expect I'll use some of your ideas and maybe your product if the site becomes popular enough to require a better level of address validity. Thanks again for all your help, much appreciated.
  • Dale K
    Dale K almost 12 years
    One further question Matt, I've just had someone enter their address as "56 StreetName 3F" where the 3F is their apartment number. How would you recommend handling this scenario?
  • Matt
    Matt almost 12 years
    Sure. Could you clarify what you mean by "handling it"?
  • Dale K
    Dale K almost 12 years
    OK, well, my assumption based on Google Maps etc. is that address validation will fail in that format? In which case, does one just say to the user "please re-enter your address in a correct format"?
  • Matt
    Matt almost 12 years
    Actually, the user entered it nearly correctly. Secondary (apartment) numbers go in the first delivery line with the primary number and street name. All that your example is missing is the secondary designator ("Ste", "Apt", "#", etc). When I plugged an address like this into Google's geocoder, it dropped that secondary information and approximated a result instead (it did keep it if something like "apt" was included). However, LiveAddress was able to standardize, verify, and geocode the same address correctly. So no, keep the address. It is actually "good enough" that it can be verified.
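
For reference, the short-term fallback Dale describes in the comments (full address, then street plus postcode, then postcode only, then notify someone) could be sketched in Python roughly as follows. This is not his implementation. All function names are hypothetical, the geocoder is passed in as a plain callable so the sketch makes no network calls, and `nominatim_url` only rebuilds the query string used in the question's example URL.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "http://nominatim.openstreetmap.org/search"


def nominatim_url(query):
    """Build a search URL with the same parameters as the URL in the question."""
    params = {"format": "xml", "addressdetails": 0, "q": query}
    return NOMINATIM_SEARCH + "?" + urlencode(params)


def fallback_queries(street, city, state, postcode):
    """Queries from most to least specific, as described in the comment."""
    return [
        ", ".join(p for p in (street, city, state, postcode) if p),
        ", ".join(p for p in (street, postcode) if p),
        postcode,
    ]


def geocode_with_fallback(queries, geocode, on_failure=None):
    """Try each query in order; return (query, result) for the first hit.

    `geocode` is any callable returning coordinates or None.
    `on_failure` is called with the tried queries if nothing matches
    (for example, to send the notification email).
    """
    tried = []
    for query in queries:
        if not query or query in tried:
            continue
        tried.append(query)
        result = geocode(query)
        if result is not None:
            return query, result
    if on_failure is not None:
        on_failure(tried)
    return None


# Stand-in geocoder for the demo: a dict lookup with placeholder coordinates.
KNOWN = {"12210": (0.0, 0.0)}
failures = []
queries = fallback_queries("182 livington ave", "albany", "New York", "12210")
match = geocode_with_fallback(queries, KNOWN.get, on_failure=failures.append)
print(match)
```

In real use, `geocode` would fetch `nominatim_url(query)` and parse the response; note that a postcode-only match gives a much coarser location than a street-level one, so it may be worth recording which query succeeded.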