Normalizing street addresses in Django/Python

11,168

Solution 1

The most reliable way to do this is to use a bona fide address verification service. Not only will it standardize (normalize) the address components according to USPS standards (see Publication 28), but it will also confirm that the address is real.

Full disclosure: I work for SmartyStreets, which provides just such a service. Here's some really simple Python sample code that shows how to use our service via an HTTP GET request:

https://github.com/smartystreets/LiveAddressSamples/blob/master/python/street-address.py
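For readers who cannot open the link, here is a rough sketch of what building such a GET request can look like. The endpoint URL and the parameter names below are placeholders chosen for illustration, not the service's documented API; check the provider's documentation for the real ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only;
# consult the provider's documentation for the real ones.
API_URL = "https://api.example.com/street-address"


def build_lookup_url(street, city, state, auth_id, auth_token):
    """Build the GET URL for a single address lookup."""
    params = {
        "auth-id": auth_id,
        "auth-token": auth_token,
        "street": street,
        "city": city,
        "state": state,
    }
    # urlencode escapes characters that are unsafe in a query string
    # (spaces become '+', '#' becomes '%23', and so on).
    return API_URL + "?" + urlencode(params)
```

The resulting URL could then be fetched with urllib.request.urlopen and the response body decoded with the json module, assuming the service answers in JSON.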

Solution 2

I recently created a street-address Python module; its StreetAddressFormatter can be used to normalize your address.

Solution 3

This is how I ended up addressing this (no pun intended):

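### models.py ###

import re
import string

from django.db import models


def normalize_address_for_display(address):

    # Capitalize each word, e.g. "221 amsterdam avenue" -> "221 Amsterdam Avenue"
    display_address = string.capwords(address)

    # Normalize Avenue ("Avenue", "Ave." and "Ave" all become "Ave")
    display_address = re.sub(r'\b(?:Avenue|Ave)\b\.?', 'Ave', display_address)

    # Normalize Street ("Street", "St." and "St" all become "St")
    display_address = re.sub(r'\b(?:Street|St)\b\.?', 'St', display_address)

    # ...and other rules...

    return display_address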

class Store(models.Model):

    name = models.CharField(max_length=32)
    address = models.CharField(max_length=64)
    city = models.CharField(max_length=32)
    state = models.CharField(max_length=2)
    zipcode = models.CharField(max_length=5)

    @property
    def display_address(self):
        return normalize_address_for_display(self.address)

I then use Store.display_address in templates. This lets me keep the original user-submitted data in the database unmodified and use display_address only when I want a normalized display version.
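One way to fill in the "...and other rules..." placeholder without writing a new re.sub line per suffix is a lookup table. This is only a sketch: the table contents and the name normalize_with_table are made up for illustration, and a real table would need many more entries (USPS Publication 28 lists the standard suffixes).

```python
import re
import string

# Hypothetical abbreviation table: target form -> accepted spellings.
SUFFIX_RULES = {
    "Ave": ("Avenue", "Ave", "Av"),
    "St": ("Street", "St"),
    "Blvd": ("Boulevard", "Blvd"),
}


def _compile(variants):
    # Match any variant as a whole word, plus an optional trailing period.
    alternatives = "|".join(re.escape(v) for v in variants)
    return re.compile(r"\b(?:%s)\b\.?" % alternatives)


COMPILED_RULES = [
    (_compile(variants), abbrev) for abbrev, variants in SUFFIX_RULES.items()
]


def normalize_with_table(address):
    """Capitalize each word, then apply every suffix rule in turn."""
    display_address = string.capwords(address)
    for pattern, abbrev in COMPILED_RULES:
        display_address = pattern.sub(abbrev, display_address)
    return display_address


print(normalize_with_table("221 amsterdam avenue"))  # 221 Amsterdam Ave
print(normalize_with_table("420 East 24th St."))     # 420 East 24th St
```

Note that string.capwords lowercases everything after the first letter of each word, so a directional such as "NW" would come out as "Nw" and needs its own rule.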

Open for comments/suggestions.

Solution 4

One option would be to use Geopy to look up the address with a provider such as Yahoo or Google Maps, which will then return the full address of the one(s) it matches. You may have to watch for apartment numbers being dropped from the returned address (e.g. "221 Amsterdam Av #330" becoming "221 AMSTERDAM AVENUE"). In addition, you will also get the city/state/country information, which the user may have abbreviated or misspelled.
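A possible way to guard against the dropped apartment number is to pull the unit off the user's input before the lookup and put it back afterwards. The sketch below is a heuristic, not a full parser: the pattern and the helper names split_unit and restore_unit are assumptions for illustration, and a street literally named "Suite" or "Unit" could trigger a false positive.

```python
import re

# Hypothetical pattern for a trailing unit such as "#330", "Apt 4B" or "Suite 12".
UNIT_RE = re.compile(
    r"(?:#\s*|\b(?:Apt|Apartment|Unit|Suite|Ste)\.?\s+)([A-Za-z0-9-]+)\s*$",
    re.IGNORECASE,
)


def split_unit(street):
    """Return (street_without_unit, unit_or_None)."""
    match = UNIT_RE.search(street)
    if match is None:
        return street.strip(), None
    return street[:match.start()].rstrip(" ,"), match.group(1)


def restore_unit(user_street, geocoded_street):
    """Re-append the user's unit if the geocoded street has none."""
    _, unit = split_unit(user_street)
    if unit is None or split_unit(geocoded_street)[1] is not None:
        return geocoded_street
    return "%s #%s" % (geocoded_street, unit)


print(restore_unit("221 Amsterdam Av #330", "221 AMSTERDAM AVENUE"))
# 221 AMSTERDAM AVENUE #330
```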

If there are multiple matches, you could prompt the user to pick which one is their address. If there are no matches, you could let the user know and possibly allow the address to be saved anyway, depending on how important a valid address is and how much you trust the lookup provider's data.
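The three outcomes (no match, one match, several matches) can be separated in a small helper. In this sketch, classify_matches is a made-up name and the geocode argument is any callable that behaves like a geopy geocoder's geocode method, for example Nominatim(user_agent="my-app").geocode; passing it in keeps the helper testable without network access.

```python
def classify_matches(geocode, query):
    """Return ("none" | "single" | "multiple", [address strings])."""
    # geopy geocoders return None when nothing matches and, when called
    # with exactly_one=False, a list of Location objects otherwise.
    results = geocode(query, exactly_one=False) or []
    addresses = [location.address for location in results]
    if not addresses:
        return "none", []
    if len(addresses) == 1:
        return "single", addresses
    return "multiple", addresses
```

A form could reject the "none" case, accept "single" as the normalized address, and show the list from "multiple" back to the user as choices.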

As for whether to do this normalization in the form or the model, I don't know what the preferred Django way is, but my preference is the form. For example:

def clean(self):
    cleaned_data = super().clean()
    # check address via some self-defined helper function
    # (the field names below are assumed; use your form's own)
    matches = my_helper_address_matcher(
        cleaned_data.get('address'),
        cleaned_data.get('city'),
        cleaned_data.get('state'),
        cleaned_data.get('zipcode'),
    )
    if not matches:
        raise forms.ValidationError("Your address couldn't be found...")
    elif len(matches) > 1:
        # add javascript into error so the user can select
        # the address that matches? maybe there is a cleaner way to do this
        raise forms.ValidationError('Did you mean...')
    return cleaned_data

You could put this lookup function in the model (or some helpers.py file) in case you want to reuse it in other areas.

Author: Belmin Fernandez

Learning and helping.

Updated on July 24, 2022

Comments

  • Belmin Fernandez
    Belmin Fernandez almost 2 years

    I have a Django form where one of the fields is a TextInput for a street address.

    I want to normalize the data. For example:

    >> normalize('420 East 24th St.')
    '420 E. 24th Street'
    
    >> normalize('221 Amsterdam Av')
    '221 Amsterdam Ave.'
    
    >> normalize('221 Amsterdam Avenue')
    '221 Amsterdam Ave.'
    

    Or something like that. I'm already using geopy for geocoding. Perhaps this might help?

    Also: Where should I normalize? In the database model or in the clean function of the form field?

  • bgw
    bgw about 12 years
    I'm working on a library that has to deal with addresses, and while SmartyStreets looks a bit expensive (although the free tier is fairly generous), and would probably add a bit of latency to my library (requiring a round trip to a server), it looks like a pretty awesome service. I think I might add support for it. Keep up the good work!
  • mdwhatcott
    mdwhatcott about 12 years
    Thanks! Just so you know, we are geo-distributed, and requests are handled at the data center closest to the user's location, which cuts down on latency.
  • Cerin
    Cerin almost 10 years
    Word of caution, I've used these services, and they're not terribly accurate, especially with apartments and subdivisions. Also, they're very difficult, if not impossible, to use to process large batches.
  • Joey Baruch
    Joey Baruch over 2 years
    Note that the last \b won't match unless a word character follows it. Use r'\b(?:Street\b|St\.\B)' instead. regex101.com/r/aeuAbj/1
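The difference between the two patterns can be checked directly. This sketch compares the Street pattern from Solution 3 with the one suggested in the last comment; the helper name normalize_street and the sample addresses are made up for the demonstration.

```python
import re

ORIGINAL = r'\b(Street|St.)\b'        # pattern from Solution 3
SUGGESTED = r'\b(?:Street\b|St\.\B)'  # pattern from the comment


def normalize_street(pattern, address):
    return re.sub(pattern, 'St', address)


# The trailing \b cannot match after a final ".", so "St." is left alone.
print(normalize_street(ORIGINAL, '420 East 24th St.'))   # 420 East 24th St.

# The unescaped "." also matches a space, which swallows the word break.
print(normalize_street(ORIGINAL, '12 St Albans Rd'))     # 12 StAlbans Rd

print(normalize_street(SUGGESTED, '420 East 24th St.'))  # 420 East 24th St
print(normalize_street(SUGGESTED, '12 St Albans Rd'))    # 12 St Albans Rd
```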