Nagoya

@albarrentine albarrentine released this 20 Dec 19:04
· 3658 commits to master since this release

Merged a few of the commits from the parser-data branch of libpostal into master to fix address parser training from the master branch.

Coincides with the release of some of the parser training data generated in the parser-data branch:

  1. OSM training addresses (27GB, ODBL)
    This is (a much-improved version of) the original data set used to train libpostal.
  2. OSM formatted place names/admins (4GB, ODBL)
    Helpful for making sure all the place names (cities, suburbs, etc.) in a country are part of the training set for libpostal, even if there are no addresses for that place.
  3. GeoPlanet postal codes with admins (11GB, CC-BY)
    Contains many postal codes from around the world, including the 1M+ postcodes in the UK, and their associated admins. If training on master, this may or may not help, because master still relies pretty heavily on GeoNames for postcodes.
  4. OpenAddresses training addresses (30GB, various licenses)
    By far the largest data set. It's not every source from OpenAddresses, just the ones that are suitable for ingestion into libpostal. It's heavy on North America but also contains many of the EU countries. Most of the sources only require attribution; some have share-alike clauses. See openaddresses.io for more details.

Users are encouraged to QA the data for problematic patterns, etc. Note: while it's possible now to train per-country/language parsers using slices of the data, there will be no support offered for custom parsers.
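
As a rough illustration of taking a per-country slice, the sketch below filters training lines by country code. It assumes each line is tab-separated as language, country, tagged address; that layout is an assumption here and should be checked against the downloaded files. The function name `slice_by_country` is hypothetical and not part of libpostal.

```python
from typing import Iterable, Iterator


def slice_by_country(lines: Iterable[str], countries: set[str]) -> Iterator[str]:
    """Yield only the lines whose country column matches one of `countries`.

    Assumed (unverified) layout: language <TAB> country <TAB> tagged address.
    """
    wanted = {c.lower() for c in countries}
    for line in lines:
        # Split into at most three fields so tabs inside the address survive.
        parts = line.rstrip("\n").split("\t", 2)
        # Skip lines that do not have the assumed three columns.
        if len(parts) == 3 and parts[1].lower() in wanted:
            yield line
```

Because the function consumes any iterable of lines lazily, an open file handle can be passed in directly, so a multi-gigabyte file does not need to fit in memory.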

Release is named after the largest train station in the world: https://en.wikipedia.org/wiki/Nagoya_Station.