Skip to content

DatabaseImport

johnsonc edited this page Mar 23, 2013 · 13 revisions

This page is for documenting step-by-step how we go from raw BEST datasets for routes, stops, and schedules to the final ChaloBEST database, and GTFS.

Database from BEST

The BEST datasets are officially updated every four months of the year, with effect from 1 March, 1 July, and 1 November annually. BEST has shared these datasets with us reliably since January 2011.

Data comes in spreadsheet (xlsx) format with the following interlinked tables, on which ChaloBEST database depends for complete import:

  • RouteAtlas contains schedules of every current bus route in detail, especially time, date, and route variations.
  • RouteDetails contains the current list of all routes with with a 4 digit code, all stops in each route with a 4 digit code, and distance between major stops every 1km or so.
  • RouteMaster contains a formatted list of all current routes, with mappings between the 4 digit code found in RouteDetails and its matching alias in RouteAtlas.
  • StopMaster contains the current list of all bus stops with a 4 digit code each, and an area code of 4 digits for the area in which the stop falls, and a road code of 4 digits for the road on which the stop falls.
  • AreaMaster contains the current list of areas in Mumbai with a 4 digit code each.
  • RoadMaster contains the current list of areas in Mumbai with a 4 digit code each.

Manual Inspection & Cleanup

Clean csv files

Remove extra headers and footers,
  make sure order of columns is the same as the older versions of the
  csv files.
Atlas.csv
Remove Erroneous 'AM N PM'
and extra cruft mostly in the last 10 lines or so..
   In the AM,N,PM columns and others
   Mostly containing 'TR' 'I' 'PS' values.
   Ctrl+F for these patterns when in the excel sheet is usually enough.

BEST Database Tables Import

Put csv files in db_csv_files/

  Relative Location of this folder:
      Chalobest/db_csv_files
      Chalobest/chaloBEST

Two main scripts used for imports

 imports/import_atlas.py
 imports/data_mapper.py

import_atlas is used for the Atlas.csv. Atlas is a separate file because its easily the most fuzzy and error prone import we have in ChaloBEST.

data_mapper is used to import the rest of the BEST csv data files. Import of StopMaster, AreaMaster and RoadMaster as given by BEST, with foreign keys for StopCode, AreaCode and RoadCode which link them together.

The structure of Data mapper: - CsvLoader('') This function creates an import process skeleton. Generates

_error.json file for any errors encountered in the errors/ folder.
   - *_save() methods
          - Which take in data one row at a time.
            This allows us to tweak any one save file process,
            if the need arises.
          - route_save(), road_save()

   - helper functions

   - fire_up()
        - This function simply calls CsvLoader() for each csv file present.
          Uses a saveorder dict for ordering the table imports.
          runs the helper functions
          and initiates importUniqueRoute process
          calls the postload cleanup functions
          and the fix_missing_data scripts ().
          In an ideal world, this should be the only function to call.

fire up the django shell

 from imports import data_mapper as dm
 from imports import import_atlas as ia
 from imports import fix_missing_atlas_data as fixes
 dm.fire_up()
 fixes.do()


 This will produce a bunch of stats and log files for manual cleaning.
 dm.CsvLoader('Csv file name') loads the csv files individually, in case one needs it.

3. UniqueRoutes Import

As given in RouteAtlas, every route has unique variations, from the full extent of the From-To route to midway stops and morning and night routes while buses are shunted between home depots. Therefore every Route has several UniqueRoutes. The Atlas import is a bit non automatable given the nature of anomalies in it. It will have to be done manually, the code steps generate various files.

 The import process is composed of
   1. CsvToJson()
      Creates a json file out of the csv file.
   2. processJson()
      - Description -
          - Creates a AtlasCopied.json (with empty
          - fields from the csv being replaced with values above them )
      - Requires -
          - routeMapping.json
            -  Which is a dictionary of routecode and route Alias
            -  Needed for linking the atlas to the Route table.

   3. groupUnique()
   4. importUniqueroutes()

MatchStops of First/Last Stops in RouteAtlas

5. EditStops to Geo-Locate StopMaster

6. EditStops to add Marathi and Alternative Stop Names

7. Stats to Track Progress of StopMapping

8. ChaloBEST Admin to add Marathi Areas and Road Names

9. ChaloBEST Admin to add and verify Holidays

10. Database Checking and Reporting

  • Error Log for BEST
  • Questions for BEST

11. Fix Missing Values in RouteAtlas

As given in RouteAtlas, in many instances the Headway/Frequency and Runtime for UniqueRoutes are missing for some variations. What we do in this case is to copy up and down... back and forth...

12. Convert to GTFS

13. GTFS Validation & Error Checking

14. Deploy OpenTripPlanner

15. Build ChaloBEST Android Application

Clone this wiki locally