Skip to content

an open-source python toolbox to detect migration events

License

Notifications You must be signed in to change notification settings

theedenresearch/migration_detector

 
 

Repository files navigation

migration_detector

by Guanghua Chi, Fengyang Lin, Guangqing Chi, and Joshua Blumenstock

migrantion_detector is a Python toolbox to detect migration events in digital trace data, such as Call Detail Record (CDR), geo-tagged tweets, and other check-in data. It is able to handle a large volume of data (TB) and provides useful functions such as plotting the trajectory of a migrant and export your results.

How to install it

pip install migrantion_detector

NOTE:

  • migrantion_detector has a dependency on turi/GraphLab to speed up the computation by parallel computing (In our case, it only took about 40 minutes to detect migrants using 600 million unique trajectory records over four years.). You need to apply for a license and install it before installing migrantion_detector.
  • It is recommended to create a new Python 2.7 environment to install GraphLab and migrantion_detector.
  • Other requires: pandas, numpy, matplotlib, and seaborn.

How to use it

migrantion_detector is easy to use, just like pandas. First, you need to import your trajectory dataset and then detect the migrants. See demo.ipynb to learn how to use this package.

import migration_detector as md

traj = md.read_csv('example/migrant_location_history_example1.csv')

# plot trajectory
traj.plot_trajectory(user_id='1', start_date='20180701')

# detect migration events
migrants = traj.find_migrants()
print(migrants)

# plot a migrant's trajectory
traj.plot_segment(migrants[0], if_migration=True)

# save the result of detected migrants
md.to_csv(migrants, result_path='result', file_name='migration_event.csv')

# plot segments detected in the first step
user_id = '1'
user_result = traj.user_traj.filter_by(user_id, 'user_id')[0]
traj.plot_segment(user_result, if_migration=False, segment_which_step=1)

# save detected segments
traj.output_segments(segment_file='segments.csv', which_step=3)

Format of the input trajectory data

The input file should contain at least three columns: user_id(int or str), date(YYYYMMDD), location_id(int or str). The location depends on the definition of the migration, such as district, state, or city. Here is an example of trajectory data.

user_id date location
1 20180701 1
1 20180701 2
1 20180702 1

How to cite it

TO ADD

About

an open-source python toolbox to detect migration events

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Jupyter Notebook 97.4%
  • Python 2.6%