UW-THINKlab/Mobility-Analysis-Workflows-tutorial

Tutorial A: Using MAW from shell scripts (requires Linux systems)

This tutorial provides instructions on how to set up a Linux machine and run workflows using shell scripts.

Since the containers in MAW are built using Docker, it is natural to also run MAW using Docker. However, Docker has a critical limitation: launching the Docker platform and executing containers requires administrative access, so Docker cannot be used on cloud servers (e.g. AWS Lambda) or shared computers that do not grant administrative access to users. So that this tutorial works on machines with or without administrative access, it uses an alternative container platform, Singularity, to run the MAW containers and workflows. Singularity does not require administrative access to launch and run, and it is compatible with Docker: containers and workflows built using Docker can be run using Singularity.

1. Installing Singularity

Follow these steps to install Singularity. To verify that Singularity is installed properly, run the following command.

singularity run docker://hello-world

If the following output is displayed, Singularity is ready to use.

[Screenshot: expected output of the Singularity hello-world test]
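Before running the hello-world test, it can help to confirm that the singularity executable is on the PATH at all. The following is a minimal sketch; the function name check_tool is illustrative and is not part of MAW or Singularity.

```shell
#!/usr/bin/env bash
# check_tool NAME: report whether an executable called NAME is on the PATH.
# (Hypothetical helper for this tutorial, not part of MAW.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Run the hello-world test only when Singularity is available.
if check_tool singularity; then
    singularity run docker://hello-world
fi
```

If the script prints "singularity: not found", the installation step has not completed or the executable is not on the PATH of the current shell.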

2. Downloading Docker images

A Docker image is a template that contains instructions and files for creating containers. When a Docker image is launched on Docker or Singularity, the corresponding containers are created and become ready to use. The Docker image uwthinklab/maw_containers_1:v6 contains the five containers (see Section 3.1 of the paper) for analyzing mobile data. To download this Docker image, run the following command on the Linux command-line interface.

singularity pull docker://uwthinklab/maw_containers_1:v6

3. Preparing the shell script

A shell script is a computer program that is designed to be run by the Unix/Linux command-line interpreter. It contains a sequence of Unix/Linux commands. The key command to run a container is as follows, where ${·} indicates a variable whose value needs to be set by the user.

singularity exec --bind ${Data_path}:${Data_path_project} docker://uwthinklab/maw_containers_1:v6 python ${Code_for_a_container} ${Container_input_file} ${Container_output_file} ${Container_change_point_value(s)}

The variables in the above command have the following meanings.

  • ${Data_path}: the working directory on your machine. It is where the input datasets are stored and where the container/workflow output will be found.
  • ${Data_path_project}: the working directory in the Docker image. When a Docker image is executed, synchronization between ${Data_path} and ${Data_path_project} is established: changes in one will be reflected in the other. This allows (1) code in the Docker image to access input data on your machine and (2) output generated by the code in the Docker image to be synchronized to your machine.
  • ${Code_for_a_container}: name of the .py file that implements a container. The files for the five MAW containers are listed in the following table.
    Container name                   ${Code_for_a_container}
    Trace Segmentation Clustering    TraceSegmentationClustering.py
    Incremental Clustering           IncrementalClustering.py
    Stay Duration Calculator         UpdateStayDuration.py
    Oscillation Corrector            AddressOscillation.py
    Stay Integrator                  CombineExtractedStays.py
  • ${Container_input_file}: the name of the file that contains the input data to the container. The full path to the input data file should be ${Data_path_project}/${Container_input_file}.
  • ${Container_output_file}: the name of the file where the container output will be stored. The full path to the output file should be ${Data_path_project}/${Container_output_file}.
  • ${Container_change_point_value(s)}: change point values for the container. If multiple change points need to be set, separate them with spaces.

An example of running the Trace Segmentation Clustering container is given below. It is assumed that the input data is stored in “/MAW/input/GPS_data.csv” on your Linux machine, that the container output is to be written to “/MAW/output/GPS_stays.csv” on your Linux machine, and that the two change points (distance threshold and duration threshold) are set to 0.2 km and 300 seconds, respectively.

singularity exec --bind /MAW:/MAW_projected docker://uwthinklab/maw_containers_1:v6 python TraceSegmentationClustering.py /MAW_projected/input/GPS_data.csv /MAW_projected/output/GPS_stays.csv 0.2 300
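To make the role of each variable easier to see, the command above can be assembled from its parts. The sketch below only prints the command so that it can be inspected before it is run; the helper name maw_command and its argument order are illustrative assumptions, not part of MAW.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MAW): print the singularity command for one
# container so it can be inspected before it is executed.
IMAGE="docker://uwthinklab/maw_containers_1:v6"

maw_command() {
    local data_path="$1"       # working directory on the host, e.g. /MAW
    local project_path="$2"    # where that directory is mounted in the container
    local code="$3"            # e.g. TraceSegmentationClustering.py
    local input="$4"           # input file, relative to the working directory
    local output="$5"          # output file, relative to the working directory
    shift 5                    # remaining arguments are change point values
    echo singularity exec --bind "${data_path}:${project_path}" "$IMAGE" \
        python "$code" "${project_path}/${input}" "${project_path}/${output}" "$@"
}

# Reproduces the Trace Segmentation Clustering example command.
maw_command /MAW /MAW_projected TraceSegmentationClustering.py \
    input/GPS_data.csv output/GPS_stays.csv 0.2 300
```

Because echo drops quoting, the printed line is suitable for reading and copying only when the paths contain no spaces.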

Running a workflow is equivalent to running a sequence of containers. This can be done by writing a sequence of singularity exec commands that execute the containers, with the ${Container_output_file} of the previous command serving as the ${Container_input_file} for the next command. An example shell script for running Workflow 2 in the paper using a synthetic cellular dataset is provided here.
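The chaining idea can be sketched as follows. This is not the example script from the repository: the pairing of the two containers, the intermediate file names and the empty change point placeholder are assumptions made only for illustration, and the actual container order of each workflow is defined in the paper. By default the sketch prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of a two-step chain. The container pairing, file names and the
# STEP2_CHANGE_POINTS placeholder are illustrative assumptions.
set -e                              # stop at the first container that fails

IMAGE="docker://uwthinklab/maw_containers_1:v6"
DATA_PATH="${DATA_PATH:-/MAW}"      # working directory on the host
PROJECT_PATH="/MAW_projected"       # mount point inside the container
DRY_RUN="${DRY_RUN:-1}"             # 1: print commands only, 0: execute them

run_container() {
    local code="$1" input="$2" output="$3"
    shift 3
    local cmd=(singularity exec --bind "${DATA_PATH}:${PROJECT_PATH}" "$IMAGE"
               python "$code" "${PROJECT_PATH}/${input}" "${PROJECT_PATH}/${output}" "$@")
    if [ "$DRY_RUN" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Placeholder: fill in the change point values required by the second container.
STEP2_CHANGE_POINTS=""

# Step 1 writes GPS_stays.csv; step 2 reads that file as its input.
run_container TraceSegmentationClustering.py input/GPS_data.csv output/GPS_stays.csv 0.2 300
# Intentionally unquoted so that several space-separated values become several arguments.
run_container UpdateStayDuration.py output/GPS_stays.csv output/GPS_stays_duration.csv $STEP2_CHANGE_POINTS
```

Setting DRY_RUN=0 in the environment makes the same script execute the containers instead of printing the commands.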

4. Running the shell script

Shell script files have the extension “.sh”. Suppose the shell script created for running a workflow has the file name “script.sh”. To run this script, navigate to the directory where this script is located, and execute the following command.

bash script.sh



Tutorial B: Using MAW from a graphical user interface

In this tutorial, the BioDepot-workflow-builder (Bwb) (Hung et al., 2019) is employed to run Workflow 2 and Workflow 6 described in Section 4.2 of the paper, using a synthetic cellular dataset and a synthetic GPS dataset. Bwb provides an easy-to-use graphical interface: each container is represented as a widget (an icon that allows users to deploy, configure and execute the container), and a workflow is represented as a directed acyclic graph of widgets. Running MAW through Bwb requires minimal programming skills, and thus allows users to easily access and reproduce mobility analysis methods and results.

A video version of this tutorial can be found at https://www.youtube.com/watch?v=9emIszx2hgo.

1. Installing Docker

Both MAW and Bwb are implemented using Docker containers. Therefore, installing Docker properly is a prerequisite to launch MAW and Bwb on a computer.

To install Docker (Desktop version), please follow the instructions on https://docs.docker.com/get-docker/.

To make sure Docker is installed and running properly on your computer, open a terminal (for Mac/Linux users) or a command-line interface (for Windows users) and run the following command.

docker run hello-world

If the following output is shown, Docker is properly installed and running. Then proceed to the next step. Otherwise, please refer to the troubleshooting and FAQs for using Docker.

[Screenshot: expected output of docker run hello-world]
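When the hello-world test fails, the two most common causes are that Docker is not installed and that Docker is installed but not running. A minimal sketch for telling them apart is given below; the function name docker_ready and its messages are illustrative, while docker info is a standard Docker command that fails when the daemon cannot be reached.

```shell
#!/usr/bin/env bash
# docker_ready: distinguish "Docker not installed" from "Docker not running".
# (Hypothetical helper for this tutorial, not part of MAW or Docker.)
docker_ready() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker command not found: Docker is not installed or not on the PATH"
        return 1
    fi
    # `docker info` fails when the client cannot reach the Docker daemon.
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed but the daemon is not reachable: start Docker first"
        return 2
    fi
    echo "docker is installed and running"
}
```

Calling docker_ready in a Mac/Linux terminal prints one of the three messages and returns 0 only in the last case.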

2. Downloading Docker images

This step is similar to Step 2 in Tutorial A, with two differences. First, Docker images will be downloaded using Docker instead of Singularity; and second, two other Docker images in addition to uwthinklab/maw_containers_1:v6 are needed for the graphical user interface to work properly. These two Docker images are listed below.

  • uwthinklab/maw_gui: the Docker image for creating the container that provides the graphical user interface for MAW;
  • uwthinklab/maw_visualization: the Docker image for creating four containers that are used to visualize inferred mobility patterns.

To download all the required Docker images, run the following commands on the Mac/Linux terminal or Windows command-line interface.

docker pull uwthinklab/maw_gui:v2
docker pull uwthinklab/maw_containers_1:v6
docker pull uwthinklab/maw_visualization:v1
docker pull uwthinklab/maw_visualization:gnumeric
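On Mac/Linux, the four pulls can also be wrapped in a loop that reports which images failed to download instead of stopping at the first error. This is a sketch only; the function name pull_all is illustrative, and the image tags are the ones listed above.

```shell
#!/usr/bin/env bash
# The four image tags listed in this step.
IMAGES=(
    uwthinklab/maw_gui:v2
    uwthinklab/maw_containers_1:v6
    uwthinklab/maw_visualization:v1
    uwthinklab/maw_visualization:gnumeric
)

# pull_all: pull every image and collect the ones that failed.
# (Hypothetical helper for this tutorial, not part of MAW.)
pull_all() {
    local failed=()
    local image
    for image in "${IMAGES[@]}"; do
        docker pull "$image" || failed+=("$image")
    done
    if [ "${#failed[@]}" -gt 0 ]; then
        echo "failed to pull: ${failed[*]}" >&2
        return 1
    fi
    echo "all ${#IMAGES[@]} images pulled"
}
```

Calling pull_all after defining it runs the downloads; a non-zero return value means at least one image is still missing.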

3. Downloading the example workflows

Workflow 2 and Workflow 6 described in Section 4.2 of the paper are stored in folders “MAW_case1” and “MAW_case2” in this repository, respectively.

Git will be used to download the example workflows. Follow the instructions on https://git-scm.com/downloads to install Git if it has not been installed.

After installing Git, restart the terminal or command-line interface. Then navigate to your working directory (denoted as ${wd} hereafter) on the terminal or command-line interface by typing in cd ${wd}, and run the following command.

git clone https://github.com/UW-THINKlab/Mobility-Analysis-Workflows-tutorial

This creates a new folder named “Mobility-Analysis-Workflows-tutorial” in the working directory. In this folder, two sub-folders “MAW_case1” and “MAW_case2” contain Workflows 2 and 6, respectively. Navigate into the “Mobility-Analysis-Workflows-tutorial” folder in the terminal or command-line interface.

4. Downloading the synthetic datasets

The synthetic GPS dataset for testing Workflow 6, and the synthetic cellular dataset for testing Workflow 2 can both be downloaded from: http://dx.doi.org/10.17632/cb2r6hv72b.1. The two datasets reside in the files “input_case1_v2.csv” and “input_case2_v2.csv”, respectively.

Download these two csv files, create a new directory ${wd}/Mobility-Analysis-Workflows-tutorial/trans_data, and place the csv files in this new directory.
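On Mac/Linux, creating the directory and copying the files can be done from the terminal. The sketch below assumes the two csv files have already been downloaded somewhere on the machine; the function name stage_datasets and the download location used in the usage note are illustrative assumptions.

```shell
#!/usr/bin/env bash
# stage_datasets REPO_DIR FILE...: copy downloaded csv files into
# REPO_DIR/trans_data, creating that directory if needed.
# (Hypothetical helper for this tutorial, not part of MAW.)
stage_datasets() {
    local repo_dir="$1"
    shift
    mkdir -p "${repo_dir}/trans_data"
    local f
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp "$f" "${repo_dir}/trans_data/"
        else
            echo "missing: $f" >&2
            return 1
        fi
    done
}
```

For example, if the files were saved to a Downloads folder in the home directory (an assumption), the call would be stage_datasets "${wd}/Mobility-Analysis-Workflows-tutorial" ~/Downloads/input_case1_v2.csv ~/Downloads/input_case2_v2.csv.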

5. Launching the graphical interface for MAW

The graphical interface allows users to access the five containers (see Section 3.1 of the paper) and the two example workflows in MAW. Workflows can be executed in the graphical interface by simply clicking a few buttons, and the workflow output (i.e. inferred stays) can be visualized on a map.

Before launching the MAW graphical interface, please make sure that the current directory of the terminal or command-line interface is ${wd}/Mobility-Analysis-Workflows-tutorial.

To launch the graphical interface, Mac/Linux users should run the following command.

docker run --rm -p 6080:6080 -v ${PWD}/:/data -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/.X11-unix:/tmp/.X11-unix --privileged --group-add root uwthinklab/maw_gui:v2

For Windows users, run the following command.

docker run --rm -p 6080:6080 -v %cd%/:/data -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/.X11-unix:/tmp/.X11-unix --privileged --group-add root uwthinklab/maw_gui:v2
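Both launch commands mount the current directory at /data inside the container, so the workflows and datasets are only visible in the graphical interface if the command is run from the cloned tutorial folder. On Mac/Linux, a quick pre-launch check can be sketched as follows; the function name ready_to_launch is illustrative, and the folder names are the ones used in Steps 3 and 4.

```shell
#!/usr/bin/env bash
# ready_to_launch [DIR]: succeed only if DIR (default: current directory) contains
# the two workflow folders from Step 3 and the trans_data folder from Step 4.
# (Hypothetical helper for this tutorial, not part of MAW.)
ready_to_launch() {
    local dir="${1:-$PWD}"
    local sub status=0
    for sub in MAW_case1 MAW_case2 trans_data; do
        if [ ! -d "${dir}/${sub}" ]; then
            echo "missing folder: ${dir}/${sub}" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running ready_to_launch before the docker run command prints the folders that are missing, if any, and returns a non-zero status in that case.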

The launch is successful if the following output is displayed on the terminal or command-line interface.

[Screenshot: terminal output after a successful launch]

To access MAW’s graphical interface, open http://localhost:6080/ in any web browser. The following page will be displayed.

[Screenshot: MAW graphical interface in the web browser]

6. Running example workflows

6.1. Running Workflow 2

On the graphical interface shown above, click "File" in the menu bar, then click "Load workflow". In the “Load workflow from Directory” window, navigate to the “/data” directory, select the downloaded workflow folder “MAW_case2” (do NOT enter the folder) and click “Choose”. Workflow 2 will then be shown on the graphical interface, as in the following screenshot.

[Screenshot: Workflow 2 loaded in the graphical interface]

Compared to Workflow 2 as described in Section 4.2 of the paper, four visualization containers (“Select Users To Display”, “Execute Notebook”, “Display Notebook” and “gnumeric”) are appended to the workflow so that users can view the mobility analysis results. To start running the workflow, double-click the "Incremental Clustering" icon and click the "Start" button in the pop-up window. The “Incremental Clustering” container will then start running, and all subsequent containers will start automatically in the order in which they are linked. When the “gnumeric” container finishes running, a spreadsheet will pop up showing the analyzed location records with stay information attached, as in the following screenshot.

[Screenshot: spreadsheet of analyzed location records]

There are 12 fields in the spreadsheet. The meaning of each field is given below.

  • unix_start_t: the unix time stamp at which a location record is observed;
  • user_ID: a unique identifier for each mobile device;
  • mark_1: the operating system (OS) on the mobile device, 0 for Android, 1 for iOS and “Nothing” for unknown OS;
  • orig_lat: the latitude of the location record in decimal degrees;
  • orig_long: the longitude of the location record in decimal degrees;
  • orig_unc: the location accuracy of the location record in meters;
  • stay_lat, stay_long, stay_unc, stay_dur: the latitude, longitude, radius (see Section 3.1 of the paper) and duration in seconds of the inferred stay; if any of these four fields has a value of -1, this location record is not associated with any stay (i.e. a transient point);
  • stay_ind: a unique identifier given to each inferred stay. This field is not calculated for the example workflows and always has a placeholder value of -1;
  • human_start_t: translation of unix_start_t into year (first two digits), month (third and fourth digits), day (fifth and sixth digits), hour (seventh and eighth digits), minute (ninth and tenth digits) and second (eleventh and twelfth digits).
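The digit layout of human_start_t can be reproduced from unix_start_t with the date command. The sketch below assumes GNU date (as found on most Linux systems) and UTC; whether MAW itself uses UTC or a local time zone is not stated here, so the time zone is an assumption of this example. The function name to_human_start_t is illustrative.

```shell
#!/usr/bin/env bash
# to_human_start_t UNIX_SECONDS: format a Unix time stamp as YYMMDDhhmmss.
# Assumes GNU date and UTC; both are assumptions of this sketch.
to_human_start_t() {
    date -u -d "@$1" +%y%m%d%H%M%S
}

to_human_start_t 1609459200   # prints 210101000000 (2021-01-01 00:00:00 UTC)
```

On macOS, the BSD date command uses a different option syntax, so the function would need to be adapted there.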

When the “Display Notebook” container finishes running, a Jupyter notebook will pop up showing the visualized location records and inferred stays on maps, as in the following screenshot. Location records are represented as small red circles, and inferred stays are represented as larger red dots. Green lines connect temporally consecutive location records. Two maps are shown in each notebook: the first shows the raw location records, and the second shows both location records and inferred stays.

[Screenshot: Jupyter notebook with maps of location records and inferred stays]

6.2. Running Workflow 6

Restart MAW by first shutting down the running containers, and then following Step 5 to re-launch MAW. To shut down the containers, first close the webpage running Workflow 2. Then open Docker Desktop, and for each running container, click the “STOP” button, as shown below.

[Screenshot: “STOP” button for running containers in Docker Desktop]
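As an alternative to Docker Desktop, running containers can be stopped from a Mac/Linux terminal using the standard docker ps and docker stop commands. The sketch below stops every running container on the machine, including any that are unrelated to MAW, so it is only appropriate when nothing else is running in Docker. The function name stop_all_containers is illustrative.

```shell
#!/usr/bin/env bash
# stop_all_containers: stop every running container reported by `docker ps`.
# Caution: this is not limited to MAW containers.
stop_all_containers() {
    local ids
    ids="$(docker ps -q)"      # -q prints only the container IDs
    if [ -z "$ids" ]; then
        echo "no running containers"
        return 0
    fi
    # Intentionally unquoted so that each container ID becomes one argument.
    docker stop $ids
}
```

After calling stop_all_containers, MAW can be re-launched as described in Step 5.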

Workflow 6 can be loaded onto the graphical interface in the same way as Workflow 2. The loaded Workflow 6 is shown in the following screenshot.

[Screenshot: Workflow 6 loaded in the graphical interface]

Compared to Workflow 6 as described in Section 4.2 of the paper, the four visualization containers are appended to the workflow. To run the workflow, double-click the "Trace Segmentation Clustering" icon and click the "Start" button in the pop-up window.

Output visualization for Workflow 6 is in the same format as for Workflow 2, and thus will not be described in detail here.

7. Running containers or workflows with users’ own datasets (optional)

MAW allows users to analyze their own datasets using its containers and workflows. Suppose a user would like to apply Workflow 6 to a custom dataset stored in a csv file named “real-world data.csv”. First, the csv file needs to be placed in the directory ${wd}/Mobility-Analysis-Workflows-tutorial/trans_data.

After loading Workflow 6 as described in Step 6.2, double-click the "Trace Segmentation Clustering" icon. In the pop-up window, there is a text box titled “Input”. Type the path to the custom csv file in this text box to change the input to the custom dataset. In this case, the path should be “/data/trans_data/real-world data.csv”, as shown in the following screenshot. Alternatively, click the folder icon next to the input text box, navigate to the custom csv file, and select and open it. Once the input path has been changed, clicking the “Start” button will start analyzing the designated dataset.

[Screenshot: “Input” text box set to the custom csv file path]
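The “/data/...” path follows from the launch command in Step 5, which mounts the launch directory at /data. The mapping from a file on the host to the path to type in the “Input” box can be sketched as follows; the function name container_path and the /home/user location are illustrative assumptions.

```shell
#!/usr/bin/env bash
# container_path LAUNCH_DIR HOST_FILE: print the path under /data at which
# HOST_FILE is visible, given that LAUNCH_DIR was mounted at /data in Step 5.
# (Hypothetical helper for this tutorial, not part of MAW.)
container_path() {
    local launch_dir="${1%/}"   # drop a trailing slash, if any
    local host_file="$2"
    case "$host_file" in
        "$launch_dir"/*) echo "/data/${host_file#"$launch_dir"/}" ;;
        *) echo "not under ${launch_dir}: ${host_file}" >&2; return 1 ;;
    esac
}

# /home/user is a made-up location standing in for ${wd}.
container_path /home/user/Mobility-Analysis-Workflows-tutorial \
    "/home/user/Mobility-Analysis-Workflows-tutorial/trans_data/real-world data.csv"
```

The file name contains a space, so it is quoted on the command line; in the “Input” text box the path is typed as shown above, without quotes.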

The input to any other container can be changed in the same way.

When the input dataset changes, it is possible that different values for the change points are preferred. Assigning different change point values can be done in the same pop-up window as the one for specifying container input. In the above screenshot, the “Trace Segmentation Clustering” container has a distance threshold of 0.2 km and duration threshold of 300 seconds. These values can be changed by users in this pop-up window. For other containers, the values of their change points can be modified in the same way.

8. Modifying the workflow

There are numerous ways a workflow can be changed or customized. We give an example of how to skip the “Oscillation Corrector” container in Workflow 2.

First, the “Oscillation Corrector” container and the “Stay Duration Calculator” container that follows it need to be removed from the workflow. To do so, right-click each of the two widgets and click “Remove”.

Then, a link needs to be drawn from the remaining “Stay Duration Calculator” container to the “gnumeric” container. To do so, left-click on the right edge of “Stay Duration Calculator”, hold and drag to the left edge of “gnumeric”. Upon releasing your mouse, a link configuration window will pop up as shown in the following screenshot.

[Screenshot: link configuration window]

The link configuration window defines the functionality of the link. In the above screenshot, the link means that the output of “Stay Duration Calculator” serves as input to “gnumeric”, and it should be kept this way. If the link needs to be configured differently, it can be adjusted by clicking the checkboxes. Then click “Ok” to confirm the configuration and close the pop-up window. The modified workflow is shown below.

[Screenshot: modified workflow]

The modified workflow can be run in the same way as Workflow 2 (see Step 6.1). After the workflow is completed, the output should look as below.

[Screenshot: output of the modified workflow]

Reference

Our work is built on Bwb (https://github.com/BioDepot/BioDepot-workflow-builder). See also:

Hung, L.-H., Hu, J., Meiss, T., Ingersoll, A., Lloyd, W., Kristiyanto, D., Xiong, Y., Sobie, E., Yeung, K.Y., 2019. Building Containerized Workflows Using the BioDepot-Workflow-Builder. Cell Systems 9, 508-514.e3. https://doi.org/10.1016/j.cels.2019.08.007.

Our MAW icons are made by Flat Icons, itim2101, Pixel perfect, Freepik, Darius Dan, Smashicons from www.flaticon.com
