Apache Beam example

This data pipeline uses Apache Beam to read immigration data and apply transformations that join it with other datasets.

For Local run

From your working directory (Windows), run in a terminal: python3 pipeline.py --input_dir <PATH>\<TO>\<DATA>\ --output_dir <PATH>\<TO>\<OUTPUT>\

For Dataflow run

This assumes that gsutil is installed and configured on your local computer, and that file paths are written with Linux syntax (/) rather than Windows syntax (\).
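The path-syntax requirement above can be handled in code if your paths start out in Windows form. The following is a minimal sketch, not part of pipeline.py: the helper name to_gcs_dir_uri and the bucket name my-bucket are hypothetical and used only for illustration.

```python
from pathlib import PureWindowsPath


def to_gcs_dir_uri(bucket: str, windows_path: str) -> str:
    """Build a gs:// directory URI from a bucket name and a Windows-style relative path.

    Hypothetical helper for illustration; it is not part of pipeline.py.
    """
    # as_posix() rewrites backslash separators as forward slashes.
    posix_path = PureWindowsPath(windows_path).as_posix()
    # A trailing slash is appended to match the directory arguments used in this README.
    return f"gs://{bucket}/{posix_path}/"


if __name__ == "__main__":
    # "my-bucket" is a placeholder bucket name.
    print(to_gcs_dir_uri("my-bucket", r"data\immigration"))
```

The helper only rewrites the string. It does not check that the bucket or the path exists.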

Create a Google Cloud Storage bucket.
Upload the contents of the data/ directory to the bucket:

gsutil cp -r data gs://<YOUR GCP BUCKET>/

In the Google Cloud Shell terminal, install the required packages:

sudo apt-get install python3-pip

sudo pip3 install -U pip

sudo pip3 install apache-beam[gcp] oauth2client==3.0.0 pandas

In the Google Cloud Shell editor, upload the run file pipeline.py.
Run the job from the Cloud Shell terminal:

python3 pipeline.py --input_dir gs://<YOUR GCP BUCKET>/<PATH>/<TO>/<DATA>/ --output_dir gs://<YOUR GCP BUCKET>/<PATH>/<TO>/<DATA>/ --project <YOUR GCP PROJECT ID> --job_name <SET JOB NAME> --temp_location gs://<YOUR GCP BUCKET>/staging/ --staging_location gs://<YOUR GCP BUCKET>/staging/ --region us-central1 --runner DataflowRunner
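The command above mixes two kinds of flags: --input_dir and --output_dir belong to the pipeline itself, while --project, --runner, --region and the rest are meant for the Beam runner. The contents of pipeline.py are not shown here, so the following is only a sketch of one common way a Beam script separates the two groups with the standard library. The function name split_args is hypothetical.

```python
import argparse


def split_args(argv):
    """Separate the pipeline's own flags from the flags intended for the runner.

    Hypothetical sketch; the actual argument handling in pipeline.py may differ.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_dir", required=True)
    parser.add_argument("--output_dir", required=True)
    # parse_known_args returns the recognized flags as a namespace and
    # leaves every unrecognized flag, with its value, in a list.
    known, runner_args = parser.parse_known_args(argv)
    return known, runner_args


if __name__ == "__main__":
    known, runner_args = split_args([
        "--input_dir", "gs://my-bucket/data/",      # placeholder paths
        "--output_dir", "gs://my-bucket/output/",
        "--runner", "DataflowRunner",
        "--region", "us-central1",
    ])
    print(known.input_dir, known.output_dir)
    # In a Beam script, runner_args would typically be handed to
    # apache_beam.options.pipeline_options.PipelineOptions(runner_args).
    print(runner_args)
```

With this split, the same script can run locally with only the two directory flags, or on Dataflow when the runner flags are added.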
