Redshift_ingestion project

Working with PySpark to write a structured .parquet file to Amazon Redshift after some basic cleaning and formatting.

Goals to be achieved

  • Load a parquet file from Amazon S3, via boto3, into a PySpark DataFrame (see the first sketch below)
  • Understand the underlying concepts of Spark for big-data management
  • Load the cleaned data onto Redshift (see the second sketch below)
  • Understand the difference between a data lake and a data warehouse
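
For the first goal, a minimal sketch of pulling a parquet file out of S3 with boto3 and reading it into a PySpark DataFrame. The bucket name, key, and local path are placeholders, and credentials are assumed to come from the standard AWS credential chain (environment variables, `~/.aws/credentials`, or an IAM role):

```python
import boto3
from pyspark.sql import SparkSession

# Hypothetical S3 location and scratch path; replace with the real ones.
BUCKET = "my-data-lake-bucket"
KEY = "raw/events.parquet"
LOCAL_PATH = "/tmp/events.parquet"

# Download the parquet file from S3 with boto3.
s3 = boto3.client("s3")
s3.download_file(BUCKET, KEY, LOCAL_PATH)

spark = SparkSession.builder.appName("redshift_ingestion").getOrCreate()

# Read the downloaded file into a PySpark DataFrame and inspect its schema.
df = spark.read.parquet(LOCAL_PATH)
df.printSchema()
```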

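For loading onto Redshift, one common route is Spark's generic JDBC writer with the Amazon Redshift JDBC driver on the classpath (e.g. supplied via `spark.jars`). The cleaning step, cluster endpoint, table name, and credentials below are illustrative placeholders, not the project's actual configuration:

```python
from pyspark.sql import functions as F

# Illustrative "basic cleaning and formatting": drop fully-null rows
# and normalise column names to snake_case.
df_clean = df.dropna(how="all")
df_clean = df_clean.toDF(
    *[c.strip().lower().replace(" ", "_") for c in df_clean.columns]
)

# Hypothetical Redshift endpoint and credentials; prefer a secrets
# manager over hard-coding the password.
REDSHIFT_URL = (
    "jdbc:redshift://my-cluster.abc123.us-east-1"
    ".redshift.amazonaws.com:5439/dev"
)

(
    df_clean.write
    .format("jdbc")
    .option("url", REDSHIFT_URL)
    .option("dbtable", "public.events")
    .option("user", "awsuser")
    .option("password", "<password>")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .mode("append")
    .save()
)
```

For large volumes, Redshift's `COPY` command from S3 is usually faster than row-by-row JDBC inserts; the JDBC writer above is the simplest path from a Spark DataFrame.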