Working with PySpark and writing structured data to Amazon Redshift.
- Load a Parquet file from Amazon S3, using boto3, into a PySpark DataFrame
- Understand the underlying concepts of Spark for big data management
- Load data into Redshift
- Understand the difference between a data lake and a data warehouse