This repository documents my journey of learning Apache Spark, from installation to programming.
The following technologies are used:
- Apache Spark
- PySpark
- Google Colab
- Google Drive
- PostgreSQL and Spark
The following references are used:
- https://docs.cloudera.com/runtime/7.1.0/running-spark-applications/topics/spark-configure-apps.html
- https://docs.cloudera.com/runtime/7.1.0/running-spark-applications/spark-running-applications.pdf
- https://docs.cloudera.com/runtime/7.1.0/howto-data-science.html
- Learning Apache Spark
- Big Data Analytics with Apache Spark
- Big Data with Spark in Google Colab
- Getting Started with PySpark on AWS EMR
- Cloudera Spark Guide
- Why Use SparkSession?
- How to use Dataframe in pySpark (compared with SQL)