Project source code for James Lee's Apache Spark with Python (PySpark) course.
Tools like Spark are incredibly useful for processing data that is continuously appended. PySpark, Spark's Python API, not only lets you do that but also lets you combine Spark Streaming with other Python tools for data science and machine learning. This course covers the basics of using Apache Spark, as well as more advanced concepts such as accumulators, combining PySpark with Apache Kafka, using PySpark with AWS tools like Kinesis, streaming data from sources like Twitter, and getting the most out of the Structured Streaming paradigm in the recently released Spark 2.3.0.
This course is a one-stop shop for all your PySpark streaming education needs.
This repo contains the notebooks, data files, exercise files, and everything else you need to learn how to use the streaming capabilities of PySpark.
Check out the full list of DevOps and Big Data courses that James and Tao teach.