Code, quizzes, and notes from the DeepLearning.AI Data Engineering Professional Certificate specialization, showcasing practical projects, skills developed, and capstone work in data engineering.

DeepLearning.AI Data Engineering Specialization 🌟

Welcome to my repository for DeepLearning.AI's Data Engineering Professional Certificate! This repo contains code, quizzes, and personal notes from the specialization, documenting my journey in mastering data engineering concepts and tools.

📚 Overview

The Data Engineering Specialization is a comprehensive program designed to equip learners with the skills needed to design, build, and manage data pipelines and architectures. This repository documents my hands-on experience with the course material.

📑 Table of Contents

Courses

Course 1: Introduction to Data Engineering

  • Key Topics:
    • Data engineering lifecycle and undercurrents
    • Designing data architectures on AWS
    • Implementing batch and streaming pipelines
  • Content:
    • Notes on requirements gathering and stakeholder collaboration
    • Code samples for batch and streaming pipelines
    • Architecture diagrams and design considerations
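The batch-pipeline pattern introduced in this course boils down to an extract-transform-load loop. A minimal sketch in plain Python (all function and field names here are illustrative, not from the course labs, which target AWS services):

```python
# Minimal batch ETL sketch. The course labs use managed AWS services;
# this only illustrates the extract -> transform -> load shape.

def extract():
    # Stand-in for reading raw records from a source system.
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "25"}]

def transform(records):
    # Cast string fields to proper types and add a derived column.
    return [
        {"user": r["user"],
         "amount": int(r["amount"]),
         "is_large": int(r["amount"]) > 20}
        for r in records
    ]

def load(rows, sink):
    # Stand-in for writing to a warehouse table.
    sink.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```

A streaming pipeline follows the same three stages, but processes records one event at a time instead of in bulk.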

Course 2: Data Ingestion and DataOps

  • Key Topics:
    • Working with source systems (relational and NoSQL databases)
    • Data ingestion techniques (batch and streaming)
    • DataOps practices (CI/CD, Infrastructure as Code, data quality)
  • Content:
    • Scripts for data ingestion from APIs and message queues
    • Terraform configurations for AWS resources
    • Airflow DAGs for orchestrating data pipelines
    • Data quality tests using Great Expectations
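To make the data-quality bullet concrete: the column-level checks that Great Expectations automates can be sketched in plain Python. The function names below only mimic the style of the real Great Expectations API and are not part of it:

```python
# Plain-Python sketch of two common data-quality expectations.
# Great Expectations provides these as declarative, reusable suites;
# this sketch only shows the underlying assertions.

def expect_values_not_null(rows, column):
    # Every row must have a non-null value in the column.
    return all(r.get(column) is not None for r in rows)

def expect_values_between(rows, column, low, high):
    # Every value in the column must fall within [low, high].
    return all(low <= r[column] <= high for r in rows)

batch = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
checks = {
    "id_not_null": expect_values_not_null(batch, "id"),
    "age_in_range": expect_values_between(batch, "age", 0, 120),
}
```

In a DataOps setup, checks like these run inside the pipeline (e.g. as an Airflow task) so bad batches fail fast instead of propagating downstream.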

Course 3: Data Storage and Retrieval

  • Key Topics:
    • Storage systems (object, block, file storage)
    • Data lake and data warehouse architectures
    • Query optimization and performance tuning
  • Content:
    • Implementations of data lakehouse architectures
    • Advanced SQL queries and performance comparisons
    • Notes on storage formats and indexing strategies
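One of the tuning levers covered here, the effect of an index on retrieval, can be demonstrated with Python's built-in SQLite module (standing in for the warehouse engines used in the course; table and index names are illustrative):

```python
import sqlite3

# In-memory SQLite database to show how an index changes the query plan.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.0), ("alice", 40.0)],
)

# Without an index on `customer`, the planner scans the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'alice'"
).fetchall()

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# With the index, the planner searches the index instead of scanning.
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'alice'"
).fetchall()
```

On large tables the difference between a full scan and an index search is the core of query performance tuning; columnar formats and partitioning (also covered in this course) attack the same problem from the storage side.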

Course 4: Data Modeling and Transformation

  • Key Topics:
    • Data modeling techniques (normalization, star schema, data vault)
    • Transformations for analytics and machine learning
    • Batch and streaming data processing
  • Content:
    • Data models and schemas for different use cases
    • PySpark code for data transformations
    • Preprocessing pipelines for machine learning datasets
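The star schema at the heart of the dimensional-modeling material reduces to joining a fact table against its dimension tables. A plain-Python sketch with made-up tables (PySpark's DataFrame joins do the same thing over distributed data):

```python
# Tiny star schema: one fact table keyed into two dimension tables.
# Table contents and key names are illustrative.

dim_product = {
    1: {"name": "widget", "category": "tools"},
    2: {"name": "gadget", "category": "toys"},
}
dim_date = {20240101: {"year": 2024, "month": 1}}

fact_sales = [
    {"product_id": 1, "date_id": 20240101, "amount": 9.5},
    {"product_id": 2, "date_id": 20240101, "amount": 4.0},
]

# Denormalize each fact row by looking up its dimension attributes.
report = [
    {**row, **dim_product[row["product_id"]], **dim_date[row["date_id"]]}
    for row in fact_sales
]
```

Keeping descriptive attributes in small dimension tables and measurements in a narrow fact table is what makes the star schema cheap to store and fast to aggregate.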

🛠 Skills Developed

  • Data Architecture Design
  • Data Ingestion Techniques
  • DataOps Practices
  • Data Storage and Retrieval
  • Data Modeling
  • Data Transformation and Orchestration

🔧 Technologies Used

  • Programming Languages: Python, SQL
  • Cloud Platforms: AWS
  • Data Processing Frameworks: Apache Spark, PySpark, Pandas
  • Orchestration Tools: Apache Airflow
  • Infrastructure as Code: Terraform
  • Data Quality Tools: Great Expectations
  • Databases and Storage: MySQL, PostgreSQL, MongoDB, Amazon S3
  • Others: REST APIs, Message Queues, Streaming Platforms

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📫 Contact

Feel free to reach out via LinkedIn or email for any questions or collaborations!
