This repository contains the projects, files, and notes.

jsvitor/data-engineering

Data Engineering Roadmap

Welcome to the Data Engineering Roadmap! This guide outlines the key concepts and skills needed to become a proficient Data Engineer.

Core Concepts

Data Engineering Foundations

  • Essential principles of Data Engineering
  • Roles and responsibilities in the field
  • Workflow and methodologies
  • Understanding data pipelines
  • Comparing ETL and ELT approaches
  • Exploring batch and streaming workloads
  • Diving into Big Data concepts
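
To make the ETL versus ELT comparison above concrete, here is a minimal sketch using only Python's standard library. The sample rows, table names, and helper functions (`run_etl`, `run_elt`, `totals`) are hypothetical and chosen for illustration; an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1", " alice ", "10.50"),
    ("2", "BOB", "4.00"),
    ("3", "alice", "2.25"),
]


def run_etl(conn):
    """ETL: clean the rows in Python first, then load only the cleaned result."""
    cleaned = [
        (int(order_id), customer.strip().lower(), float(amount))
        for order_id, customer, amount in RAW_ORDERS
    ]
    conn.execute("CREATE TABLE orders_etl (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw strings as-is, then transform with SQL inside the database."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(id AS INTEGER)   AS id,
               LOWER(TRIM(customer)) AS customer,
               CAST(amount AS REAL)  AS amount
        FROM orders_raw
        """
    )


def totals(conn, table):
    """Return per-customer totals from one of the two cleaned tables."""
    query = f"SELECT customer, SUM(amount) FROM {table} GROUP BY customer ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_totals = totals(conn, "orders_etl")
elt_totals = totals(conn, "orders_elt")
```

Both paths end with the same cleaned data. The difference is where the transformation runs and whether the raw rows remain available in the target system afterwards.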

Data Fundamentals

  • Exploring various data types
  • Understanding data structures
  • Working with different file formats
  • Hands-on exercises with data types and file formats
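
As one possible starting point for the exercises above, the sketch below writes the same records to CSV and JSON and reads them back, using only the standard library. The sample records are made up for illustration. It shows one practical difference between the formats: CSV returns every value as a string, while JSON preserves numbers.

```python
import csv
import io
import json

# Hypothetical sample records used only for this illustration.
records = [
    {"id": 1, "name": "sensor-a", "reading": 21.5},
    {"id": 2, "name": "sensor-b", "reading": 19.0},
]

# Round trip through CSV (in memory, so no files are created).
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["id", "name", "reading"])
writer.writeheader()
writer.writerows(records)
csv_buffer.seek(0)
from_csv = list(csv.DictReader(csv_buffer))

# Round trip through JSON.
from_json = json.loads(json.dumps(records))

# CSV carries no type information, so types must be restored explicitly.
typed_from_csv = [
    {"id": int(row["id"]), "name": row["name"], "reading": float(row["reading"])}
    for row in from_csv
]
```

After the CSV round trip, `from_csv[0]["id"]` is the string `"1"`, whereas `from_json` compares equal to the original `records`.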

Data Architectures

  • Differentiating between OLTP and OLAP systems
  • Exploring Big Data architectures:
    • Lambda architecture
    • Kappa architecture
  • Understanding Event-Driven Architecture (EDA)
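
The OLTP versus OLAP distinction above can be illustrated with two query shapes against the same table. This is a minimal sketch with a hypothetical `sales` table in an in-memory SQLite database; real systems typically use separate engines and storage layouts for each workload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 50.0), ("north", 25.0)],
)

# OLTP-style access: a short transaction that touches a single row by key.
with conn:
    conn.execute("UPDATE sales SET amount = amount + 5.0 WHERE id = ?", (2,))
updated_row = conn.execute(
    "SELECT region, amount FROM sales WHERE id = ?", (2,)
).fetchone()

# OLAP-style access: a read-only aggregate that scans many rows.
revenue_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```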

Data Acquisition Techniques

  • Identifying diverse data sources
  • Implementing web crawling and scraping
  • Leveraging APIs for data collection
  • Practical exercises in data gathering
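
For the scraping topic above, the sketch below extracts link targets from an HTML string with the standard library's `html.parser`. The `LinkCollector` class and the sample markup are hypothetical. No network request is made, so fetching pages and respecting a site's terms and `robots.txt` are left to the reader.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical page content standing in for a downloaded document.
SAMPLE_HTML = """
<html><body>
  <a href="/datasets/a.csv">Dataset A</a>
  <p>No link here</p>
  <a href="/datasets/b.json">Dataset B</a>
  <a>Anchor without href</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links
```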

Data Governance and Management

  • Implementing effective data governance strategies
  • Exploring the DAMA-DMBOK framework

Advanced Topics

Microservices and Containerization

  • Understanding microservices architecture
  • Exploring virtualization and containers
  • Mastering Docker, Docker Compose, and Kubernetes
  • Building data pipelines with Databricks:
    • Setting up environments
    • Utilizing notebooks
    • Implementing workflows

Data Analysis and Mining

  • Data preprocessing techniques
  • Feature selection methods
  • Exploring various data mining algorithms
  • Practical classification exercises

Modern Data Engineering Practices

  • Adopting Agile methodologies
  • Implementing DevOps in data workflows
  • Exploring DataOps principles
  • Understanding FinOps for cost optimization
  • Hands-on DataOps development

Cutting-Edge Data Stacks

  • Exploring Data Mesh architecture
  • Understanding Zero ETL approaches
  • Leveraging Amazon Aurora Zero-ETL
  • Utilizing the Astro Python SDK
  • Building modern data warehouses

Best Practices in Data Engineering

  • Adhering to Clean Code principles (PEP 8 for Python)
  • Implementing effective version control
  • Creating comprehensive documentation

Practical Application

Throughout this roadmap, you'll engage in hands-on projects and exercises to apply your knowledge in real-world scenarios.

Additional Resources

Getting Started

  1. Fork this repository to your GitHub account.
  2. Clone the forked repository to your local machine.
  3. Explore each topic in the roadmap sequentially.
  4. Complete associated projects and exercises.
  5. Refer to additional resources for deeper understanding.

Community Contributions

We encourage community involvement! If you have suggestions, corrections, or want to add resources, please submit a pull request or open an issue.

Licensing

This roadmap is shared under the MIT License.
