This project analyzes and visualizes how the number of lawyers in France has evolved over time. Using public data, I created a data pipeline to process, analyze, and visualize this information.
The main objective is to provide insights into the growth and distribution of the legal profession in France.
The project uses data from the French government's open data platform. The dataset is accessed directly from its source URL.
This CSV file contains detailed information about the number of lawyers per bar association in France over time. The data is regularly updated, ensuring that our analysis reflects the most current trends in the French legal profession.
Our data pipeline leverages two powerful tools for efficient data processing and transformation:
dlt (data load tool) is an open-source Python library that simplifies extracting, normalizing, and loading data. Key features include:
- Automated schema inference and evolution
- Built-in data verification and error handling
- Support for various data sources and destinations
In this project, dlt is used for efficient data extraction from the CSV source and loading into our data processing pipeline.
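As an illustration of what such an extraction step can look like with dlt, here is a minimal sketch. The source URL, pipeline name, and table name below are placeholders, not the values used in the project's `app/pipeline.py`:

```python
import dlt
import pandas as pd

# Placeholder URL: the real dataset URL lives in the project's pipeline code.
CSV_URL = "https://example.org/lawyers_per_bar.csv"

# Read the published CSV into a DataFrame (the real file may use a different separator).
df = pd.read_csv(CSV_URL)

# A dlt pipeline that loads into a local DuckDB file.
pipeline = dlt.pipeline(
    pipeline_name="french_lawyers",
    destination="duckdb",
    dataset_name="raw",
)

# dlt infers the schema from the DataFrame and creates or evolves the table as needed.
load_info = pipeline.run(df, table_name="lawyers_per_bar")
print(load_info)
```

Running something like this produces (or updates) a local DuckDB database file that the SQL transformations can then build on.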
yato is a lightweight SQL transformation orchestrator designed to work seamlessly with DuckDB. Its main advantages are:
- Efficient execution of SQL queries in the correct order
- Easy management of dependencies between transformations
In this project, yato, billed as the smallest DuckDB SQL orchestrator on Earth, runs our SQL transformations on the extracted data, producing a clean and well-structured dataset for analysis.
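For reference, orchestrating the transformations with yato typically amounts to only a few lines. This sketch assumes yato's `Yato(database_path, sql_folder, schema)` entry point and uses illustrative file and folder names rather than the ones in this repository:

```python
from yato import Yato

# Illustrative names; the project's actual database file and SQL folder may differ.
yato = Yato(
    database_path="lawyers.duckdb",  # DuckDB file produced by the extraction step
    sql_folder="sql/",               # folder containing the .sql transformation files
    schema="transform",              # schema where the transformed tables are created
)

# yato reads the SQL files, resolves the dependencies between them,
# and runs them against DuckDB in the correct order.
yato.run()
```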
Together, dlt and yato, backed by DuckDB, form a flexible, maintainable, and easy-to-understand data pipeline that is the backbone of our analysis: dlt loads the raw data into DuckDB, and yato transforms it there with plain SQL queries.
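Because everything ends up in a single DuckDB file, the resulting tables can also be inspected directly with the DuckDB Python API. A minimal sketch, assuming an illustrative database file and table name (the real ones depend on the pipeline configuration):

```python
import duckdb

# Illustrative file name; point this at the DuckDB file created by the pipeline.
con = duckdb.connect("lawyers.duckdb")

# List every table created by the load and transformation steps.
print(con.sql("SHOW ALL TABLES"))

# Example ad hoc query against a hypothetical transformed table.
print(
    con.sql(
        "SELECT year, SUM(lawyer_count) AS total_lawyers "
        "FROM transform.lawyers_per_year "
        "GROUP BY year ORDER BY year"
    )
)
```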
To set up the project environment:
1. Clone the repository:

   ```bash
   git clone https://github.com/MohamedBsh/test-dlt-yato-avocado.git
   cd test-dlt-yato-avocado
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
To run the data pipeline and generate visualizations:
1. Ensure your virtual environment is activated.

2. Execute the main script:

   ```bash
   python app/pipeline.py
   ```

3. Clean the database:

   ```bash
   python app/data_explorer.py
   ```

   Choose 'clean'.

4. Explore the database:

   ```bash
   python app/data_explorer.py
   ```

   Choose 'explore'.

5. Visualize the data:

   ```bash
   python generate_plot.py
   ```