DataEng 2024 Template Repository

Project DATA Engineering is provided by INSA Lyon.

Students: LARIOZA Andrea, SIA Zhong Qing

Abstract

This repo contains a jupyter notebook, where you will find the code for the population of relations from csv into postgresql, and the outputs for the sql code used to answer the natural language queries below. Most pre-processing steps were completed using a csv file manager, and can be found in the commit history. They include replacement of '\N' values, improper date-time formats, and special characters in names and circuits. It was simply easier to clean them using excel rather than python.

Datasets Description

We are using the data on Formula 1 Races from 1950 to 2024 and can be obtained on Kaggle. It consists of several separate csv files; circuits, results, races, constructors, drivers and pit stops.

Queries

List circuits in the USA
List race count of the top 10 drivers in order.
Which driver has the most wins?
Which drivers have the same nationality as the constructor?
Show across time who drove the fastest in the Singapore Grand Prix?
Which constructors have not gotten first place before?
Which races have the tightest outcome?
Who are the constructors within the top 10 fastest pitstops of all time?
List all drivers who have driven for Mclaren
Which constructor and driver-pairs have the most podium finishes?

Requirements

All you will have to do is open the .ipynb file and click Run-All. As some files contain over 25,000 records and there are 7 files, please expect a waiting time of around 2 minutes.

Name		Name	Last commit message	Last commit date
Latest commit History 56 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
images		images
pdfs		pdfs
.Rhistory		.Rhistory
LICENSE		LICENSE
README.md		README.md
circuit.csv		circuit.csv
constructor.csv		constructor.csv
draftstoredprocedures.sql		draftstoredprocedures.sql
driver.csv		driver.csv
lap_time.csv		lap_time.csv
pit_stop.csv		pit_stop.csv
proj.ipynb		proj.ipynb
race.csv		race.csv
result.csv		result.csv
season.csv		season.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DataEng 2024 Template Repository

Abstract

Datasets Description

Queries

Requirements

About

Releases

Packages

Contributors 3

Languages

License

reyandrea/DBMS2_Projects

Folders and files

Latest commit

History

Repository files navigation

DataEng 2024 Template Repository

Abstract

Datasets Description

Queries

Requirements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages