
Benchmarks provide standardized, repeatable, and comparable evaluations of decision support system technologies such as databases, data warehouses, and OLAP tools.


RanaRomdhane/DW-DaskSQL


🧮 TP (Lab Session): Dask SQL for TPC-H Benchmark Analysis

This project demonstrates how to use Dask SQL to execute and analyze TPC-H benchmark queries, with a focus on distributed data processing and performance evaluation.
The notebook shows how Dask SQL supports scalable analytics on large datasets by letting you query Dask DataFrames with SQL directly from Python.


🚀 Features

  • Execution of TPC-H benchmark queries (Q1 to Q22) using Dask SQL
  • Comparison between Dask SQL and traditional SQL engines
  • Exploration of distributed computing concepts
  • Performance metrics and query optimization techniques
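The README does not name the traditional engines used for comparison. As an assumed stand-in, the sketch below runs a simplified Q1-style pricing summary on Python's built-in `sqlite3`. It drops Q1's `l_shipdate` filter and several of its aggregates, and the rows are invented for illustration rather than generated by `dbgen`:

```python
import sqlite3

# Simplified TPC-H Q1 (pricing summary report). The official query also
# filters on l_shipdate and reports averages and more sums.
Q1_SIMPLIFIED = """
    SELECT l_returnflag,
           l_linestatus,
           SUM(l_quantity) AS sum_qty,
           SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
           SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
           COUNT(*) AS count_order
    FROM lineitem
    GROUP BY l_returnflag, l_linestatus
    ORDER BY l_returnflag, l_linestatus
"""

# (returnflag, linestatus, quantity, extendedprice, discount, tax): illustrative rows.
ROWS = [
    ("A", "F", 10, 1000.0, 0.25, 0.0),
    ("A", "F", 5, 500.0, 0.0, 0.25),
    ("N", "O", 2, 200.0, 0.5, 0.0),
]


def run_q1_sqlite(rows):
    """Load rows into an in-memory SQLite lineitem table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE lineitem (l_returnflag TEXT, l_linestatus TEXT, "
            "l_quantity INTEGER, l_extendedprice REAL, l_discount REAL, l_tax REAL)"
        )
        conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(Q1_SIMPLIFIED).fetchall()
    finally:
        conn.close()


result = run_q1_sqlite(ROWS)
print(result)
# [('A', 'F', 15, 1250.0, 1375.0, 2), ('N', 'O', 2, 100.0, 100.0, 1)]
```

The same `Q1_SIMPLIFIED` string should also run through a Dask SQL `Context` once a `lineitem` Dask DataFrame with these columns is registered, which makes the two engines' results directly comparable.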

🛠️ Requirements

Before running the notebook, install the following dependencies:

pip install dask dask-sql pandas numpy matplotlib jupyter
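To confirm the installation before opening the notebook, a small check like the following may help. It is a sketch using only the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper, and the distribution names are simply those from the `pip install` command:

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command above.
REQUIRED = ["dask", "dask-sql", "pandas", "numpy", "matplotlib", "jupyter"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:12s} {ver or 'NOT INSTALLED'}")
```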

📂 Project Structure

RanaDaskSQL_tpch.ipynb   # Main notebook containing code and analysis
README.md

▶️ How to Run

  • Open the notebook:
    jupyter notebook RanaDaskSQL_tpch.ipynb
  • Run all cells in sequence to:
      • Initialize the Dask SQL context
      • Load the TPC-H tables (e.g., lineitem, orders, customer)
      • Execute the benchmark queries
      • Visualize query results and performance

📊 Example Output

  • Query execution times for multiple datasets

  • Result previews for benchmark queries

  • Comparative visualizations of distributed vs. local execution
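Execution times like these can be collected with a small timing helper. The sketch below is an assumption about how one might measure them, not the notebook's code: `time_query` is a hypothetical name, and it reports the median of several runs to damp warm-up and scheduling noise. Because `c.sql()` in dask-sql returns a lazy Dask DataFrame, the timed callable should include `.compute()`; otherwise only query planning is measured.

```python
import statistics
import time
from typing import Any, Callable, Tuple


def time_query(run: Callable[[], Any], repeats: int = 3) -> Tuple[float, Any]:
    """Call `run` `repeats` times; return (median wall-clock seconds, last result)."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = run()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), result


# Self-contained demo with a pure-Python workload.
elapsed, value = time_query(lambda: sum(range(100_000)), repeats=5)
print(f"median {elapsed:.6f} s, result {value}")

# Sketch of a local vs. distributed comparison (assumes a Context `c` with
# registered tables and a SQL string `q`):
#   import dask
#   from dask.distributed import Client
#   with dask.config.set(scheduler="synchronous"):  # single-threaded local run
#       t_local, _ = time_query(lambda: c.sql(q).compute())
#   client = Client()  # starts a LocalCluster and becomes the default scheduler
#   t_dist, _ = time_query(lambda: c.sql(q).compute())
```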

📚 License

This project is for educational purposes only.
