Scalable Cloud-Based Distributed Computing for Efficient Big Data Analytics: A Dask Integration Approach

In this project, we are building a platform that scales dynamically with user demand. At its core, the platform integrates Dask, a parallel execution framework, with JupyterHub, containerized and deployed on a cloud instance. We intend to benchmark Dask as a distributed computing framework on our cluster by running computationally intensive hyperparameter tuning of the tree-based XGBoost algorithm on big data. By systematically varying the input format, chunk size, task scheduler, worker nodes, clusters, and threading configuration, we seek to quantify performance and compare it with baseline values obtained by running the same program on the instance without distributing the workload. The benchmarking serves two purposes: 1) to compare the performance of computationally intensive ML algorithms with and without Dask parallelizing the workload in the cloud, and 2) to understand in depth the components of distributed computing that affect its performance.
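The systematic variation described above can be sketched as a configuration grid plus a speedup calculation against the non-distributed baseline. This is a minimal illustration using only the Python standard library, not the project's actual benchmarking code: the names `BenchmarkConfig`, `config_grid`, `time_run`, and `speedup` are hypothetical, and the option values (file formats, chunk sizes, scheduler labels, worker and thread counts) are assumed for illustration.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterator


@dataclass(frozen=True)
class BenchmarkConfig:
    """One combination of the parameters varied in the benchmark (hypothetical)."""
    input_format: str
    chunk_size: int
    scheduler: str
    n_workers: int
    threads_per_worker: int


def config_grid(options: Dict[str, list]) -> Iterator[BenchmarkConfig]:
    """Yield every combination of the given option values."""
    keys = list(options)
    for values in itertools.product(*(options[k] for k in keys)):
        yield BenchmarkConfig(**dict(zip(keys, values)))


def time_run(workload: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) over several runs of a workload."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, distributed_seconds: float) -> float:
    """Speedup of the distributed run relative to the non-distributed baseline."""
    return baseline_seconds / distributed_seconds


# Assumed, illustrative option values; the real experiments may use different ones.
OPTIONS = {
    "input_format": ["csv", "parquet"],
    "chunk_size": [100_000, 1_000_000],
    "scheduler": ["threads", "processes", "distributed"],
    "n_workers": [1, 2, 4],
    "threads_per_worker": [1, 2],
}

configs = list(config_grid(OPTIONS))
```

In a real run, each configuration would be passed to a function that launches the tuning job under those settings, and `time_run` would wrap that call; the baseline time would come from the same workload executed without Dask.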

Folders and files:
Plots
code
dask-reports
ECC_ Final_Project_Report.pdf
ECC_Project Spring 2024 - Dilip-Anirudh-Subhadra.pptx
ECC_Setup_Notes.pdf
README.md
config.yaml

AnirudhPenmatcha/Scalable-Cloud-Computing-for-Efficient-Big-Data-Analytics-A-Dask-Integration-Approach
