dataingestion

Here are 19 public repositories matching this topic...

This repository documents a project that takes a step-by-step approach to scraping data, integrating data from various sources, and analyzing the combined data. It also shows how APIs can be used for data engineering operations. In this project, the Foursquare API was used for the location data.

  • Updated Feb 21, 2023
  • Jupyter Notebook

Modern machine learning project development - This end-to-end project provides real-time delay updates to logistics companies. It uses MLflow for model tracking and management, the Hopsworks Feature Store for storing and managing the dataset, and Streamlit for building an interactive web application that predicts truck delays.

  • Updated Nov 30, 2024
  • Jupyter Notebook

Describe the different entities that form a modern data ecosystem. Describe and differentiate between the role and responsibilities of Data Engineers, Data Scientists, Data Analysts, Business Analysts, and Business Intelligence Analysts. Explain what Data Engineering is. List the tasks that need to be performed in a typical data engineering life…

  • Updated Oct 6, 2021

This project builds a cloud-based pipeline to extract NYC taxi data from an API and store it in Azure Data Lake Storage (ADLS). Databricks and PySpark are used to transform the data through the medallion architecture (Bronze → Silver → Gold). Delta Lake ensures reliable storage, and Power BI provides visual insights for data-driven decision-making.

  • Updated Dec 3, 2024
  • Jupyter Notebook

The main purpose of this repository is to build a pipeline for training regression models that predict the compressive strength of concrete, reducing the risk and cost of discarding concrete structures when the concrete cube test fails.

  • Updated Feb 27, 2023
  • Python

This repo hosts an end-to-end machine learning project designed to cover the full lifecycle of a data science initiative. The project covers data ingestion, preprocessing, exploratory data analysis (EDA), feature engineering, model training and evaluation, hyperparameter tuning, and cloud deployment.

  • Updated Feb 28, 2024
  • Jupyter Notebook
