The Ethiopian Medical Business Data Warehouse & Analytics Platform enhances the efficiency of Ethiopia's healthcare sector by creating a robust data warehouse. It extracts data and images from public Telegram channels related to Ethiopian medical businesses, performs object detection on the images, and cleans, transforms, and stores the data in the warehouse. The main goal is to provide a unified solution for data analysis, supporting informed decision-making and driving strategic advancements in healthcare.
Python, DBT, SQL, ETL, PostgreSQL, FastAPI, Pandas, Pytest, SQLAlchemy, YOLOv5, Postman, CI/CD, Jupyter Notebook, Git, PDF & Google Drive (for the project report).
- ETL Process: Managed the end-to-end ETL process, including data extraction, cleaning, transformation, and loading.
- DBT for Data Modeling: Implemented data modeling and transformation in SQL using DBT.
- Image Extraction and Object Detection: Extracted images from Telegram channels, performed object detection, and stored the results back into the data warehouse.
- Database Management: Loaded the cleaned data into a PostgreSQL database.
- API Development: Exposed the cleaned data for analysis through FastAPI endpoints, providing easy access to the database/data warehouse.
- Project Documentation: Prepared comprehensive documentation for each step of the project to ensure clarity for the client.
- Data Scraping and Collection Pipeline
- Data Cleaning and Transformation
- Object Detection Using YOLO
- Exposing the Collected Data Using FastAPI
- Postman Collection
- Installation
- Usage
- Contributing
- License
Utilize the Telegram API or custom scripts to extract data from public Telegram channels related to Ethiopian medical businesses. Key channels include:
Collect images from specified Telegram channels for object detection:
For more details, see the data_scraping_and_cleaning.ipynb notebook.
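Once messages have been pulled from the Telegram API, each one needs to be flattened into a row the warehouse can store. The sketch below shows one way to do that normalization step; the field names (`message_id`, `has_photo`, etc.) and the dict shape of a raw message are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TelegramRecord:
    channel: str
    message_id: int
    date: str          # ISO-8601 string, ready for loading into PostgreSQL
    text: str
    has_photo: bool

def normalize_message(channel: str, msg: dict) -> Optional[TelegramRecord]:
    """Turn one raw scraped message (as a dict) into a flat record.

    Messages with no id are skipped; missing text becomes an empty string.
    """
    if msg.get("id") is None:
        return None
    date = msg.get("date")
    return TelegramRecord(
        channel=channel,
        message_id=int(msg["id"]),
        date=date.isoformat() if isinstance(date, datetime) else str(date or ""),
        text=msg.get("text") or "",
        has_photo=bool(msg.get("photo")),
    )
```

The actual fetching would be done with a Telegram client library such as Telethon; keeping the normalization logic in a pure function like this makes it easy to unit-test with Pytest, independent of the network.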
- Remove duplicates
- Handle missing values
- Standardize formats
- Validate data
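The four cleaning steps above can be sketched as a single Pandas function. The column names (`channel`, `message_id`, `date`, `text`) and the validation rule (positive message ids) are illustrative assumptions, not the project's actual schema.

```python
import pandas as pd

def clean_messages(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps listed above to raw scraped messages."""
    out = df.copy()
    # Remove duplicates: the same post can be scraped twice.
    out = out.drop_duplicates(subset=["channel", "message_id"])
    # Handle missing values: rows with no id are dropped, missing text becomes "".
    out = out.dropna(subset=["message_id"])
    out["text"] = out["text"].fillna("")
    # Standardize formats: dates to datetime, channel names stripped and lowercased.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out["channel"] = out["channel"].str.strip().str.lower()
    # Validate data: message ids must be positive.
    out = out[out["message_id"] > 0]
    return out.reset_index(drop=True)
```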
Set up DBT and create models (SQL files) for data transformation:

```shell
pip install dbt-postgres
dbt init dbt_med
dbt run
```

Note: recent DBT versions are installed per adapter, so `dbt-postgres` pulls in `dbt-core` along with the PostgreSQL adapter this project uses.
Store cleaned data in a database.
For more details, see the data_scraping_and_cleaning.ipynb notebook.
Ensure necessary dependencies are installed:
```shell
pip install opencv-python
pip install torch torchvision
pip install tensorflow
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
```
- Collect images from the specified Telegram channels.
- Use the pre-trained YOLO model to detect objects in the images.
- Extract data such as bounding box coordinates, confidence scores, and class labels.
- Store detection data in a database table.
For more details, see the yolo.ipynb notebook.
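When YOLOv5's `detect.py` is run with `--save-txt --save-conf`, it writes one label file per image, each line holding a class id, a normalized bounding box, and a confidence score. A small parser like the one below can turn those files into rows for the detection table; the `CLASS_NAMES` subset and the output column names are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical subset of the COCO class names that pre-trained YOLOv5 uses.
CLASS_NAMES = {0: "person", 39: "bottle", 41: "cup"}

def parse_yolo_labels(lines: List[str], image_name: str) -> List[Dict]:
    """Parse a YOLOv5 --save-txt/--save-conf label file into DB-ready rows.

    Each line is: <class_id> <x_center> <y_center> <width> <height> <confidence>,
    with box coordinates normalized to [0, 1].
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip malformed lines
        cls, x, y, w, h, conf = parts
        rows.append({
            "image": image_name,
            "class_id": int(cls),
            "class_name": CLASS_NAMES.get(int(cls), "unknown"),
            "x_center": float(x),
            "y_center": float(y),
            "width": float(w),
            "height": float(h),
            "confidence": float(conf),
        })
    return rows
```

Each returned dict maps directly onto one row of the detection table, so the list can be bulk-inserted into PostgreSQL.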
Install FastAPI and Uvicorn:
```shell
pip install fastapi uvicorn
```
Set up a basic project structure:
```
my_project/
├── main.py
├── database.py
├── models.py
├── schemas.py
└── crud.py
```
- In `database.py`, configure the database connection using SQLAlchemy.
- In `models.py`, define SQLAlchemy models for the database tables.
- In `schemas.py`, define Pydantic schemas for data validation and serialization.
- In `crud.py`, implement CRUD (Create, Read, Update, Delete) operations for the database.
- In `main.py`, define the API endpoints using FastAPI.
You can use the Postman API collection found in the link below:
To get started, follow these steps:
- Clone the repository:

```shell
git clone https://github.com/Daniel-Andarge/AiML-ethiopian-medical-biz-datawarehouse.git
cd AiML-ethiopian-medical-biz-datawarehouse
```

- Create a virtual environment and activate it:

```shell
# Using virtualenv
virtualenv venv
source venv/bin/activate

# Using conda
conda create -n your-env python=3.x
conda activate your-env
```

- Install the required dependencies:

```shell
pip install -r requirements.txt
```

- Run the data scraping scripts:

```shell
python extract_load_pipeline.py
```

- Run the DBT models:

```shell
dbt run
```

- Run object detection:

```shell
python detect.py --source data/telegram_images --save-txt --save-conf --project results --name run1
```

- Start the FastAPI application:

```shell
uvicorn main:app --reload
```
Contributions are welcome. Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them.
- Push your branch to your forked repository.
- Create a pull request to the main repository.
This project is licensed under the MIT License.
Special thanks to the contributors and the open-source community for their support and resources.