An app that identifies damages from a report picture.
- copy `dev.env`, rename it to `.env`, and insert your secrets
- ensure all prerequisites are installed: docker, docker-compose, poetry, make, python3.10
- create the local environment: `make venv`
- start the comparison interface: `make ui`
- build the docker image: `make build`
- run the docker image: `make up`
- run end-to-end integration tests: `make test-integration`

Optional:
- run training: `make train`
- information about commands: `make help`
A command line interface is defined for quick app management:

```
@ML-model:~/ml$ make
Please use `make <target>` where <target> is one of:
  board              open training monitoring board
  build              build services
  clean              clean all log files
  down               down services
  help               display this help message
  test-integration   run functional integration tests
  test-unit          run unit tests
  test-vars          test variables
  train              run training script
  tune               run hyperparameter search
  ui                 start Streamlit comparison interface
  up                 set up composition
  up-db              up only mongo service
  venv               create poetry virtual environment
```
```
tree -I 'model|logs|__pycache__|exploration|reports_1|.pytest_cache|.git'
.
├── bitbucket-pipelines-dp.yml
├── bitbucket-pipelines.yml
├── data
│   ├── cma
│   │   ├── metadata_reports_1.json
│   │   ├── price_catalogues
│   │   │   ├── components_map.xlsx
│   │   │   ├── prices.xlsx
│   │   │   └── README.md
│   │   └── row_map_dataset.csv
│   ├── comparsion
│   │   ├── comparison_output.xlsx
│   │   ├── temp_b24.xlsx
│   │   ├── temp_map.xlsx
│   │   └── temp_sap.xlsx
│   └── docs
│       ├── 20240716020106114-CMAU3161209_412182_20231115_0931145343151520321252186-RBEN.png
│       ├── comparsion
│       │   ├── additional-rules.png
│       │   ├── desired_output.xlsx
│       │   ├── MAPOWANIE INDEKSOW.xlsx
│       │   ├── pipeline_test_output.xlsx
│       │   ├── project_plan.txt
│       │   ├── PRZYKLADY.xlsx
│       │   ├── Raport_części_chłodniczych___NA_DZIEŃ_2026-01-01_-_2026-02-19.xlsx
│       │   └── RAPORT SAP.xlsx
│       ├── interface-eng.png
│       ├── interface-eng-scan.png
│       ├── interface-pl.png
│       ├── processed_row.png
│       └── training_chart.png
├── deploy_dev.sh
├── dev.env
├── docker-compose.yml
├── Dockerfile
├── interface
│   ├── cli
│   │   └── comparser.py
│   ├── rest_api
│   │   └── app.py
│   └── streamlit
│       ├── app.py
│       ├── constants.py
│       └── utils.py
├── Makefile
├── nginx
│   └── nginx.conf
├── poetry.lock
├── poetry.toml
├── pyproject.toml
├── pytest.ini
├── README.md
├── src
│   ├── classifier
│   │   ├── config.py
│   │   ├── data_agumentation.py
│   │   ├── data_generator.py
│   │   ├── dataset.py
│   │   ├── encoder.py
│   │   ├── hyperparameter_tuning.py
│   │   ├── inference.py
│   │   ├── __init__.py
│   │   ├── model.py
│   │   ├── train.py
│   │   └── utils.py
│   ├── comparsion
│   │   ├── config.py
│   │   ├── __init__.py
│   │   ├── loaders.py
│   │   ├── matcher.py
│   │   ├── pipeline.py
│   │   ├── README.md
│   │   ├── rules.py
│   │   ├── transformers.py
│   │   └── writer.py
│   ├── config.py
│   ├── __init__.py
│   ├── parser
│   │   ├── errors.py
│   │   ├── __init__.py
│   │   ├── llm_api.py
│   │   ├── ocr.py
│   │   ├── pricer.py
│   │   ├── prompt.py
│   │   └── utils.py
│   ├── router.py
│   ├── schema.py
│   └── utils.py
└── tests
    ├── APZU3211393_418675_20231212_0747334299019332773351257.webp
    ├── __init__.py
    ├── integration_tests
    │   ├── delete_non_existing_rows.py
    │   ├── __init__.py
    │   ├── row_map_dataset_test.csv
    │   ├── test_app.py
    │   ├── test_logs.py
    │   ├── test_parser
    │   │   ├── __init__.py
    │   │   └── test_ocr.py
    │   └── test_save_label.py
    └── unit_tests
        ├── __init__.py
        └── test_dummy.py

19 directories, 85 files
```
To run this project, you will need to add the following environment variables to your `.env` file. Here's a table with examples and descriptions:

| Variable | Example Value | Description |
|---|---|---|
| `OPENAI_API_KEY` | `sk-yourkeyhere123` | Your OpenAI API key, required for GPT-4. Available from the OpenAI API dashboard. |
| `GOOGLE_API_KEY` | `key` | Path to your Google Cloud credentials file. Required for accessing the GCP Vision API. |
| `DEV_IP` | `192.168.1.100` | IP address of the development host machine. |
| `DEV_PROXY_IP` | `192.168.1.101` | IP address of the proxy machine that forwards to the development host. |
| `DEV_LOGIN` | `developer` | Username for logging into the development machine. |
| `DEV_PASSWORD` | `yourpassword` | Password for the development machine login. |
| `BITBUCKET_GIT_SSH_ORIGIN` | `git@bitbucket.org:balticonit/ml.git` | Git SSH origin URL for your repository on Bitbucket. |
| `REPOSITORY_NAME` | `ml` | Name of the repository on Bitbucket. |
| `DB_USER` | `admin` | Username for the database login, typically an admin account. |
| `DB_PASSWORD` | `securepassword123` | Password for the database user. |

Please replace the example values with actual data suitable for your environment.
Logs are written to the `data/cma/logs/` directory.
The `data/cma/price_catalogues` directory must contain:
- components_map.xlsx
- prices.xlsx
Swagger UI for testing the endpoints is available at http://0.0.0.0:8000/docs
This project provides multiple ways to interact with the core logic, showcasing a diverse set of interface development skills from interactive web applications to robust APIs and fast CLI tools.
An immersive, full-screen graphical interface built with Streamlit (interface/streamlit/app.py). It allows users to easily upload the B24, SAP, and Mapping files to execute the comparison logic defined in src/comparsion.
Another Streamlit interface available within the main web app that handles visual inspection reports. Users can upload .webp files and configure container metadata, seamlessly integrating with the background AI processing pipeline.
A robust FastAPI backend (interface/rest_api/app.py) providing programmatic access to the Image Recognition API and data management workflows. Fully documented with interactive Swagger UI.
For quick, scripted, and headless execution, a dedicated CLI comparser.py (interface/cli/comparser.py) is provided. It facilitates running comparisons directly from the terminal or CI/CD pipelines:
```
poetry run python -m interface.cli.comparser --b24 "Raport_B24.xlsx" --sap "RAPORT_SAP.xlsx" --map "MAPOWANIE.xlsx" --out "wynik.xlsx"
```

This API handles image processing, data management, and machine learning operations.
Simple endpoint to verify that the API is running.
{{URL}}/
Receives a report as a .webp file along with metadata, processes it with OCR, and generates repair recommendations.
{{URL}}/read-report/
| Param | value |
|---|---|
| container_type | rf |
| shipowner | cma |
| report | binary data |
| Param | value | Type |
|---|---|---|
| token | {{TOKEN}} | string |
Saves label information based on the provided pipeline ID.
{{URL}}/save-label/{pipeline_id}
```json
[
  {
    "localisation": "DB1N",
    "component": "door",
    "repair_type": "replacement",
    "damage": "dent",
    "length": 15.5,
    "width": 7.2,
    "quantity": 2,
    "hours": "3",
    "material": "steel",
    "cost": "150"
  }
]
```

| Param | value | Type |
|---|---|---|
| token | {{TOKEN}} | string |
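The save-label call can be sketched in Python as follows. This is a minimal illustration, not the project's actual client code: `BASE_URL`, `TOKEN`, and the helper name are placeholders, and the assumption that the token travels as a query parameter comes from the parameter table above.

```python
import json

# Placeholders: substitute the real service address and token for your environment.
BASE_URL = "http://0.0.0.0:8000"
TOKEN = "your-token-here"

def build_save_label_request(pipeline_id: str, labels: list[dict]) -> tuple[str, dict, str]:
    """Return (url, query params, JSON body) for the save-label endpoint,
    ready to pass to an HTTP client such as `requests`."""
    url = f"{BASE_URL}/save-label/{pipeline_id}"
    params = {"token": TOKEN}  # assumption: token is a query parameter
    body = json.dumps(labels)
    return url, params, body
```

With `requests` installed, the returned triple could be sent as `requests.post(url, params=params, data=body)` (the HTTP method is an assumption; check the Swagger UI for the authoritative definition).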
Deletes label information for a given pipeline ID.
{{URL}}/delete-label/{pipeline_id}
| Param | value | Type |
|---|---|---|
| token | {{TOKEN}} | string |
A series of experiments was conducted, and the average evaluation loss (cross-entropy) was calculated on approximately 2600 records (20% of the whole dataset). The trained model was ResNet50 from torchvision with default weights:

```python
models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
```
The model was trained to classify pictures of handwritten damage text into metadata.
Pictures were normalized by mapping the OCR (GCP Cloud Vision) box onto a 1000x1000-pixel white space, which allows different augmentations like rotating, flipping, adding distractions, etc.
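The normalization step above can be sketched as pasting the OCR box crop onto a white 1000x1000 canvas. This is an illustrative sketch with NumPy, not the project's actual implementation; the function name and the nearest-neighbour resize are assumptions.

```python
import numpy as np

def normalize_crop(crop: np.ndarray, size: int = 1000) -> np.ndarray:
    """Centre a grayscale OCR box crop (H, W array, values 0-255)
    on a white size x size canvas."""
    canvas = np.full((size, size), 255, dtype=np.uint8)  # white background
    h, w = crop.shape
    # scale so the crop fits the canvas while keeping its aspect ratio
    scale = min(size / h, size / w, 1.0)
    new_h, new_w = int(h * scale), int(w * scale)
    # nearest-neighbour resize via index sampling (no external dependencies)
    rows = (np.arange(new_h) / scale).astype(int)
    cols = (np.arange(new_w) / scale).astype(int)
    resized = crop[rows][:, cols]
    # centre the resized crop on the canvas
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```

A fixed canvas keeps every training picture the same shape, so rotations and flips can be applied without re-cropping.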
No better pretrained weights were found for this problem. With the initial training config the model overfitted, and the validation loss increased with drastic spikes from epoch one (run_20240722-212617, run_corpora).
This problem was resolved by adjusting the learning rate:
- 0.00001 significantly improved the learning curve (with weight_decay 0.1) - run_grindable
- 0.000001 gave a smooth curve, but the validation error was higher - run_corpora
- 0.0005 gave good results too.
The dataset suffered from unbalanced classes; a sample of the category counts:

```
location,component,repair_type,damage,counts
BL12,Plywood Floor Panel,Patch,Cracked,8
BL12,Plywood Floor Panel,Patch,Improper Repair,3
BL12,Plywood Floor Panel,Refit,Loose,2
BL12,Plywood Floor Panel,Replace,Holed,1
BL12,Plywood Floor Panel,Seal,Leak,2
BL13,Plywood Floor Panel,Patch,Cracked,2
BL13,Plywood Floor Panel,Replace,Cracked,1
BL1N,CLEANING,Remove,Debris,1
BL1N,Plywood Floor Panel,Patch,Cracked,1
BL1N,Plywood Floor Panel,Refit,Loose,5
BL1N,Plywood Floor Panel,Remove,Nails,1
BL23,Plywood Floor Panel,Patch,Cracked,12
BL23,Plywood Floor Panel,Replace,Cracked,1
```
After dropping the rare categories and keeping only those with at least 10 picture examples, the distribution looked like this:

```
location,component,repair_type,damage,counts
BL23,Plywood Floor Panel,Patch,Cracked,12
BL2N,Plywood Floor Panel,Patch,Cracked,11
BL3N,Plywood Floor Panel,Refit,Loose,11
BL5N,Plywood Floor Panel,Refit,Loose,19
BR12,Plywood Floor Panel,Patch,Cracked,15
BR23,Plywood Floor Panel,Patch,Cracked,19
BR3N,Plywood Floor Panel,Refit,Loose,10
BX10,CLEANING,Remove,Debris,11
```
This made the problem around 10 times less complicated: rather than recognizing 1407 different categories, we recognize 177. The remaining 1230 categories appeared fewer than 10 times in the past; whether it is important to recognize those is up to the end user, but dropping them can be valuable for improving the metrics.
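The category filtering described above can be sketched as a simple frequency cut-off. This is a minimal illustration using `collections.Counter`; the function name and threshold parameter are assumptions, not the project's actual code.

```python
from collections import Counter

def drop_rare_categories(labels: list[tuple], min_count: int = 10) -> list[tuple]:
    """Keep only examples whose (location, component, repair_type, damage)
    category appears at least `min_count` times in the dataset."""
    counts = Counter(labels)
    return [label for label in labels if counts[label] >= min_count]
```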
To address the class imbalance, stratified k-fold cross-validation was introduced, and the model was trained with 3- and 5-fold splits.
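A stratified split keeps each class's proportions roughly equal across folds. The sketch below is a dependency-free illustration of the idea (in practice a library such as scikit-learn's `StratifiedKFold` would typically be used); the function name is an assumption.

```python
from collections import defaultdict

def stratified_folds(labels: list, k: int = 3) -> list[list[int]]:
    """Assign sample indices to k folds so each fold keeps roughly the same
    per-class proportions, by round-robin distribution within each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds
```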
There were experiments with data augmentation for pictures with fewer than 50 examples, but in the end it was too heavy for the algorithm: the model overfitted to the synthesized examples rather than the real distribution, which increased the validation error. It also significantly increased training time (around 7 days versus 2 days with the normal config).
To lower the validation error I tried a dropout value of 0.5, but it was too aggressive for a simple ResNet model. For more complex models with more hidden layers, like a CRN, it could be valuable.
During the experiments, resumable checkpoints were saved every 3 epochs, and the model with the lowest validation error was then chosen, which acted as a form of early stopping.
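This checkpoint-based early stopping reduces to picking the saved checkpoint with the minimum validation loss once training finishes. A minimal sketch (function name and checkpoint dict keys are assumptions):

```python
def best_checkpoint(checkpoints: list[dict]) -> dict:
    """Select the checkpoint with the lowest validation loss, emulating
    early stopping after the full training run has completed."""
    return min(checkpoints, key=lambda ckpt: ckpt["val_loss"])
```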
Hyperparameter search with HyperOpt has been implemented to automate the parameter search, but it can take up to a month of constant running; it is scheduled by Bitbucket Pipelines. The final set of hyperparameters chosen for the CMA shipowner is:
```json
{
  "shipowner": "cma",
  "num_checkpoints": 3,
  "with_augmentation": false,
  "k_fold": 1,
  "num_epochs": 30,
  "resume_run": null,
  "learning_rate": 5e-05,
  "batch_size": 16,
  "dropout_rate": null,
  "weight_decay": 0.1,
  "drop_categories": 10
}
```
One model must be trained for each shipowner. Models live in the `data/` directory, in defined subdirectories, and are managed by Git LFS for now, until the repository reaches 4 GB; after that it will be necessary to use a GCP bucket.
- use a different architecture pretrained on recognizing handwritten text, like a CRN.
- iterate through the data and delete pictures which contain only a code and no damage description.
- gather feedback from users on which model works better for this task: local model vs GPT-4o.
- more experiments with batch sizes and different k-fold splits.
- open question: given multiple container types, e.g. RF and DC, can a damage with the same code, like DB1N, mean different damages for each of them?
The project infrastructure takes full advantage of cloud-native architecture via docker-compose.yml, laying down a solid foundation for robust MLOps operations:
- Nginx (Reverse Proxy): operates as the secure API gateway routing traffic internally. This isolates the internal microservices (`app` and `ui`) from direct public internet exposure, load balances inbound HTTP traffic, and allows seamless future inclusion of TLS.
- MongoDB (NoSQL Document Store): AI pipelines inherently deal with unstructured data, continuous schema transformations, and varying damage predictions. Storing labels in traditional SQL tables introduces harsh migration complexities; MongoDB's JSON/BSON document architecture mirrors the dynamic nested JSON outputs emitted by the models and GPT-4.
- Container Segregation Strategy: the environment is decomposed into dedicated containers (`app`, `ui`, `mongo`, `nginx`), each encapsulating its specific dependencies. This sidesteps dependency hell and supports later horizontal scaling onto Kubernetes or AWS ECS with minimal refactoring.
- Health Checks & Recovery: Docker Compose gives each service a distinct `healthcheck` alongside a `restart: on-failure` policy, so transient API interruptions self-heal natively.
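The healthcheck and restart pattern above can be sketched as a docker-compose fragment. This is an illustrative sketch, not the project's actual `docker-compose.yml`: the service name, probe command, and intervals are assumptions to adapt to the real services.

```yaml
services:
  app:
    build: .
    restart: on-failure
    healthcheck:
      # probe endpoint and timings are illustrative; adjust to the real service
      test: ["CMD", "curl", "-f", "http://localhost:8000/"]
      interval: 30s
      timeout: 5s
      retries: 3
```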
- python 3.11
- make
- docker
- docker compose
- access to the Bitbucket repository - SSH token configured
- poetry

Install poetry via pipx:

```
sudo apt update
sudo apt install pipx
pipx ensurepath
sudo pipx ensurepath --global # optional, allows pipx actions in global scope. See "Global installation" section below.
pipx install poetry
sudo apt-get install sshpass
```

Manage docker as a non-root user:

```
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
docker run hello-world
```