This application saves the working time of a qualified researcher by eliminating the "manual" counting of blastospores (microorganisms) under a microscope in the counted area of the Goryaev chamber. Instead, the number of blastospores is calculated automatically from microscope photographs of the counted areas. The application detects blastospores in photos using a pre-trained YOLOv5 model, processes the data (counting and averaging the number of bounding boxes by groups of photos), and saves the calculation results to files.
- Introduction
- Features of sample preparation
- Project structure
- Installation
- Activating the virtual environment
- Docker
- Creating an exe file
- Inference
- Conclusions
The entomopathogenic fungus Beauveria bassiana is a promising basis for biological insecticides for use in crop production. Its ability to cause a deadly infectious process in a wide range of insect pest species makes it possible to create biological preparations based on strains of this species that are not inferior in effectiveness to a significant number of modern chemical insecticides. In addition, biologics based on B. bassiana have a number of advantages over chemical analogues: they are much safer for humans, plants and other environmental objects, cost less, and often provide a longer protective effect owing to their ability to cause epidemics in insect populations with a high number of individuals (overpopulated populations).
The most important characteristic of a biopreparation based on a strain of entomopathogen known to be effective is the number of viable cells, mycelium particles or fungal spores per unit volume or mass of the preparation. This indicator is determined frequently, both during development of a biological product and in its industrial production. When submerged (deep) cultivation of B. bassiana is used to obtain live biomass, the main infectious particles in the final product are blastospores: yeast-like single cells. A convenient and fast method for determining the number of blastospores in a product (one that does not require cultivating fungal colonies) is direct counting under a microscope in a counting chamber of a particular design. This procedure works well in production processes, but in research on creating a bioinsecticide it takes a lot of the researcher's time, especially when the most reliable data on the number of blastospores are needed.
An alternative to direct counting in real time can be automatic software counting of blastospores in photographs. The standard for obtaining data from a single measurement is the counting of particles in 16 large squares of the counting chamber. It is assumed that processing the same or twice as many photos with software tools will allow obtaining reliable data in a much shorter time and with minimal direct participation of the researcher.
The purpose of this project is to create an application that counts blastospores in photographs and then produces averaged blastospore counts for the required groups of photos. The application uses a locally downloaded copy of the GitHub repository [YOLOv5](https://github.com/ultralytics/yolov5) from the developer company [Ultralytics](https://ultralytics.com). A pre-trained YOLOv5 model is used to detect blastospores. A detailed description of the steps for training the model is given in the second part of the [final project](https://github.com/ostrebko/skf_final_project), which was completed during the course "Specialization DataScience" at the online school [SkillFactory](https://skillfactory.ru).
Description
This section is not about the application itself, but gives a general understanding of how samples are prepared and how the number of blastospores is calculated for each sample.
- Samples for calculations are taken from the bioreactor sampler using a special dispenser and then placed in a special test tube.
- Before counting, the sample is diluted in a penicillin solution at a fixed dilution coefficient.
- The diluted sample is placed in the Goryaev chamber.
- The Goryaev chamber is placed on the microscope stage and positioned so that the area needed for counting is in view. An example of the transition between areas can be seen in this video.
- The number of blastospores is calculated for 10 different estimated areas of the Goryaev chamber, and then the average number is calculated.
- According to the calculated average value of the number of blastospores, the number of microorganisms in the calculated area is estimated, taking into account the dilution coefficient.
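The averaging and dilution correction described in the last two steps can be sketched as follows. This is a minimal illustration rather than the project's code: the function name, the sample counts and the dilution coefficient are all assumptions made for the example.

```python
from statistics import mean


def estimate_blastospores(counts, dilution_coefficient):
    """Average the counts over the counted areas and correct for dilution.

    counts: blastospores counted in each area (hypothetical data).
    dilution_coefficient: how many times the sample was diluted (assumed value).
    """
    if not counts:
        raise ValueError("at least one counted area is required")
    return mean(counts) * dilution_coefficient


# Ten counted areas of one sample (made-up numbers, for illustration only)
example_counts = [12, 15, 11, 14, 13, 12, 16, 10, 13, 14]
print(estimate_blastospores(example_counts, dilution_coefficient=100))
```

Converting this value to a concentration per unit volume additionally requires the volume of one counted area, which depends on the chamber geometry and is not covered by this sketch.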
Display project structure
calc_blastos
├── config
│ └── data_config.json ## configuration file
├── image_folder
│ ├── images_to_predict ## folder for images to run detection on (put folders with photos here)
│ └── predicted_images ## folder with detection results (photos, reports)
├── model
│ ├── weights_1476_150_ep.pt ## trained model 1
│ └── weights_1476_450_ep.pt ## trained model 2
├── utilits ## folder with custom functions and classes
│ ├── __init__.py
│ ├── calcs_boxes.py
│ ├── functions.py
│ ├── model_loader.py
│ └── read_config.py
├── yolov5 ## folder with the yolov5 app from the Ultralytics git repository
├── Dockerfile
├── main.exe ## file to run project in windows (without python & docker)
├── main.py
├── README.md
└── requirements.txt
Display how to install app
This section provides a sequence of steps for installing and launching the application.
# 1. Clone repository
git clone https://github.com/ostrebko/calc_blastos.git
# 2. Go to the new directory:
cd calc_blastos
# 3. Activate the virtual environment in which you plan to launch the application (we will use VsCode)
# 4. Install requirements:
pip install -r requirements.txt
# 5. Place folders with groups of photos in the 'image_folder/images_to_predict' folder. To name folders, use only Latin letters, digits and "_" instead of spaces.
# 6. Run blastospore detection with main.py, or create and run main.exe (on Windows).
python main.py
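The folder naming rule from step 5 can be checked before launching the application. The sketch below is only an illustration under stated assumptions: the helper name and the sample folder names are hypothetical, and the application itself may or may not perform such a check.

```python
import re

# Allowed characters: Latin letters, digits and "_" (the rule stated in step 5)
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def invalid_folder_names(names):
    """Return the folder names that break the naming rule."""
    return [name for name in names if not VALID_NAME.fullmatch(name)]


# Hypothetical folder names, used only for illustration
print(invalid_folder_names(["day_1", "day 2", "проба_3", "sample4"]))
```

To check real folders, the list of names could be built with `pathlib.Path("image_folder/images_to_predict").iterdir()`.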
The description of how to activate the virtual environment was taken from Ruslan Kayumov.
Type in the console:
# Steps to activate the virtual environment in which you plan to launch the application in VsCode:
# 1. Run VS Code as an administrator, go to the project directory in PowerShell, and execute the command below; the .venv folder containing the virtual environment files will appear
python -m venv .venv
# 2. To change the policy, in PowerShell type
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
# 3. To activate the environment, run the command
.venv/Scripts/Activate.ps1
# 4a. An environment marker (.venv) will appear at the beginning of the line in PowerShell, but VS Code may still not know anything about it. Press Ctrl+Shift+P, type Python: Select Interpreter
# Specify the path to python.exe in the .venv environment folder; the selected interpreter will be displayed in the status bar at the bottom. Now you can install modules only for a specific project.
# 4b. In VS Code, the Jupyter kernel does not necessarily use the same Python interpreter as the command line, so if you rely on specific libraries you may need to run your notebook in the created virtual environment.
# To use your notebook in the created virtual environment, install ipykernel:
pip install ipykernel
# then press Ctrl+Shift+P to open the Command Palette, and select "Notebook: Select Notebook Kernel" ->
# -> Select another kernel -> Python Environments -> choose the interpreter you are using in the terminal (we created the virtual environment with the name .venv)
# 5. To exit, run deactivate in PowerShell and switch back to the global interpreter in the interpreter selection.
Display how to create and run docker image
# 1. Create a new image (its size is approximately 5.2 GB)
docker build -t calc_blastos .
# 2. Run image in container.
docker run --rm -v $PWD/image_folder/:/image_folder --name calc_blastos calc_blastos
# 3. In the project directory '/image_folder/predicted_images' will appear
# a new file 'results.csv'
# 4. The created container will be automatically deleted
# after executing a sequence of commands from the Dockerfile.
# Delete the image after use (the container is already removed by --rm)
docker rmi calc_blastos
Display how to create exe-file
Creating an executable .exe file to run the application may be necessary in some cases: for example, if Docker and/or Python are not installed on the computer, if the user lacks the basic skills needed to install and configure the necessary programs and libraries, or if the computer cannot be prepared beforehand (for instance, when demonstrating the program on a director's or customer's computer).
To create the executable .exe file, we will use PyInstaller and its convenient GUI add-on auto-py-to-exe.
To create the executable .exe file, type in the console:
# 1. Go to the project directory and activate the virtual environment
# (see the section on activating the virtual environment)
# 2. Install the PyInstaller package
pip install pyinstaller
# 3. Install the auto-py-to-exe package
pip install auto-py-to-exe
# 4. Run the auto-py-to-exe installed app
auto-py-to-exe
# 5. In the auto-py-to-exe console window select the parameters:
# 5.1 Script Location: Specify the full path to the file main.py
# 5.2 Onefile (--onedir / --onefile): onefile
# 5.3 Console Window (--console / --windowed) (to see the work of program): Console Based
# 5.4 In Advanced, under --hidden-import, click the plus button three times and enter one of the following library names on each line: 1. cv2 2. yaml 3. seaborn.
# 5.5 Settings (auto-py-to-exe Specific Options): Specify the full path to the directory of main.py
# 5.6 Leave the other parameters unchanged.
# 6. Alternatively, you can use only the pyinstaller package, without installing auto-py-to-exe.
# To do this, after step 2, run the command below in the command line,
# replacing "C:/Full/Path/to/main.py" with the correct path to the project:
pyinstaller --noconfirm --onefile --console --hidden-import "cv2" --hidden-import "yaml" --hidden-import "seaborn" "C:/Full/Path/to/main.py"
General description
The term inference in this project means detecting blastospores in photographs using the YOLOv5 library and saving the photographs with the bounding boxes marked. For the purposes of the project, the following is carried out:
- counting the number of blastospores in each photo;
- generating reports for each group of photos (usually this number is 10, but it can be any other number);
- formation of a single (general) report on all groups of photos.
The reports include data on the calculated number of blastospores for each photo, the recalculated (reduced) number of blastospores (see explanations below) and the averaged calculated values of blastospores for each group of photos.
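The structure of such a report can be sketched as follows. This is a hedged illustration, not the project's implementation: the function name, column names, file names and the coefficient value are assumptions, and the real application may organize its reports differently.

```python
import csv
import io
from statistics import mean


def build_group_report(photo_counts, reduction_coefficient):
    """Build report rows and averages for one group of photos.

    photo_counts: mapping of photo file name -> number of detected boxes.
    reduction_coefficient: factor used to recalculate (reduce) the counts.
    """
    rows = [
        {
            "photo": name,
            "count": count,
            "reduced_count": count * reduction_coefficient,
        }
        for name, count in photo_counts.items()
    ]
    averages = {
        "mean_count": mean(row["count"] for row in rows),
        "mean_reduced_count": mean(row["reduced_count"] for row in rows),
    }
    return rows, averages


# Hypothetical group of two photos and an assumed coefficient of 0.5
rows, averages = build_group_report({"img_01.jpg": 40, "img_02.jpg": 60}, 0.5)

# Write the per-photo rows as CSV text (to memory here, to keep the example self-contained)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["photo", "count", "reduced_count"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
print(averages)
```

A general report over all groups would repeat this for every group folder and collect the averages in one table.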
The "manual" count is carried out inside the counting grid of the Goryaev chamber, excluding neighboring areas, while YOLOv5 detects blastospores across the whole photograph, which also covers the area around the grid. To estimate the number of blastospores in the counted area using the trained model, it was decided to apply a reduction coefficient equal to the ratio of the grid area to the photo area. This approach is possible because the following conditions remain constant:
- relatively uniform distribution of blastospores in the photo;
- fixed magnification of the microscope during photographing;
- accurate and fixed dimensions of the Goryaev chamber grid;
- a single (constant) photo resolution for training the model and for further inference.
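As a rough sketch of this recalculation, the coefficient is the grid area divided by the photo area, and the number of detected boxes is multiplied by it. The function names and the pixel sizes below are assumptions chosen for illustration; the actual values depend on the microscope magnification and photo resolution used in the laboratory.

```python
def reduction_coefficient(grid_width_px, grid_height_px, photo_width_px, photo_height_px):
    """Ratio of the counting-grid area to the whole photo area."""
    grid_area = grid_width_px * grid_height_px
    photo_area = photo_width_px * photo_height_px
    if grid_area <= 0 or photo_area <= 0:
        raise ValueError("areas must be positive")
    if grid_area > photo_area:
        raise ValueError("the grid cannot be larger than the photo")
    return grid_area / photo_area


def reduced_count(detected_boxes, coefficient):
    """Estimate the number of blastospores inside the grid only."""
    return detected_boxes * coefficient


# Assumed sizes in pixels, not taken from the project
k = reduction_coefficient(1200, 1200, 1920, 1440)
print(f"coefficient: {k:.3f}, reduced count for 96 boxes: {reduced_count(96, k):.1f}")
```

Because the estimate relies on a uniform distribution of blastospores over the photo, it is a statistical correction and does not check which individual boxes lie inside the grid.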
To solve the current problem, the counted area (the area inside the Goryaev chamber grid) was not isolated in the photographs, either beforehand or as an additional step. Why this is possible and why a more conservative approach was chosen is described in more detail in the Conclusions section.
How to & what's where
To carry out an inference, each calculated group of photos must be placed in its own separate folder in *'image_folder/images_to_predict'*. To assign names to folders, you need to use only Latin letters, numbers (digits) and "_" instead of spaces.
In the *'model'* folder there are two pre-trained YOLOv5 models. You can put other custom-trained YOLOv5 models in the *'model'* folder; in this case, the variable *'model_name'* in the configuration file *data_config.json* must be changed to the corresponding model name. A detailed description of YOLOv5 model training and information on data labeling are given in Section 2 of the [final project](https://github.com/ostrebko/skf_final_project/blob/main/part_2_model_training/1_Models_descriptions.md) from my studies on the course 'Specialization DataScience'.
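How the model name from the configuration file might be turned into a path to the weights can be sketched as below. Only the key *'model_name'* and the folder names come from this README; the helper function is hypothetical, and it assumes that *'model_name'* holds the weights file name including its extension, which may differ from the real configuration.

```python
import json
import tempfile
from pathlib import Path


def resolve_model_path(config_path, model_dir="model"):
    """Read 'model_name' from the JSON config and return the weights path."""
    with open(config_path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    return Path(model_dir) / config["model_name"]


# Self-contained demonstration with a temporary config file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_config = Path(tmp_dir) / "data_config.json"
    demo_config.write_text(
        json.dumps({"model_name": "weights_1476_450_ep.pt"}), encoding="utf-8"
    )
    print(resolve_model_path(demo_config))
```

In the project layout the call would look like `resolve_model_path("config/data_config.json")`.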
To run inference, execute in the terminal:
```shell
python main.py
```
or create and run main.exe on Windows (see the section on creating the exe file).
Photos with calculated bounding boxes and reports for each group of photos are saved in the folder *'image_folder/predicted_images'* in separate folders whose names correspond to folders from *'image_folder/images_to_predict'*. The final report for all groups of photos is created in a file *'image_folder/predicted_images/results.xlsx'*.
Conclusions drawn after creating the project
The models provided in this project show good blastospore detection quality: mAP50 is 0.962 and mAP50-95 is 0.463. More careful selection of parameters and longer training can improve these metrics, but publishing such models in this project is not planned, since the results of the work will be used in a specific production setting. It should also be noted that the resulting quality of the models does not yet allow automatic data generation, i.e. increasing the amount of labeled data automatically to improve prediction quality (for that, mAP50-95 should be at least 0.8).
It should also be noted that the applied estimate of the average number of blastospores in the considered areas of the photographs is quite conservative. Determining the boundaries of the counting grid would give a more accurate count in the areas of interest, but this task is harder to implement and leads to one of the following:
- capturing neighboring areas when determining the boundaries of the grid: the grid is often rotated by a small angle relative to the horizontal, and bounding boxes do not have a rotation parameter, so adding a grid class to the current blastospore labeling can still overestimate the count because of the areas adjacent to the grid;
- the need to solve the joint task of segmenting the grid and detecting blastospores with two models, with post-processing of the prediction results (re-labeling the data for segmenting blastospores is time-consuming and may take 2-3 months, which is quite a lot in terms of resources spent);
- finding another way to locate the grid boundaries and isolate the area under consideration without machine learning models, for example with the OpenCV library.
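The first point can be illustrated with a little geometry. For a square grid of side s rotated by an angle θ, the smallest axis-aligned bounding box has side s(|cos θ| + |sin θ|), so its area exceeds the grid area by a factor of (|cos θ| + |sin θ|)² = 1 + |sin 2θ|. The sketch below evaluates this factor; it assumes a square grid and uses a hypothetical function name, so it is an idealized estimate rather than a measurement from the project.

```python
import math


def aabb_area_ratio(angle_deg):
    """Area of the axis-aligned box around a rotated square, relative to the square."""
    theta = math.radians(angle_deg)
    side_factor = abs(math.cos(theta)) + abs(math.sin(theta))
    return side_factor ** 2


# Extra area captured by an axis-aligned box for a few small rotation angles
for angle in (0, 1, 2, 5):
    extra_percent = 100 * (aabb_area_ratio(angle) - 1)
    print(f"{angle} deg: box is {extra_percent:.1f}% larger than the grid")
```

Even a rotation of 2 degrees enlarges the box by roughly 7%, which is of the same order as the deviations of the simpler area-ratio estimate.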
In defense of the chosen estimation method, it is worth noting that, in daily calculations during the growth of microorganisms, averaging over groups of photographs gave deviations of up to 6-7% on average (in some cases up to 10%) compared with "manual" counting within the boundaries of the counting grid of the Goryaev chamber. We assume that a researcher doing "manual" counts for a long time will also make an error of at least 5% compared to the ideal case, which is comparable to the error of the model.
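The deviation between the automatic and the "manual" result is a relative difference in percent. A minimal sketch is given below; the function name and the numbers are made up for illustration and are not measurements from the project.

```python
def relative_deviation_percent(automatic, manual):
    """Deviation of the automatic estimate from the manual count, in percent."""
    if manual == 0:
        raise ValueError("manual count must be non-zero")
    return 100 * abs(automatic - manual) / manual


# Hypothetical averaged counts for one group of photos
print(f"{relative_deviation_percent(automatic=107, manual=100):.1f}%")
```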
However, the most important comments were given to me by the head of the laboratory when discussing the above issue:
- with well-chosen experimental parameters, the daily growth of microorganisms in the bioreactor is at least 50%, which significantly exceeds the resulting calculation error;
- when transferred to industrial production, the best growth parameters obtained in the microlaboratory are used, which means that during production the growth of microorganisms in industrial bioreactors is also at least 50%.
In connection with the above, it was decided to start working with the described estimated (conservative) calculations.
I added the comments above in anticipation of possible criticism of the chosen calculation method. Work on preliminary determination of the grid boundaries will be carried out in the future if the accuracy of the selected calculation method becomes insufficient.