Gold Recovery Prediction [Real project]

The company that provided this project develops solutions that improve the efficiency of industrial enterprises. As data scientists, we were tasked with building a prototype machine learning model that predicts the gold recovery coefficient from gold-bearing ore, using data on extraction and purification parameters. The model is intended to help optimize production so that an enterprise with unprofitable characteristics is not launched.

Objectives

Develop the best-performing model for predicting the gold recovery rate.
Achieve the lowest possible value of the sMAPE metric (symmetric mean absolute percentage error).
Optimize production by avoiding launches in a loss-making state.
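
The README does not spell out the sMAPE formula, so the sketch below assumes one common definition: the mean of |y - ŷ| / ((|y| + |ŷ|) / 2), expressed in percent. Other variants exist, and the notebook may use a different one. The function name `smape` and the sample numbers are illustrative only.

```python
def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent.

    Assumed definition: mean of |y - y_hat| / ((|y| + |y_hat|) / 2) * 100.
    Pairs where both values are zero contribute 0 to avoid division by zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")

    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        denominator = (abs(actual) + abs(predicted)) / 2
        if denominator != 0:
            total += abs(actual - predicted) / denominator
    return 100 * total / len(y_true)


# Illustrative values, not taken from the project data.
example_score = smape([100, 200], [110, 180])
print(f"sMAPE: {example_score:.2f}%")
```

Because the error is divided by the average magnitude of the actual and predicted values, the metric is scale-free, which makes it convenient for comparing models on percentage-like targets such as a recovery rate.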

Project progress

Access to raw data of unknown quality is provided, so the data must first be validated and, if necessary, preprocessed. Once the data is prepared, exploratory data analysis follows to build a deeper understanding of it. Several regression models are then trained; the best one is selected by cross-validation on the target metric, sMAPE, and then tested.

The study therefore consists of the following steps:

Loading and preparing data.
Exploratory data analysis.
Training and validating ML models.
Testing the final model.

Data

The project data are stored in three CSV files:

gold_recovery_train_new.csv => Training data.
gold_recovery_test_new.csv => Testing data.
gold_recovery_full_new.csv => Full data.
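
A minimal loading and validation sketch follows, assuming each file has a `date` column that can serve as a datetime index (the Note section mentions a date feature). The helper names `load_dataset` and `missing_share`, and the tiny in-memory CSV used for the demonstration, are hypothetical; in practice a path such as `gold_recovery_train_new.csv` would be passed instead.

```python
import io

import pandas as pd


def load_dataset(path_or_buffer):
    """Read one project CSV and index it by the parsed `date` column."""
    return pd.read_csv(path_or_buffer, index_col="date", parse_dates=True)


def missing_share(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Toy CSV standing in for a real file; the column name is made up.
toy_csv = io.StringIO(
    "date,feature_a\n"
    "2016-01-15 00:00:00,1.5\n"
    "2016-01-15 01:00:00,\n"
)
toy = load_dataset(toy_csv)
print(toy.index.dtype)     # datetime index
print(missing_share(toy))  # share of gaps per column
```

Checking the share of missing values per column is one simple way to start the data validation step before deciding how to preprocess.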

Machine learning models

LinearRegression
DecisionTreeRegressor
RandomForestRegressor
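
The sketch below shows one way the three models could be compared by cross-validation with sMAPE as the scoring metric. It is not the notebook's code: the data is synthetic (`make_regression`), the hyperparameters are arbitrary, the sMAPE definition is an assumed common variant, and `compare_models` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def smape(y_true, y_pred):
    """sMAPE in percent (assumed variant: mean-of-magnitudes denominator)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    ratio = np.divide(
        np.abs(y_true - y_pred), denom, out=np.zeros_like(denom), where=denom != 0
    )
    return 100 * ratio.mean()


# Lower sMAPE is better, so scikit-learn negates the score internally.
smape_scorer = make_scorer(smape, greater_is_better=False)


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated sMAPE (positive, in percent) per model."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring=smape_scorer)
        results[name] = -scores.mean()  # undo the sign flip
    return results


# Synthetic stand-in for the prepared training data.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
y = y - y.min() + 50  # shift to positive values, loosely like a percentage target

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(max_depth=5, random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

results = compare_models(models, X, y)
best_name = min(results, key=results.get)
print(results)
print("Best by cross-validated sMAPE:", best_name)
```

Only the model chosen this way would then be refit on the full training set and evaluated once on the test set.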

Note

The data are indexed by the date and time at which the measurements were obtained (the date feature), so neighboring observations are often similar. Some parameters are unavailable because they are measured and/or calculated much later. As a result, some features present in the training set are missing from the test set. In addition, the test set contains no target features.
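
One way to handle the mismatch described above is to train only on columns that also exist in the test set, and to look up the test targets in the full dataset by the date index. The lookup is an assumption: the README does not say how test targets are obtained. The function `align_to_test`, the column names, and the toy frames are all hypothetical.

```python
import pandas as pd


def align_to_test(train, test, target_cols):
    """Split train into features shared with test, targets, and dropped columns."""
    shared = [col for col in train.columns if col in test.columns]
    dropped = [
        col
        for col in train.columns
        if col not in test.columns and col not in target_cols
    ]
    return train[shared], train[target_cols], dropped


# Toy frames indexed by date; names and values are made up for illustration.
dates = pd.to_datetime(
    ["2016-01-15 00:00", "2016-01-15 01:00", "2016-01-15 02:00"]
).rename("date")
full = pd.DataFrame(
    {
        "feature_a": [1.0, 2.0, 3.0],
        "feature_late": [0.1, 0.2, 0.3],  # measured later, absent from test
        "target": [80.0, 85.0, 90.0],
    },
    index=dates,
)
train = full.iloc[:2]
test = full.iloc[2:][["feature_a"]]  # no late features, no target

X_train, y_train, dropped = align_to_test(train, test, ["target"])

# Assumed approach: recover test targets from the full data via the date index.
y_test = full.loc[test.index, ["target"]]

print("Training features:", list(X_train.columns))
print("Dropped (not in test):", dropped)
print(y_test)
```

Restricting training to the shared columns keeps the model from relying on measurements that would not yet exist when a prediction is needed.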
