This repository contains the code used to build our machine learning models (Model folder). The models feed into the final dashboard (Final_dashboard folder), and the project is supplemented with web scraping and data visualization (Data Visualization folder). The main dataset for bean classification comes from the beans package in R, which was originally sourced from Koklu and Ozkan [1]. The repository is structured as follows.
Data Visualization Folder
-
Visualization.qmd and Webscraping_and_data_visualization.qmd
- These files contain bean production data scraped from the web and visualizations of global and regional bean production trends.
Model Folder
-
Emily_ML.qmd
- This file contains cross-validation for Random Forest and XGBoost, construction of the collective ensemble model, and bundling of the model objects.
-
Bella_ML.qmd
- This file contains cross-validation for the Classification Tree, Lasso, PCA, and SVM models.
-
Model_metrics.qmd
- This file contains AUC curves, PCA component visualization, and accuracy by bean type and model.
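
As a rough illustration of the kind of cross-validation described for the model files, here is a minimal sketch for a random forest on the beans data. It assumes the tidymodels framework, the ranger engine, 5 folds, and 500 trees; none of these choices are confirmed by the files in this repository, which may use different settings.

```r
# Hypothetical sketch: the actual .qmd files may use a different framework or settings.
library(tidymodels)

set.seed(123)

# Dry bean dataset from the beans package; `class` is the bean variety label.
bean_data <- beans::beans

# Stratified 5-fold cross-validation on the class label (fold count is an assumption).
folds <- vfold_cv(bean_data, v = 5, strata = class)

# Random forest specification; the ranger engine and tree count are assumptions.
rf_spec <- rand_forest(trees = 500) |>
  set_mode("classification") |>
  set_engine("ranger")

rf_wf <- workflow() |>
  add_formula(class ~ .) |>
  add_model(rf_spec)

# Fit the workflow on each fold and summarize the default metrics across folds.
rf_cv <- fit_resamples(rf_wf, resamples = folds)
collect_metrics(rf_cv)
```

The same resampling object could be reused for the other models so that each one is evaluated on identical folds.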
Final_dashboard Folder
- This directory contains the code for the dashboard that presents the results. The file “app.R” contains the code needed to run the dashboard. The “RData” files contain the exported models that the dashboard uses.
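
To get started, something like the following may work. This is a sketch that assumes the beans and shiny packages are installed, that “app.R” is a Shiny app, and that the folder is named `Final_dashboard` relative to the repository root; adjust the path if the layout differs.

```r
# Load the dry bean dataset that ships with the beans package.
bean_data <- beans::beans

# Count observations per bean variety.
table(bean_data$class)

# Launch the dashboard. Assumes app.R is a Shiny app and that the
# working directory is the repository root (the folder name is an assumption).
shiny::runApp("Final_dashboard")
```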
[1] Koklu, Murat, and Ilker Ali Ozkan. ‘Multiclass classification of dry beans using computer vision and machine learning techniques.’ Computers and Electronics in Agriculture 174 (2020): 105507.