Comparative Machine Learning Analysis in Bioinformatics

Introduction

This project compares machine learning models on bioinformatics data, specifically gene expression data. It applies Random Forest, Support Vector Regression (SVR), and other regression models to predict and analyze gene expression scores.

Technologies and Libraries Used

  • Python: Used for data preprocessing, model building, and evaluation.
    • Key Libraries: pandas, numpy, sklearn, seaborn, matplotlib
  • R: Employed for statistical analysis and visualization.
    • Key Libraries: tidyverse, caret, e1071, rpart, randomForest, ggplot2, readr, ggpubr

Data Description

The project uses preprocessed gene expression data consisting of a set of features and a target variable (score). The analysis explores the relationships between different genes and their expression levels.

Machine Learning Models and Techniques

  • Random Forest Regression (Python): Fitted after hyperparameter tuning.
  • Support Vector Regression (SVR) (Python & R): Applied to the gene expression data with a linear kernel.
  • Feature Selection and Analysis: Mutual Information, Recursive Feature Elimination (RFE), and Correlation Analysis.
  • Model Evaluation: Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) on held-out predictions.
  • Baseline Comparison: Comparison with a dummy regressor to establish baseline performance.

Visualizations

Key visualizations from the analysis are presented below:

Line Plot

[Figure: Line Plot] Description of the line plot.

Scatter Plot

[Figure: Scatter Plot] Explanation of the scatter plot findings.

Bar Chart

[Figure: Bar Chart] Details about the data shown in the bar chart.

Density Plot

[Figure: Density Plot] Interpretation of the density plot.

Joint Density Plot

[Figure: Joint Density Plot] Insights from the joint density plot.

Results and Discussion

Summary of key findings, including feature importance, model performance comparison, and visualization insights.

Installation and Setup

Instructions on setting up the environment and running the scripts.

Usage

Details on how to run the scripts and utilize the analysis.

Contributing

Information on how others can contribute to the project.

Contact

For more information or inquiries, please contact motasem.youniss@gmail.com.