A comprehensive analysis of gene expression data using machine learning techniques in Python and R, focusing on predictive modeling and data visualization

mojo8787/Comparative_ML_Analysis_Bioinformatics

Comparative Machine Learning Analysis in Bioinformatics

Introduction

This project compares machine learning techniques for analyzing gene expression data. It applies Random Forest, Support Vector Regression (SVR), and other regression models to predict and analyze gene expression scores.

Technologies and Libraries Used

  • Python: Used for data preprocessing, model building, and evaluation.
    • Key Libraries: pandas, numpy, sklearn, seaborn, matplotlib
  • R: Employed for statistical analysis and visualization.
    • Key Libraries: tidyverse, caret, e1071, rpart, randomForest, ggplot2, readr, ggpubr

Data Description

The project uses preprocessed gene expression data consisting of a set of features and a target variable (score). The analysis examines the relationships between different genes and their expression levels.
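As a rough illustration of the layout described above, the following sketch builds a small synthetic table with feature columns and a `score` target, splits it for training and testing, and ranks features by their correlation with the target. The gene column names, the sample size, and the synthetic values are assumptions made for this example only; they are not taken from the project's data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed expression table.
# Column names gene_0..gene_4 are hypothetical; only "score" comes from the README.
rng = np.random.default_rng(0)
genes = [f"gene_{i}" for i in range(5)]
df = pd.DataFrame(rng.normal(size=(100, 5)), columns=genes)
df["score"] = 2.0 * df["gene_0"] - df["gene_3"] + rng.normal(scale=0.1, size=100)

# Separate the features from the target variable.
X = df.drop(columns="score")
y = df["score"]

# Hold out 20% of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Pearson correlation of each feature with the target, strongest first.
corr_with_score = X.corrwith(y).sort_values(key=abs, ascending=False)
print(corr_with_score)
```

With real data, the synthetic frame would be replaced by whatever loader matches the project's files, for example `pd.read_csv` on a preprocessed table.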

Machine Learning Models and Techniques

  • Random Forest Regression (Python): Tuned over a set of hyperparameters and fitted to the data.
  • Support Vector Regression (SVR) (Python & R): Applied to model gene expression data with a linear kernel.
  • Feature Selection and Analysis: Mutual Information, Recursive Feature Elimination (RFE), and Correlation Analysis.
  • Model Evaluation: Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).
  • Baseline Comparison: A dummy regressor establishes baseline performance for comparison.
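The techniques listed above could be combined roughly as in the following scikit-learn sketch. It is not the project's actual script: the synthetic data from `make_regression`, the hyperparameter grid, the number of features kept by RFE, and the split sizes are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE, mutual_info_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic regression data stands in for the expression features and score.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=5.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature analysis: mutual information scores and RFE driven by a linear SVR.
mi_scores = mutual_info_regression(X_train, y_train, random_state=0)
rfe = RFE(SVR(kernel="linear"), n_features_to_select=4).fit(X_train, y_train)

# Random forest tuned over a small, illustrative hyperparameter grid.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    scoring="neg_mean_squared_error",
    cv=3,
)
grid.fit(X_train, y_train)

# Fitted models to compare, including the dummy baseline.
models = {
    "dummy_mean": DummyRegressor(strategy="mean").fit(X_train, y_train),
    "random_forest": grid.best_estimator_,
    "svr_linear": SVR(kernel="linear").fit(X_train, y_train),
}

# Evaluate every model with MSE and RMSE on the held-out set.
results = {}
for name, model in models.items():
    mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = {"mse": mse, "rmse": float(np.sqrt(mse))}

for name, metrics in results.items():
    print(f"{name}: MSE={metrics['mse']:.2f} RMSE={metrics['rmse']:.2f}")
```

The dummy regressor always predicts the training mean, so any model worth keeping should score a lower MSE than it on the held-out set.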

Visualizations

Key visualizations from the analysis are presented below:

Line Plot

Figure: Line Plot. Description of the line plot.

Scatter Plot

Figure: Scatter Plot. Explanation of the scatter plot findings.

Bar Chart

Figure: Bar Chart. Details about the data shown in the bar chart.

Density Plot

Figure: Density Plot. Interpretation of the density plot.

Joint Density Plot

Figure: Joint Density Plot. Insights from the joint density plot.

Results and Discussion

Summary of key findings, including feature importance, model performance comparison, and visualization insights.

Installation and Setup

Instructions on setting up the environment and running the scripts.

Usage

Details on how to run the scripts and utilize the analysis.

Contributing

Information on how others can contribute to the project.

Contact

For more information or inquiries, please contact motasem.youniss@gmail.com.

