Skip to content

Latest commit

 

History

History
171 lines (124 loc) · 4.67 KB

File metadata and controls

171 lines (124 loc) · 4.67 KB

Sequencing Data Analysis Framework

This repository contains scripts and tools for processing and analyzing sequencing data from Illumina and ONT (Oxford Nanopore Technologies) platforms.

Folder Structure

0_sample_data

  • 1_data_processing
    • Sample output as .vcf of the data processing pipeline for Illumina and ONT for 10 sample patients.
  • 2_data_analyzer
    • 2024-06-21_step_1_vcf_import_exons_Illumina
    • 2024-06-21_step_1_vcf_import_exons_ONT
    • 2024-06-21_step_2_Illumina_and_ONT_merged
    • 2024-06-21_step_3_dataset_analyzed
    • 2024-06-21_step_4_dataset_color_coded
  • reference_sequence
    • amplicon_reference_sequence.fa

1_data_processing

  • install_dependencies.sh
  • sequ_data_framework_Illumina.sh
  • sequ_data_framework_ONT.sh

2_genotype_analyzer

  • GenotypeAnalyzer.exe
  • GenotypeAnalyzer.py
  • Setting_Genotype-Analyzer.xlsx
  • step_1_build_genotype.py
  • step_2_merge_files.py
  • step_3_analyse_file.py
  • step_4_color_count.py

Sequencing Data Framework Setup Guide

Operating System Requirement

  • OS required: Linux

Installation Instructions

  1. Clone the Repository:

    git clone https://github.com/ChrAtt1/Sequencing-Data-Analysis-Framework.git
    cd Sequencing-Data-Analysis-Framework
  2. Install Dependencies:

    chmod +x ./install_dependencies.sh
    ./install_dependencies.sh

    The install_dependencies.sh script will install the following:

    • Conda
    • SAMtools
    • BWA Aligner
    • WhatsHap
    • Nanofilt
    • minimap2
    • fastq-filter

    (Links for installation guides and repositories were last accessed on 23 June 2024.)

Initial Configuration

Before the first start, modify the permissions of the scripts to make them executable:

chmod +x ./sequ_data_framework_Illumina.sh
chmod +x ./sequ_data_framework_ONT.sh

Insert the variables directly into the shell scripts (sequ_data_framework_Illumina.sh and sequ_data_framework_ONT.sh). Assign your specific file paths to these variables. Example:

#!/bin/bash
# Specify the base path where the data is located
path_base_data="/path/to/your/base/data"
# Specify the path to the input data
input_data="/path/to/your/input/data"
# Provide the path to the reference sequence file (.fa-file)
path_reference_sequence="/path/to/your/reference/sequence.fa"

Replace the placeholder paths (/path/to/your/...) with the actual paths on your system.

Running the Scripts

To run the scripts, use the following commands in the terminal:

./sequ_data_framework_Illumina.sh
./sequ_data_framework_ONT.sh

Ensure you have the necessary permissions and that the paths specified in the scripts are correct before execution.

Genotype Analyzer

Genotype Analyzer is a Python-based application designed to analyze genotype data using various settings and methods. The application provides a graphical user interface (GUI) built with Tkinter, enabling users to input necessary files, configure settings, and perform genotype analysis.

Features

  • Load and parse settings from an Excel file
  • Browse and select files and folders through the GUI
  • Configure sequencing methods and amplicon settings
  • Validate input paths and settings
  • Perform genotype building, merging, and analysis
  • Display progress with a progress bar
  • Measure and print execution time for each step

Requirements

  • Python 3.8+
  • Pandas
  • Tkinter
  • Linux
  • Pycharm

Installation

Install Dependencies

Use pip to install the required libraries:

pip install pandas
pip install math
pip install collections
pip install datetime
pip install scipy
pip install numpy
pip install re
pip install shutil
pip install openpyxl
pip install tkinter

Running the Application

For Linux

python GenotypeAnalyzer.py

For Windows

Run the GenotypeAnalyzer.exe file.

Usage

  1. Load the Settings File:

    The application expects an Excel file named Setting_Genotype-Analyzer.xlsx with two sheets:

    • Sequencing Method Setting
    • Amplicon Setting
  2. Configure Settings:

    • Use the GUI to browse and select the necessary files and folders for the datasets and reference sequence as well es the additional settings.
    • Configure the sequencing methods and amplicon settings as needed.
    • Choose between analyzing VCF files (Option A) or using previous Genotype Analyzer output Excel files (Option B).
  3. Run the Analysis:

    • Click the "Analyse Genotypes" button to start the analysis process.
    • The application will validate the input paths and settings before proceeding.
    • Progress is displayed with a progress bar, and execution times for each step are printed in the console.