Multiple Allelic Comparison to Identify Candidate genes

wkgardner/MACIC

                    ███╗   ███╗  █████╗   ██████╗ ██╗  ██████╗
                    ████╗ ████║ ██╔══██╗ ██╔════╝ ██║ ██╔════╝
                    ██╔████╔██║ ███████║ ██║      ██║ ██║
                    ██║╚██╔╝██║ ██╔══██║ ██║      ██║ ██║
                    ██║ ╚═╝ ██║ ██║  ██║ ╚██████╗ ██║ ╚██████╗
                    ╚═╝     ╚═╝ ╚═╝  ╚═╝  ╚═════╝ ╚═╝  ╚═════╝

Multiple Allelic Comparison to Identify Candidates (MACIC) is a bioinformatics workflow for running a genome-wide association study (GWAS) on large VCF files from a single species to identify statistically significant SNPs. MACIC is built on two open-source tools, PLINK 2.0 and bcftools, and uses Python to generate plots and assemble reports.

This program requires four input files:

  1. A VCF file containing your samples of interest (must be bgzipped).
  2. A PLINK phenotype .txt file to annotate sample phenotypes.
  • The phenotype file is a tab-separated .txt file:

     Animal_ID  Animal_ID_with_family  Phenotype
     Animal_ID_1  Animal_ID_1(same)  1

  • Phenotype values: 1 = control, 2 = case, 0 = missing phenotype

  3. The GenBank annotation file (.gff) corresponding to the species.
  4. A handmade .csv file that lists the accession numbers and chromosomes contained within the .gff.
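
Because a malformed phenotype file is an easy way to get a confusing failure later in the run, it can help to check it before starting. The snippet below is a minimal sketch and not part of MACIC itself: the function name `check_phenotype_rows` is hypothetical, and it assumes the three-column, tab-separated layout with a header row and the 0/1/2 coding described above.

```python
import csv
import io

# Phenotype coding described above: 0 = missing, 1 = control, 2 = case.
VALID_PHENOTYPES = {"0", "1", "2"}


def check_phenotype_rows(handle):
    """Return a list of (line_number, message) problems found in a phenotype file.

    Hypothetical helper: assumes a header row followed by tab-separated
    rows of three columns, the last of which is the phenotype code.
    """
    problems = []
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return [(1, "file is empty")]
    if len(header) != 3:
        problems.append((1, f"expected 3 header columns, found {len(header)}"))
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            problems.append((line_number, f"expected 3 columns, found {len(row)}"))
            continue
        if row[2].strip() not in VALID_PHENOTYPES:
            problems.append((line_number, f"invalid phenotype {row[2]!r}"))
    return problems


# Small in-memory example; the last row uses a phenotype code that is not allowed.
example = (
    "Animal_ID\tAnimal_ID_with_family\tPhenotype\n"
    "A1\tA1\t1\n"
    "A2\tA2\t2\n"
    "A3\tA3\t5\n"
)
problems = check_phenotype_rows(io.StringIO(example))
for line_number, message in problems:
    print(f"line {line_number}: {message}")
```

To check a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("phenotypes.txt", newline="")`, where the filename is a placeholder.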

This project is still a work in progress and is rough around the edges, but it was working as of 29 Mar 2024.

Some people have asked, "Is it Macaque or Macic?" To which I have only one answer, it is Magic!

Assumptions made by this program:

  1. Your input VCF file uses a standardized chromosome naming convention (e.g. chr1, 1, or 01), and your accession_to_chromosome.txt file follows the same convention.
  2. A sample's sex is often hard to deduce using PLINK and, unless it is known by other means (e.g. study notes), should be ignored. If you want to include sex as a confounder, refer to the PLINK documentation, add sex labels using its built-in method, and then remove the "--allow-no-sex" flag from the bash script.
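
The first assumption can be checked before a run by comparing the chromosome names in the VCF against those in the mapping file. The snippet below is a rough sketch under stated assumptions, not MACIC code: both function names are hypothetical, and the name lists are assumed to have been extracted already (for example, from the VCF contig names and from the chromosome column of the mapping file).

```python
def normalize_chromosome(name):
    """Reduce 'chr1', '1', and '01' to the same label ('1').

    Non-numeric names such as 'X' keep their label, minus any 'chr' prefix.
    """
    label = name.strip()
    if label.lower().startswith("chr"):
        label = label[3:]
    if label.isdigit():
        label = str(int(label))  # drops leading zeros: '01' -> '1'
    return label


def mismatched_chromosomes(vcf_names, mapping_names):
    """Return VCF chromosome names that do not appear verbatim in the mapping file."""
    known = set(mapping_names)
    return sorted(name for name in vcf_names if name not in known)


# Hypothetical example: same chromosomes, but two different naming conventions.
vcf_names = ["chr1", "chr2", "chrX"]
mapping_names = ["1", "2", "X"]

unmatched = mismatched_chromosomes(vcf_names, mapping_names)
same_after_normalizing = (
    {normalize_chromosome(n) for n in vcf_names}
    == {normalize_chromosome(n) for n in mapping_names}
)
print("names missing from mapping file:", unmatched)
print("same chromosomes once normalized:", same_after_normalizing)
```

If names are missing from the mapping file but the two sets agree once normalized, the files describe the same chromosomes under different conventions, and one of them needs to be renamed before running the workflow.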
