This repo summarizes the main bioinformatics data sources and tools.
1, ArrayExpress(EBI:European Bioinformatics Institute)-- https://www.ebi.ac.uk/arrayexpress/
stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community
2, GEO(Gene Expression Omnibus)-NCBI -- https://www.ncbi.nlm.nih.gov/geo/
stores curated gene expression DataSets, original Series and Platform records
3, ENCODE(Encyclopedia of DNA Elements)-- https://www.encodeproject.org/
Produces high-quality data and analyzes the data in an integrative fashion;
A comprehensive list of functional elements.
4, SRA(Sequence Read Archive) -- https://www.ncbi.nlm.nih.gov/sra
stores raw sequencing data
5, Gencode -- https://www.gencodegenes.org/
identify and classify all gene features in the human and mouse genomes with high accuracy based on biological evidence
6, TCGA(The Cancer Genome Atlas) -- https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga
Several molecular properties (genomics, including copy-number variation; transcriptomics; DNA methylation) have been profiled for thousands of clinical samples across dozens of tumor types (plus controls);
Access to part of the data is controlled.
7, cBioPortal - https://www.cbioportal.org/
Github: https://github.com/cBioPortal/cbioportal
The cBioPortal for Cancer Genomics provides visualization, analysis, and download of large-scale cancer genomics data sets.
8, GEPIA: http://gepia.cancer-pku.cn/index.html
An interactive web server for analyzing RNA sequencing expression data of 9,736 tumors and 8,587 normal samples from the TCGA and the GTEx projects, using a standard processing pipeline.
9, ICGC(International Cancer Genome Consortium) -- https://dcc.icgc.org/
Cancer genomics data sets visualization, analysis and download;
Single-locus analysis of cancer;
Open to all
10, IGSR(The International Genome Sample Resource) -- https://www.internationalgenome.org/
created a catalogue of common human genetic variation;
11, ProteomeXchange -- http://www.proteomexchange.org/
provides mass spectrometry proteomics data
1, MEGA(Molecular Evolutionary Genetics Analysis) -- https://www.megasoftware.net/
sophisticated and user-friendly software suite for analyzing DNA and protein sequence data from species and populations.
2, DNAstar https://www.dnastar.com/
workflow for genomics & transcriptomics & molecular biology & protein analysis
Multiple alignment program for amino acid or nucleotide sequences
4, PAL2NAL http://www.bork.embl.de/pal2nal/
a program that converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment.
Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments[J]. Nucleic acids research, 2006, 34(suppl_2): W609-W612.
5, Clustal Omega https://www.ebi.ac.uk/Tools/msa/clustalo/
multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to generate alignments between three or more sequences.
6, RAxML
Github: https://github.com/stamatak/standard-RAxML
A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies[J]. Bioinformatics, 2014, 30(9): 1312-1313.
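
To illustrate the protein-to-codon alignment conversion described under PAL2NAL above, here is a minimal Python sketch of the core idea: each residue in a gapped protein row is replaced by the next three nucleotides of its coding sequence, and each gap by three gap characters. This is not PAL2NAL's own code. The function name `codon_align` and the toy sequences are made up for illustration, and the sketch assumes the coding sequence is in frame and matches the protein exactly, which real data often does not.

```python
def codon_align(protein_aln: str, cds: str, gap: str = "-") -> str:
    """Thread an ungapped coding sequence onto one gapped protein row."""
    residues = sum(1 for aa in protein_aln if aa != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is too short for the protein row")
    out = []
    pos = 0  # index of the next unused nucleotide in cds
    for aa in protein_aln:
        if aa == gap:
            out.append(gap * 3)          # one protein gap becomes a codon-sized gap
        else:
            out.append(cds[pos:pos + 3])  # one residue becomes its codon
            pos += 3
    return "".join(out)


# Toy input (hypothetical): two aligned protein rows and their coding sequences.
protein_rows = {"seqA": "MK-L", "seqB": "MKTL"}
cds_rows = {"seqA": "ATGAAACTG", "seqB": "ATGAAAACCCTG"}

codon_rows = {name: codon_align(protein_rows[name], cds_rows[name])
              for name in protein_rows}
for name, row in codon_rows.items():
    print(name, row)
```

Every output row is three times the length of the protein alignment, so the codon columns stay aligned across sequences. The sketch does not check that each codon actually translates to the aligned residue; for real analyses use PAL2NAL itself.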