Click here to view the HTML report
All the scripts used for bioinformatics can be viewed in the HTML report, or in the bioinformatics scripts directory.
All the R scripts for our statistics and plots can be viewed in the HTML report, or in analysis/statistical-analysis.Rmd.
We created a sqlite3 database (data/queen_pheromone.db) using the script import_data.sh, which was then analysed in R. The component spreadsheets (tables) of this database are also available as .csv files in data/component spreadsheets of queen_pheromone.db if you prefer to access them that way.
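If you would rather inspect the database outside R, the following is a minimal sketch using only Python's standard-library sqlite3 module. The function names list_tables and read_table are illustrative and not part of this repository; the database path in the usage comment is the one given above and assumes you are in the repository root.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def read_table(conn, table):
    """Return (column_names, rows) for one table.

    The table name is checked against list_tables() first, because SQLite
    cannot bind identifiers as query parameters.
    """
    if table not in list_tables(conn):
        raise ValueError(f"no such table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


# Usage (hypothetical session, run from the repository root):
# conn = sqlite3.connect("data/queen_pheromone.db")
# print(list_tables(conn))
# columns, rows = read_table(conn, "treatments")
```

Printing the column names returned by read_table is a quick way to check the actual schema of a table before querying it.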
The database has the following tables:
- am2bt, am2lf, am2ln, bt2am, bt2lf, bt2ln, lf2am, lf2bt, lf2ln, ln2am, ln2bt, ln2lf: These tables give the BLAST hits of species1 genes in species2, with the corresponding e-value. For example, am2bt BLASTs Apis genes against Bombus. They can be used to identify genes that are each other's reciprocal best BLAST hit.
- bee_go, bee_kegg: List the GO and KEGG terms associated with each Apis gene. See the file Script to set up for GO analyses.R, which created the bee_kegg file from data on Entrez.
- ebseq_gene_am, ebseq_gene_bt, ebseq_gene_lf, ebseq_gene_ln: Output of the EBSeq analysis of gene-level expression data for each species. The PostFC column (posterior fold change) is probably what you want. This list includes all the genes, significant or non-significant.
- ebseq_padj_gene_am, ebseq_padj_gene_bt, ebseq_padj_gene_lf, ebseq_padj_gene_ln: Same as above, except that the list has been culled to include only genes showing a significant response to treatment at p < 0.05, after Benjamini-Hochberg FDR correction.
- ebseq_padj_isoform_am, ebseq_padj_isoform_bt, ebseq_padj_isoform_lf, ebseq_padj_isoform_ln: Output of the EBSeq analysis of isoform-level expression data for each species. The PostFC column (posterior fold change) is probably what you want. This list only includes isoforms showing a significant response to treatment at p < 0.05, after Benjamini-Hochberg FDR correction.
- isoforms_am, isoforms_bt, isoforms_lf, isoforms_ln: Mappings of isoforms to genes for each species.
- rsem_am, rsem_bt, rsem_lf, rsem_ln: Gene expression values (RSEM) for each gene in each species.
- treatments: Gives the treatment, colony and species for each RNA-seq library.
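As a rough illustration of how the paired BLAST tables can be used to find reciprocal best hits, here is a sketch in Python with the standard-library sqlite3 module. The column names gene1, gene2 and evalue are placeholders, since the real column names are not listed in this README; check the schema and pass the actual names. The function names are also hypothetical, and this is not necessarily how the analysis in the HTML report identifies orthologs.

```python
import sqlite3


def best_hits(conn, table, query_col, hit_col, evalue_col):
    """Map each query gene to its lowest-e-value hit in one BLAST table.

    Identifiers are interpolated into the SQL text, so only pass trusted
    table and column names.
    """
    sql = (
        f'SELECT "{query_col}", "{hit_col}" FROM "{table}" '
        f'ORDER BY CAST("{evalue_col}" AS REAL) ASC, "{hit_col}" ASC'
    )
    best = {}
    for query, hit in conn.execute(sql):
        # Rows arrive sorted by e-value, so the first hit seen per query is
        # the best one; ties are broken alphabetically by hit name.
        best.setdefault(query, hit)
    return best


def reciprocal_best_hits(conn, forward, reverse,
                         query_col="gene1", hit_col="gene2",
                         evalue_col="evalue"):
    """Return sorted (query, hit) pairs that are each other's best hit.

    The default column names are assumptions, not the documented schema.
    """
    fwd = best_hits(conn, forward, query_col, hit_col, evalue_col)
    rev = best_hits(conn, reverse, query_col, hit_col, evalue_col)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Usage (hypothetical; replace the column names with the real ones):
# conn = sqlite3.connect("data/queen_pheromone.db")
# pairs = reciprocal_best_hits(conn, "am2bt", "bt2am")
```

The sort is done on the e-value cast to a real number so that values stored as text (for example '1e-50') are still ordered numerically.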
This directory contains files kindly shared with us by other research groups, all of which were used to compare our Apis gene-level variables with other, previously-measured variables such as DNA methylation.
- apis_gene_methyl_CG_OE.csv: CpG and gene body methylation data provided by Soojin Yi and Xin Wu. The data are from Galbraith et al. 2016 PNAS.
- harpur_etal_gamma.txt: Gamma values (i.e. strength of positive selection) from Harpur et al. 2014 PNAS. Provided by Brock Harpur.
- Amel_AllData_012709.txt: Various gene-level metrics provided by Brendan Hunt. This is the source of the queen vs sterile/reproductive worker data (measured in Grozinger et al. 2007 Mol. Ecol.), as well as the Codon Adaptation Index.
- am.gene_info.txt: Contains mappings between Entrez IDs, old Beebase IDs, and new Beebase IDs. Needed to relate different gene lists to each other.
gene_set_collection.RData and gene_set_collection_kegg.RData were made by Luke following the instructions at http://bioconductor.org/packages/2.11/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf. One file is for GO, one for KEGG. They are needed for GSEA in the GOstats package.
- Morandin to Holman orthology.csv: Orthology for placing our genes into the OGGs in Morandin et al. Genome Biology and vice versa.
- Morandin module membership.csv: Gives the module membership for each OGG in Morandin et al. Genome Biology.
- Morandin module membership.csv: Gives the caste bias of each module in Morandin et al. Genome Biology.
- Script to set up for GO analyses.R: Used to get the KEGG terms associated with each Apis gene.
- orthodb.py: A Python script to access OrthoDB and get gene orthology information.