Output structure

Overview

TOGA2 (v2.0.3) has the following output structure (for brevity, the contents of the nextflow/ and tmp/ directories are not shown):

├── codon_aln.fa.gz
├── exon_aln.fa.gz
├── exon_seqs.2bit
├── inactivating_mutations.tsv
├── logs
│   ├── failed_batches_${run_id}.tsv
│   ├── project_args_${run_id}.tsv
│   └── TOGA2_${run_id}.log
├── loss_summary.tsv
├── meta
│   ├── discarded_overextended_projections.txt
│   ├── discarded_projections.bed
│   ├── exon_meta.tsv.gz
│   ├── fragmented_projections.tsv
│   ├── gained_intron_summary.tsv
│   ├── gene_tree_job_summary.tsv
│   ├── loss_summary_extended.tsv
│   ├── memory_requirements.tsv
│   ├── one2zero_genes.txt
│   ├── paralogous_projections_to_align.tsv
│   ├── processed_pseudogene_projections_to_align.tsv
│   ├── projection_features.tsv
│   ├── query_annotation.with_discarded_exons.bed
│   ├── redundant_paralogs.txt
│   ├── redundant_processed_pseudogenes.txt
│   ├── rejected_at_orthology_step.tsv
│   ├── resolved_tree_pairs.tsv
│   ├── selenocysteine_codons.tsv
│   ├── spanning_chain_ref_coords.tsv
│   ├── splice_site_shifts.tsv
│   ├── splice_sites.tsv.gz
│   ├── transcript_meta.tsv.gz
│   ├── trans_to_chain_classes.tsv
│   └── unresolved_tree_clades.txt
├── nucleotide.fa.gz
├── orthology_classification.tsv
├── orthology_scores.tsv
├── processed_pseudogenes.bed
├── protein_aln.fa.gz
├── protein.fa.gz
├── query_annotation.bed
├── query_annotation.gtf.gz
├── query_annotation.with_utrs.bed
├── query_genes.bed
├── query_genes.tsv
├── rejected_items.tsv
├── toga.table.gz
└── ucsc_browser_files
    ├── ${ucsc_prefix}.bb
    ├── ${ucsc_prefix}.decorator.bb
    ├── ${ucsc_prefix}.ix
    └── ${ucsc_prefix}.ixx

where

${run_id} is the internal name of the TOGA2 run; by default, TOGA2 uses the following naming template: TOGA2_${hours}:${minutes}${day}.${month}.${year}${10-digit hex code};
${ucsc_prefix} is a user-defined prefix used for names in the UCSC Genome Browser; the current default value is HLTOGAannot.
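
As an illustration of how the two placeholders expand, the sketch below builds the run-specific file paths listed in the tree above. The function name `run_specific_paths` and the dictionary keys are hypothetical and are not part of TOGA2; only the file name patterns and the default prefix come from this page.

```python
from pathlib import Path


def run_specific_paths(output_dir, run_id, ucsc_prefix="HLTOGAannot"):
    """Return the paths whose names depend on run_id or ucsc_prefix.

    Hypothetical helper: the file name patterns follow the directory
    tree on this page; the dictionary keys are made up for this example.
    """
    root = Path(output_dir)
    logs = root / "logs"
    ucsc = root / "ucsc_browser_files"
    return {
        "failed_batches": logs / f"failed_batches_{run_id}.tsv",
        "project_args": logs / f"project_args_{run_id}.tsv",
        "run_log": logs / f"TOGA2_{run_id}.log",
        "bb": ucsc / f"{ucsc_prefix}.bb",
        "decorator_bb": ucsc / f"{ucsc_prefix}.decorator.bb",
        "ix": ucsc / f"{ucsc_prefix}.ix",
        "ixx": ucsc / f"{ucsc_prefix}.ixx",
    }
```

Calling `run_specific_paths("toga2_output", run_id)` with the run name reported by TOGA2 should then point at the log and browser files of that particular run, assuming the output directory has not been reorganised.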

General comments

All tabular files (.tsv) have a consistent number of tab-separated columns.
All tabular files (.tsv) also have headers; if you spot an output table that lacks a header, please contact the TOGA2 team.
Except for the contents of logs/ and the summary.txt file, the .txt format is reserved for single-column lists of projections.
Heavy files (FASTA sequence files and certain tables) are compressed with gzip (compression level 5) at the finalize step of the TOGA2 pipeline.
All general file formats used correspond to their standard descriptions.
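
The first two comments (a consistent column count and a header line) lend themselves to a quick sanity check. The sketch below is one possible way to do it with the Python standard library; `open_text` and `check_table` are hypothetical helper names, not TOGA2 functions. It assumes the first line of each table is the header, since a script cannot tell a header from a data row on its own.

```python
import csv
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def check_table(path):
    """Return (header, number of data rows) for a tab-separated table.

    Raises ValueError if the file is empty or if any row has a
    different number of columns than the first (header) line.
    """
    with open_text(path) as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"{path}: file is empty")
        n_rows = 0
        for line_no, row in enumerate(reader, start=2):
            if len(row) != len(header):
                raise ValueError(
                    f"{path}: line {line_no} has {len(row)} columns, "
                    f"expected {len(header)}"
                )
            n_rows += 1
    return header, n_rows
```

The same function handles both plain tables such as loss_summary.tsv and compressed ones such as meta/exon_meta.tsv.gz, because the opener is chosen from the file extension.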

Exploring the directory structure

Top-level files: primary TOGA2 output files, placed at the base of the output directory for your convenience;
logs: run-specific metadata, including initial settings, failed parallel process batches, and extensive runtime logging;
meta: secondary TOGA2 output, including certain provisional files and more specialised pieces of TOGA2 output;
nextflow: parallel process executor scripts, configuration files, and execution logs;
tmp: temporary files directory; keep it only if you plan to restart or resume your TOGA2 run in the near future;
ucsc_browser_files: BigBed files and their respective indices for UCSC browser reports.
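
To get a quick view of this layout for an existing run, something along the following lines could be used. It is a minimal sketch: `summarise_output` is a hypothetical helper, and the subdirectory names are simply copied from the list above.

```python
from pathlib import Path

# Subdirectory names as listed in this section
SUBDIRS = ("logs", "meta", "nextflow", "tmp", "ucsc_browser_files")


def summarise_output(output_dir):
    """Return (sorted top-level file names, {subdirectory: exists?})."""
    root = Path(output_dir)
    top_level = sorted(p.name for p in root.iterdir() if p.is_file())
    present = {name: (root / name).is_dir() for name in SUBDIRS}
    return top_level, present
```

A missing tmp entry is not necessarily a problem, since the temporary directory is only worth keeping for a restart or resume.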
