
Output structure

Yury V. Malovichko edited this page Nov 21, 2025 · 8 revisions

Overview

TOGA2 (v2.0.3) produces the following output structure (the contents of the nextflow/ and tmp/ directories are omitted for brevity):

├── codon_aln.fa.gz
├── exon_aln.fa.gz
├── exon_seqs.2bit
├── inactivating_mutations.tsv
├── logs
│   ├── failed_batches_${run_id}.tsv
│   ├── project_args_${run_id}.tsv
│   └── TOGA2_${run_id}.log
├── loss_summary.tsv
├── meta
│   ├── discarded_overextended_projections.txt
│   ├── discarded_projections.bed
│   ├── exon_meta.tsv.gz
│   ├── fragmented_projections.tsv
│   ├── gained_intron_summary.tsv
│   ├── gene_tree_job_summary.tsv
│   ├── loss_summary_extended.tsv
│   ├── memory_requirements.tsv
│   ├── one2zero_genes.txt
│   ├── paralogous_projections_to_align.tsv
│   ├── processed_pseudogene_projections_to_align.tsv
│   ├── projection_features.tsv
│   ├── query_annotation.with_discarded_exons.bed
│   ├── redundant_paralogs.txt
│   ├── redundant_processed_pseudogenes.txt
│   ├── rejected_at_orthology_step.tsv
│   ├── resolved_tree_pairs.tsv
│   ├── selenocysteine_codons.tsv
│   ├── spanning_chain_ref_coords.tsv
│   ├── splice_site_shifts.tsv
│   ├── splice_sites.tsv.gz
│   ├── transcript_meta.tsv.gz
│   ├── trans_to_chain_classes.tsv
│   └── unresolved_tree_clades.txt
├── nucleotide.fa.gz
├── orthology_classification.tsv
├── orthology_scores.tsv
├── processed_pseudogenes.bed
├── protein_aln.fa.gz
├── protein.fa.gz
├── query_annotation.bed
├── query_annotation.gtf.gz
├── query_annotation.with_utrs.bed
├── query_genes.bed
├── query_genes.tsv
├── rejected_items.tsv
├── toga.table.gz
└── ucsc_browser_files
    ├── ${ucsc_prefix}.bb
    ├── ${ucsc_prefix}.decorator.bb
    ├── ${ucsc_prefix}.ix
    └── ${ucsc_prefix}.ixx

where

  • ${run_id} is the internal name of the TOGA2 run; by default, TOGA2 uses the following naming template: TOGA2_${hours}:${minutes}${day}.${month}.${year}${10-digit hex code};
  • ${ucsc_prefix} is a user-set prefix used in the UCSC Genome Browser names; the current default value is HLTOGAannot.
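Because ${run_id} is embedded in every file name under logs/, the run IDs present in an output directory can be recovered from the file names alone. The following is a minimal Python sketch of that idea, not part of TOGA2 itself; the helper names find_run_ids and run_log_files are hypothetical, and the only assumption about naming is the logs/ layout shown in the tree above.

```python
import re
from pathlib import Path

# File name pattern taken from the tree above: logs/TOGA2_${run_id}.log
LOG_NAME = re.compile(r"^TOGA2_(?P<run_id>.+)\.log$")


def find_run_ids(output_dir):
    """Return sorted run IDs inferred from logs/TOGA2_<run_id>.log names.

    Hypothetical helper: TOGA2 does not ship this function.
    """
    logs = Path(output_dir) / "logs"
    if not logs.is_dir():
        return []
    run_ids = []
    for path in logs.iterdir():
        match = LOG_NAME.match(path.name)
        if match and path.is_file():
            run_ids.append(match.group("run_id"))
    return sorted(run_ids)


def run_log_files(output_dir, run_id):
    """Map each per-run file listed under logs/ to its expected path."""
    logs = Path(output_dir) / "logs"
    return {
        "log": logs / f"TOGA2_{run_id}.log",
        "project_args": logs / f"project_args_{run_id}.tsv",
        "failed_batches": logs / f"failed_batches_{run_id}.tsv",
    }
```

The paths returned by run_log_files are only expected locations; check each with Path.exists() before reading, since this page does not state whether every file is written for every run.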

General comments

Exploring the directory structure

  • Top-level files: primary TOGA2 output files, placed at the root of the output directory for convenience;
  • logs: run-specific metadata, including initial settings, failed parallel process batches, and extensive runtime logging;
  • meta: second-line TOGA2 output, including certain provisional files and more specialised pieces of TOGA2 output;
  • nextflow: parallel process executor scripts, configuration files, and execution logs;
  • tmp: temporary files; keep them only if you plan to restart or resume your TOGA2 run in the near future;
  • ucsc_browser_files: BigBed files and their respective indices for UCSC browser reports.
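The layout described above can be inspected with a short script. The following Python sketch is illustrative only and not part of TOGA2; the function name summarize_output is hypothetical, and the subdirectory descriptions are paraphrased from the list above. It reports which of the listed subdirectories exist and which files sit at the top level of a given output directory.

```python
from pathlib import Path

# Subdirectories named in the list above; descriptions are paraphrased from it.
SUBDIRS = {
    "logs": "run-specific metadata and runtime logs",
    "meta": "second-line TOGA2 output",
    "nextflow": "executor scripts, configuration files, execution logs",
    "tmp": "temporary files, needed only to restart or resume a run",
    "ucsc_browser_files": "BigBed files and their indices",
}


def summarize_output(output_dir):
    """Summarize a TOGA2 output directory (hypothetical helper).

    Returns the sorted names of top-level files and a mapping that tells
    whether each subdirectory from SUBDIRS is present.
    """
    root = Path(output_dir)
    top_level = sorted(p.name for p in root.iterdir() if p.is_file())
    present = {name: (root / name).is_dir() for name in SUBDIRS}
    return {"top_level_files": top_level, "subdirs_present": present}
```

A call such as summarize_output("path/to/toga2_output") (a placeholder path) returns a plain dictionary, so the result can be printed or compared against the tree in the Overview section.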
