Output structure
Yury V. Malovichko edited this page Nov 21, 2025
TOGA2 (v2.0.3) has the following output structure (for brevity, the contents of the nextflow/ and tmp/ directories are not shown):
├── codon_aln.fa.gz
├── exon_aln.fa.gz
├── exon_seqs.2bit
├── inactivating_mutations.tsv
├── logs
│ ├── failed_batches_${run_id}.tsv
│ ├── project_args_${run_id}.tsv
│ └── TOGA2_${run_id}.log
├── loss_summary.tsv
├── meta
│ ├── discarded_overextended_projections.txt
│ ├── discarded_projections.bed
│ ├── exon_meta.tsv.gz
│ ├── fragmented_projections.tsv
│ ├── gained_intron_summary.tsv
│ ├── gene_tree_job_summary.tsv
│ ├── loss_summary_extended.tsv
│ ├── memory_requirements.tsv
│ ├── one2zero_genes.txt
│ ├── paralogous_projections_to_align.tsv
│ ├── processed_pseudogene_projections_to_align.tsv
│ ├── projection_features.tsv
│ ├── query_annotation.with_discarded_exons.bed
│ ├── redundant_paralogs.txt
│ ├── redundant_processed_pseudogenes.txt
│ ├── rejected_at_orthology_step.tsv
│ ├── resolved_tree_pairs.tsv
│ ├── selenocysteine_codons.tsv
│ ├── spanning_chain_ref_coords.tsv
│ ├── splice_site_shifts.tsv
│ ├── splice_sites.tsv.gz
│ ├── transcript_meta.tsv.gz
│ ├── trans_to_chain_classes.tsv
│ └── unresolved_tree_clades.txt
├── nucleotide.fa.gz
├── orthology_classification.tsv
├── orthology_scores.tsv
├── processed_pseudogenes.bed
├── protein_aln.fa.gz
├── protein.fa.gz
├── query_annotation.bed
├── query_annotation.gtf.gz
├── query_annotation.with_utrs.bed
├── query_genes.bed
├── query_genes.tsv
├── rejected_items.tsv
├── toga.table.gz
└── ucsc_browser_files
  ├── ${ucsc_prefix}.bb
  ├── ${ucsc_prefix}.decorator.bb
  ├── ${ucsc_prefix}.ix
  └── ${ucsc_prefix}.ixx
where:
- ${run_id} is the internal name of the TOGA2 run; by default, TOGA2 uses the following naming template: TOGA2_${hours}:${minutes}${day}.${month}.${year}${10-digit hex code};
- ${ucsc_prefix} is a user-defined prefix for the names of the UCSC Genome Browser files; the current default value is HLTOGAannot.
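Since ${run_id} is embedded in all three logs/ file names, the files belonging to each run can be grouped by stripping the fixed prefixes and suffixes shown in the listing. The minimal Python sketch below does this; collect_run_logs is a hypothetical helper, not part of TOGA2, and it assumes the log file names follow the logs/ listing exactly.

```python
from pathlib import Path

# (prefix, suffix) pairs taken from the logs/ listing above; the part in
# between is ${run_id}. Patterns are checked in order and the first match wins.
LOG_PATTERNS = {
    "failed_batches": ("failed_batches_", ".tsv"),
    "project_args": ("project_args_", ".tsv"),
    "log": ("TOGA2_", ".log"),
}


def collect_run_logs(logs_dir):
    """Group files in a TOGA2 logs/ directory by run ID (hypothetical helper).

    Returns {run_id: {kind: Path}}; files matching no pattern are skipped.
    """
    runs = {}
    for path in sorted(Path(logs_dir).iterdir()):
        if not path.is_file():
            continue
        name = path.name
        for kind, (prefix, suffix) in LOG_PATTERNS.items():
            # Require a non-empty run ID between the prefix and the suffix.
            if (name.startswith(prefix) and name.endswith(suffix)
                    and len(name) > len(prefix) + len(suffix)):
                run_id = name[len(prefix):-len(suffix)]
                runs.setdefault(run_id, {})[kind] = path
                break
    return runs
```

For example, collect_run_logs("path/to/output/logs") maps each run ID to whichever of its failed-batches table, argument table and main log are present, which makes it easy to spot runs whose failed_batches file is non-empty.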
- All table format files (.tsv) have a consistent number of tab-separated columns.
- All table format files (.tsv) also have headers; if you spot an output table missing a header, please contact the TOGA2 team.
- Except for the logs/ contents and the summary.txt file, the .txt format is reserved for single-column lists of projections.
- Heavy files (FASTA sequence files and certain tables) are compressed with gzip (compression level 5) at the finalize step of the TOGA2 pipeline.
- All general file formats used correspond to their standard descriptions.

The output directory is organised as follows:
- Top level files: primary TOGA2 output files, placed at the top level of the output directory for your convenience;
- logs: run-specific metadata, including initial settings, failed parallel process batches, and extensive runtime logging;
- meta: secondary TOGA2 output, including certain provisional files and more specialised pieces of TOGA2 output;
- nextflow: parallel process executor scripts, configuration files, and execution logs;
- tmp: temporary files directory; keep it only if you plan to restart or resume your TOGA2 run in the near future;
- ucsc_browser_files: BigBed files and their indices for the UCSC Genome Browser.
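The table and compression conventions listed above lend themselves to a quick sanity check. Below is a minimal Python sketch, using only the standard library, that opens plain or gzipped files transparently, verifies that a .tsv table (for example meta/exon_meta.tsv.gz) has a header and a constant column count, and reads single-column .txt projection lists and gzipped FASTA files such as protein.fa.gz. The function names are hypothetical illustrations, not part of TOGA2, and the FASTA reader assumes ordinary ">"-prefixed header lines.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open an output file as text, decompressing it if the name ends in .gz."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def check_table(path):
    """Check that a .tsv table has a header and a constant column count.

    Returns (header_fields, number_of_data_rows); raises ValueError on a ragged row.
    """
    with open_text(path) as handle:
        header = handle.readline().rstrip("\n").split("\t")
        n_rows = 0
        for line_no, line in enumerate(handle, start=2):
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate trailing blank lines
            n_fields = len(line.split("\t"))
            if n_fields != len(header):
                raise ValueError(
                    f"{path}:{line_no}: {n_fields} columns, header has {len(header)}"
                )
            n_rows += 1
    return header, n_rows


def read_projection_list(path):
    """Read a single-column .txt projection list, skipping blank lines."""
    with open_text(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def read_fasta(path):
    """Minimal FASTA reader; returns (header, sequence) tuples in file order."""
    records, name, chunks = [], None, []
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

Returning a list of tuples rather than a dictionary keeps records in file order and avoids silently dropping entries if a header occurs more than once.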