OTUnify

Making temporary code name decisions reproducible for fungal DNA barcoding.

For New Collaborators

OTUnify doesn't replace temporary codes—it makes them scientifically reproducible.

When you review MycoMap BLAST results and decide "this sequence is close enough to Amanita sp. 'IN01'," that expert judgment currently exists only in your head. OTUnify captures that decision as explicit, versioned methodology that others can apply consistently. Your expertise remains essential—OTUnify just ensures it can be shared and reproduced.

What Problem Does This Solve?

Every day, mycologists reviewing sequence data make critical decisions:

"This 99.5% match is the same species as our temporary code"
"That 99.2% match is actually different" (because this clade has low variation)
"These sequences at 99.0-99.8% all belong together"

These decisions require deep expertise about fungal diversity patterns, but they're currently irreproducible. Different reviewers make different choices, and there's no record of why boundaries were drawn where they were.

How OTUnify Works

OTUnify provides a two-layer system that transforms expert decisions into explicit methodology:

Layer 1: Molecular OTUs (mOTUs) - The reproducible backbone

You provide: A set of sequences you know belong together
OTUnify finds: The most representative "anchor" sequence and calculates how similar the others are to it
Result: Cortinarius-HASH123-99.2-ADJ-0.1.fasta (a precise molecular boundary)
Anyone can now test if new sequences fall within YOUR defined boundary

Layer 2: Operational Species Concepts (OSCs) - Your polyphasic interpretation

Maps temporary codes like "Cortinarius sp. 'MI03'" to one or more mOTUs
Handles species with multiple haplotypes (one code → multiple mOTUs)
Handles regional variants (multiple codes → one mOTU)
Preserves morphological, ecological, geographic context, and expert notes
Currently implemented with basic functionality; full features are in development

The Workflow You Know, Now Reproducible

# Today's workflow (manual, expert-dependent):
# 1. BLAST sequence against MycoMap
# 2. Expert reviews: "Close enough to Amanita sp. 'IN01'" 
# 3. Hope next person makes same judgment

# OTUnify-enabled workflow (automated, reproducible):
# 1. Check if sequence matches existing species concepts
$ otunify-classify new_sequence.fasta species_concepts/
  ✓ Amanita sp. 'IN01': 99.7% match (included)

# 2. Or discover it needs a new code  
$ otunify-classify novel_sequence.fasta species_concepts/
  ✗ No matches above threshold
  
# 3. Create new OTU with your expert criteria
$ otunify-create novel_sequences.fasta --vanity-prefix "Amanita"
  ✓ Created: Amanita-HASH789-99.5-ADJ-0.1.fasta
  
# 4. Document as new temporary code
$ otunify-describe amanita_oh99.yaml --provisional-name "Amanita sp. 'OH99'"

Why OTUnify?

The North American fungal community has built an impressive ecosystem around temporary code names—provisional identifiers that bridge the centuries-long gap between discovery and formal taxonomy. With >70,000 DNA-barcoded observations on iNaturalist and ~100,000 sequences in MycoMap, we've proven this system works.

The challenge: Every temporary code assignment involves expert judgment that currently can't be reproduced. When you decide a sequence belongs with "Cortinarius sp. 'MI03'," that decision relies on your knowledge of variation patterns in that clade—knowledge that's locked in your expertise.

The solution: OTUnify makes your expert decisions explicit and shareable. It's version control for your taxonomic judgment—preserving not just what you decided, but the key criteria you used to decide it, plus any notes and context you want to pass to other experts.

Key Capabilities

What You Can Do Today

✅ Codify Your Expertise: Transform "this looks right" into "sequences ≥99.5% similar belong here"
✅ Share Your Criteria: Export your OTU definitions so others apply your exact standards
✅ Document Decisions: Record who expanded or narrowed a code boundary and when
✅ Track Code Evolution: See the complete history as understanding improves
✅ Automate Matching: Let computers apply your criteria to incoming sequences

How It Helps Your Work

For Sequencing Labs: Consistent, reproducible OTU assignment across batches
For Data Review: Apply the same expert criteria across MycoMap, iNaturalist, and other platforms
For Publications: Cite exact, reproducible methods for sequence delimitation
For Future Taxonomy: Clear audit trail from temporary code to formal description
For Collaboration: Share your taxonomic expertise with other labs and projects

Design Philosophy: Open and Accessible

OTUnify is built on command-line tools and simple file formats to ensure maximum flexibility and integration potential. This design provides several key advantages:

No Centralized Dependencies: Works entirely offline with whatever data you have available. There's no requirement to connect to servers, wait for database updates, or depend on external systems. Your expertise and data remain fully under your control.

Low Barrier to Integration: Simple FASTA and YAML formats mean any tool can read and write OTUnify data. This enables an ecosystem of tools to evolve while keeping the core methodology stable and shared across the community.

Future GUI Integration: While OTUnify currently uses a command-line interface, we anticipate most users will eventually interact with it through seamless web interfaces on MycoMap.com or other platforms. These integrations will provide all the fundamental benefits—reproducible methodology, shared expertise, automated matching—without requiring command-line knowledge or GitHub familiarity.

The CLI foundation ensures the methodology remains transparent, reproducible, and accessible to computational workflows, while future GUI layers will make it accessible to everyone in the mycological community.

Installation

pip install git+https://github.com/joshuaowalker/otunify.git

For development setup, see the Development section below.

Quick Start Tutorial

Your First OTU Definition

Let's say you have sequences you know belong to "Russula sp. 'brevipes-CA01'". Here's how to make that knowledge reproducible:

# 1. Create an OTU from your clustered sequences
$ otunify-create russula_brevipes_sequences.fasta --vanity-prefix "Russula"
✓ Created: Russula-A5B7C9D1-99.2-ADJ-0.1.fasta
  Identity cutoff: 99.2% (based on your sequences)
  Algorithm: MycoBLAST-adjusted

What just happened? OTUnify:

Found the most representative "anchor" sequence from your set
Calculated how similar all your sequences are to it
Set the boundary at the furthest sequence you included (99.2%)
Created a computer-readable definition file others can use

Note: OTU filenames (like Russula-A5B7C9D1-99.2-ADJ-0.1.fasta) are designed for computers to process efficiently. For human-readable names, you'll create an OSC that maps your temporary code to this OTU.

Matching New Sequences

Now when you get new sequences, you can check if they match your definition:

# 2. Check if a new sequence matches
$ otunify-match unknown_sequence.fasta Russula-A5B7C9D1-*.fasta

Query: unknown_seq_001
  ✓ INCLUDED in Russula-A5B7C9D1 (99.6% identity)
  
Query: unknown_seq_002  
  ✗ EXCLUDED from Russula-A5B7C9D1 (98.8% identity)
  → Below 99.2% threshold - may need new temporary code
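
The inclusion rule behind this output can be sketched in a few lines of Python. This is a minimal illustration, not OTUnify's own code: `match_status` is a hypothetical helper, and the identities are copied from the example above rather than computed from sequences. The boundary is treated as inclusive, in line with the "≥" thresholds used elsewhere in this README.

```python
def match_status(identity: float, cutoff: float) -> str:
    """Return "INCLUDED" when identity reaches the OTU cutoff, else "EXCLUDED"."""
    # Hypothetical helper: the boundary is inclusive (identity >= cutoff).
    return "INCLUDED" if identity >= cutoff else "EXCLUDED"


cutoff = 99.2  # taken from Russula-A5B7C9D1-99.2-ADJ-0.1.fasta
for query, identity in [("unknown_seq_001", 99.6), ("unknown_seq_002", 98.8)]:
    print(f"{query}: {match_status(identity, cutoff)} ({identity}% identity)")
```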

Creating a Species Concept

Document your temporary code with its full context:

# 3. Create an Operational Species Concept linking human-readable name to OTU
$ otunify-describe russula_brevipes_ca01.yaml \
    --provisional-name "Russula sp. 'brevipes-CA01'" \
    --otu "Russula-A5B7C9D1" \
    --description "Large white Russula from California oak woodlands" \
    --notes "Consistently found under Quercus agrifolia, spore print cream"

This OSC file captures your polyphasic understanding—morphology, ecology, geography—and links the human-readable temporary code to the computer-readable OTU.

Common Scenarios

Processing a New Sequencing Batch

# Check all new sequences against your species concepts
$ otunify-classify new_batch.fasta species_concepts/

# For unmatched sequences, create new OTUs
$ otunify-create unmatched_cluster1.fasta --vanity-prefix "Cortinarius"
$ otunify-create unmatched_cluster2.fasta --vanity-prefix "Inocybe"

# Then document as new temporary codes
$ otunify-describe cortinarius_or42.yaml --provisional-name "Cortinarius sp. 'OR42'"

Dealing with Noisy Sequences

# Remove obvious outliers (>95th percentile distance)
$ otunify-create messy_cluster.fasta --exclude-outliers 95

# Or just report what would be excluded
$ otunify-create messy_cluster.fasta --report-outliers 95
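
To make the percentile idea concrete, here is a rough Python sketch of percentile-based outlier filtering. It assumes a nearest-rank percentile and made-up distances; the function names are hypothetical, and the percentile method OTUnify actually uses may differ.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n) in sorted order."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def split_outliers(distances, pct):
    """Return (kept, excluded) IDs; excluded lie beyond the pct-th percentile distance."""
    limit = nearest_rank_percentile(list(distances.values()), pct)
    kept = [name for name, d in distances.items() if d <= limit]
    excluded = [name for name, d in distances.items() if d > limit]
    return kept, excluded


# Made-up distances (100 - % identity) from each sequence to the anchor.
distances = {f"seq{i:02d}": 0.1 * (i % 5) for i in range(1, 20)}
distances["seq20"] = 6.5  # one clearly divergent read
kept, excluded = split_outliers(distances, 95)
print("excluded:", excluded)
```

In this sketch a report-only run would print `excluded` and stop, while an exclude run would go on to build the OTU from `kept`.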

Setting Conservative Boundaries

# For well-studied species with known variation
$ otunify-create amanita_muscaria.fasta --identity-cutoff-override 99.5

# For poorly known groups, let the data decide
$ otunify-create unknown_cortinarius.fasta --vanity-prefix "Cortinarius"

What Makes a Good Anchor?

The "anchor" sequence is the reference point for your OTU. OTUnify automatically selects the best anchor by finding the medoid—the sequence that minimizes total distance to all others in your set. This ensures the most representative sequence becomes your reference point.

Best practice: Let OTUnify choose the anchor automatically. It will pick the sequence that best represents your entire cluster.
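
Medoid selection can be sketched as follows. This is an illustration under simplifying assumptions, not the tool's implementation: `percent_identity` is a toy metric for equal-length, pre-aligned sequences (OTUnify uses the STD or ADJ algorithms instead), and ties are broken by input order.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Toy identity for equal-length, pre-aligned sequences (an assumption)."""
    if len(a) != len(b):
        raise ValueError("this toy metric needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def pick_anchor(seqs):
    """Return (anchor_id, cutoff): the medoid and its lowest identity to any member."""
    total = {name: 0.0 for name in seqs}
    # Sum the distance from each sequence to every other sequence.
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        d = 100.0 - percent_identity(s1, s2)
        total[n1] += d
        total[n2] += d
    anchor = min(total, key=total.get)  # medoid: smallest summed distance
    # The boundary sits at the furthest sequence that was included.
    cutoff = min(percent_identity(seqs[anchor], s) for s in seqs.values())
    return anchor, cutoff


cluster = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAT",
    "seq3": "ACGTACGTAC",
    "seq4": "TCGTACGTAC",
}
anchor, cutoff = pick_anchor(cluster)
print(anchor, cutoff)
```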

Validating Your OTU Library

# Check for overlapping OTUs that might be the same species
$ otunify-validate --check-overlaps simple my_otus/

# Use BLAST mode for large reference sets (>100 OTUs)
$ otunify-validate --check-overlaps blast --reference-set all_otus/ new_otus/

Use Cases

1. Temporary Code Assignment

Replace ad hoc "close enough" calls made while reviewing MycoMap BLAST results with an explicit, reproducible method for deciding whether a sequence joins an existing temporary code or needs a new one.

2. Community Sequencing Labs

Enable local DNA labs to generate consistent, comparable OTU definitions that integrate with the broader MycoMap ecosystem.

3. iNaturalist Integration

Support the >70,000 DNA-barcoded fungal observations on iNaturalist with systematic species delimitation that bridges citizen science and formal taxonomy.

4. Distributed Species Documentation

Facilitate the North American goal of documenting macrofungal biodiversity through coordinated, reproducible methods across multiple community science projects.

Implementation Status

This is an alpha release demonstrating the core concepts. Here's what's working:

✅ Fully Functional (mOTU Layer)

OTU Creation (otunify-create): Define reproducible sequence boundaries
Validation (otunify-validate): Check OTU definitions and detect overlaps
Matching (otunify-match): Compare sequences against OTU definitions
BLAST Mode: 5-50× performance improvement for large datasets

🚧 Working, Under Active Development (OSC Layer)

OSC Creation (otunify-describe): Document species concepts linking to mOTUs
Classification (otunify-classify): Assign sequences to temporary codes via OSCs
OSC Validation: Automatic checking of OSC-OTU relationships
Full polyphasic integration features still in development

📅 Planned

Convenience utilities (otunify-revise, otunify-merge) for managing OTU evolution
Web services for DOI minting
Integration with UNITE Species Hypotheses
Automated GitHub Actions validation
Phylogenetic borrowing for under-sampled groups

Command Reference

otunify-create

Creates OTU definitions from pre-clustered FASTA sequences.

otunify-create [OPTIONS] INPUT_FASTA [OUTPUT_FASTA]

Options:

--algorithm [STD|ADJ]: Identity calculation algorithm (default: ADJ)
--vanity-prefix TEXT: Human-readable OTU prefix (default: OTU)
--exclude-outliers N: Remove sequences beyond Nth percentile distance
--min-identity FLOAT: Remove sequences below identity threshold
--identity-cutoff-override FLOAT: Manual cutoff override (expert use)
--report-outliers N: Report sequences beyond Nth percentile distance
--min-length INTEGER: Minimum sequence length for anchor selection
--max-length INTEGER: Maximum sequence length for anchor selection
--no-ambiguity: Exclude sequences with ambiguity codes
--description TEXT: Optional description
--output-dir PATH: Output directory for generated files
--disable-reorient: Disable sequence reorientation
--verbose, -v: Enable verbose output

Examples:

# Clean Amanita cluster with outliers
otunify-create amanita_cluster.fasta --exclude-outliers 95 --min-identity 95.0

# Quality control with outlier reporting
otunify-create sequences.fasta --report-outliers 95 --vanity-prefix "Cortinarius_sp"

# Conservative clustering for publication
otunify-create sequences.fasta --algorithm ADJ --min-identity 97.0 --no-ambiguity

# Batch processing to output directory
otunify-create sequences.fasta --output-dir otus/ --vanity-prefix "Amanita_muscaria"

otunify-validate

Validates OTU definition files according to OTUnify specification.

otunify-validate [OPTIONS] PATHS...

Options:

--verbose, -v: Show all validation messages (errors, warnings, info)
--quiet, -q: Show only errors
--show-warnings, -w: Show warnings and errors (but not info messages)
--output, -o PATH: Write JSON report to file
--format [text|json]: Output format (default: text)
--fail-on-warnings: Exit with error code if warnings are found
--check-overlaps [none|simple|blast]: Enable overlap detection (default: none)
--overlap-factor FLOAT: Adjust overlap sensitivity (default: 1.0, >1.0 = more sensitive)
--reference-set PATH: Reference set for overlap detection (default: use validation set as reference)

Examples:

# Validate single file
otunify-validate my_otu.fasta

# Validate all FASTA files in directory (ignores auxiliary files)
otunify-validate otus/

# Show warnings and errors
otunify-validate --show-warnings otus/

# Verbose validation with JSON report
otunify-validate --verbose --output report.json --format json otus/

# Validation for CI/CD (fail on warnings)
otunify-validate --fail-on-warnings otus/

# Check for overlapping OTUs
otunify-validate --check-overlaps simple otus/

# More sensitive overlap detection
otunify-validate --check-overlaps simple --overlap-factor 1.5 otus/

# Check validation set against separate reference set
otunify-validate --check-overlaps simple --reference-set reference_otus/ new_otus/

# High-performance BLAST mode for large datasets
otunify-validate --check-overlaps blast --reference-set large_reference_db/ new_otus/

# BLAST mode with custom sensitivity
otunify-validate --check-overlaps blast --overlap-factor 1.2 --reference-set reference_otus/ validation_set/

# Validate multiple files and directories
otunify-validate file1.fasta file2.fasta directory/

Validation Levels:

File Format: FASTA parsing and basic structure
OTU ID Format: Prefix-hash structure and character validation
Metadata Validation: Required fields and format compliance
Sequence Validation: IUPAC nucleotides and length checks
Filename Convention: Consistency between filename and content
Hash Consistency: Verify calculated hash matches declared hash
Overlap Detection: Identify potentially overlapping OTU boundaries (optional)

Overlap Detection:

The overlap detection feature identifies OTUs that may have overlapping inclusion criteria using a 1D linear model. Each OTU defines a "footprint" extending from its anchor sequence, and overlaps are detected when these footprints intersect.
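
One plausible reading of this 1D model, sketched in Python: each OTU reaches `100 - cutoff` percent from its anchor, and two footprints intersect when the anchor distance is smaller than the sum of the two reaches, scaled by the overlap factor. The function name and the exact formula are assumptions for illustration; the sketch does not attempt to reproduce the coverage percentages that OTUnify reports.

```python
def footprints_overlap(cutoff_a: float, cutoff_b: float,
                       anchor_distance: float, overlap_factor: float = 1.0) -> bool:
    """True when two OTU footprints intersect under a simple 1D model (assumed)."""
    radius_a = 100.0 - cutoff_a  # how far OTU A reaches from its anchor, in % distance
    radius_b = 100.0 - cutoff_b
    return anchor_distance < (radius_a + radius_b) * overlap_factor


# Anchors 0.5% apart with cutoffs of 99.5% and 99.0%: footprints intersect.
print(footprints_overlap(99.5, 99.0, 0.5))
# Anchors 1.2% apart with two 99.5% cutoffs: no overlap at factor 1.0 ...
print(footprints_overlap(99.5, 99.5, 1.2))
# ... but flagged once the factor is raised to 1.5 (more sensitive).
print(footprints_overlap(99.5, 99.5, 1.2, overlap_factor=1.5))
```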

Detection Modes:

Simple mode: Brute-force pairwise comparison of all OTU anchor sequences
BLAST mode: High-performance candidate selection using NCBI BLAST+ with optimized thresholds

Core Features:

Reference set support: Compare validation OTUs against separate reference set instead of self-comparison
1D linear analysis: Maps OTUs to line segments for intuitive geometric overlap calculation
Coverage analysis: Shows what percentage of each OTU's footprint overlaps with the other
Special case detection: Identifies identical anchors and complete containment scenarios
Mixed algorithms: Warns when OTUs use different identity algorithms (STD vs ADJ)
Overlap factor: Adjusts sensitivity (1.0 = mathematical threshold, >1.0 = more sensitive)
Algorithm compatibility: Compares OTUs within same major algorithm version

BLAST Mode Optimizations:

Batch processing: Single BLAST query for all validation OTUs (dramatically reduces overhead)
Multi-threading: Automatic detection and use of available CPU cores
Dynamic thresholds: Intelligent BLAST identity thresholds based on OTU cutoffs
Algorithm-aware safety factors: 4x buffer for ADJ algorithm, 2x for STD algorithm
Hash collision detection: Validates sequence identity during deduplication
Database caching: Reuses BLAST databases across multiple validation runs
Performance scaling: 5-50× reduction in candidate comparisons, with the largest gains on large reference sets

Reference Set Usage:

By default, overlap detection compares OTUs within the validation set (O(n²) comparisons). When --reference-set is specified, validation OTUs are compared against the reference set instead (O(m×n) comparisons), where:

Validation set (m OTUs): OTUs being validated (from PATHS arguments)
Reference set (n OTUs): Trusted OTUs to check against (from --reference-set)
Use cases:
- Check new OTUs against established database
- Validate subset of large collection against full dataset
- Quality control with known-good reference OTUs

Performance Comparison:

Dataset Size	Simple Mode	BLAST Mode	Improvement
30 validation × 30 reference	900 comparisons	~180 comparisons	5× reduction
609 validation × 1,103 reference	671,727 comparisons	~13,107 comparisons	51× reduction
Execution time	O(m×n)	O(m×avg_candidates)	5-50× faster

BLAST mode provides significant performance benefits for large reference sets while maintaining identical overlap detection accuracy.
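
The reduction factors in the table follow from simple arithmetic, checked here in Python. The candidate counts (about 180 and about 13,107) are the approximate figures quoted above, not values computed by this snippet, and `reduction` is a hypothetical helper.

```python
def reduction(validation: int, reference: int, blast_candidates: int):
    """Return (simple-mode comparisons, reduction factor) for one scenario."""
    simple = validation * reference  # brute force compares every pair
    return simple, simple / blast_candidates


small = reduction(30, 30, 180)
large = reduction(609, 1103, 13107)
print(small)                      # comparisons and factor for the small case
print(large[0], round(large[1]))  # comparisons and rounded factor for the large case
```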

Example overlap detection output:

[WARNING] OTU:OTU_A-HASH123456 vs OTU_B-HASH789012: Partial overlap: anchor distance 0.5% 
  (thresholds: A≥99.5%, B≥99.0%) Coverage: A≈45%, B≈30%

[WARNING] OTU:OTU_C-HASH345678 vs OTU_D-HASH901234: Complete containment (D in C): anchor distance 0.2%
  (thresholds: C≥99.0%, D≥99.8%) Coverage: C≈15%, D≈100%

otunify-match

Matches query sequences against OTU definitions to determine inclusion/exclusion.

otunify-match [OPTIONS] QUERY_FASTA REFERENCE_PATHS...

Key Options:

--mode [simple|blast]: Performance mode (use blast for large reference sets)
--nearby-distance FLOAT: Find matches within additional distance
--output-format [table|tsv|json]: Output format

otunify-classify

Classifies sequences to Operational Species Concepts (temporary codes).

otunify-classify [OPTIONS] QUERIES PATHS...

Key Options:

--max-classifications INTEGER: Show alternative classifications
--show-otu-details: Include underlying OTU match details
--mode [simple|blast]: Performance mode

otunify-describe

Creates OSC definitions linking temporary codes to OTUs.

otunify-describe [OPTIONS] OUTPUT_FILE

Key Options:

--provisional-name TEXT: The temporary code (e.g., "Russula sp. 'brevipes-CA01'")
--otu TEXT: OTU reference(s) to include
--description TEXT: Human-readable description
--notes TEXT: Additional expert observations
--interactive: Interactive mode for guided creation

Output Files

OTU Definition File

Named automatically as: {prefix}-{hash}-{cutoff}-{algorithm}-{version}.fasta

Example: Galerina_marginata-KMQ537FKVP-99.1-ADJ-0.1.fasta
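
Because the filename carries the OTU's parameters, other tools can recover them without opening the file. The Python sketch below is a guess at a workable parser: the regular expression is inferred from the examples in this README, not taken from FORMAT_SPECIFICATION.md, and `OtuFilename` and `parse_otu_filename` are hypothetical names.

```python
import re
from typing import NamedTuple


class OtuFilename(NamedTuple):
    prefix: str
    otu_hash: str
    cutoff: float
    algorithm: str
    version: str


# Pattern inferred from examples such as Russula-A5B7C9D1-99.2-ADJ-0.1.fasta.
_PATTERN = re.compile(
    r"^(?P<prefix>.+)-(?P<otu_hash>[A-Za-z0-9]+)-(?P<cutoff>\d+(?:\.\d+)?)"
    r"-(?P<algorithm>STD|ADJ)-(?P<version>[0-9.]+)\.fasta$"
)


def parse_otu_filename(name: str) -> OtuFilename:
    """Split {prefix}-{hash}-{cutoff}-{algorithm}-{version}.fasta into its parts."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"not an OTU definition filename: {name!r}")
    return OtuFilename(m["prefix"], m["otu_hash"], float(m["cutoff"]),
                       m["algorithm"], m["version"])


print(parse_otu_filename("Galerina_marginata-KMQ537FKVP-99.1-ADJ-0.1.fasta"))
```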

Auxiliary Files

Excluded Sequences: When sequences are filtered by outlier removal, excluded sequences are written to:

excluded.{otu-filename}.fasta

Reoriented Sequences: When sequences are reverse-complemented during processing, the original orientations are saved to:

reoriented.{otu-filename}.fasta

These auxiliary files are automatically ignored during directory validation with otunify-validate but can still be validated directly if specified.

Algorithms

Standard Identity (STD)

Uses traditional percent identity calculation via the adjusted-identity library with no corrections.

MycoBLAST Adjusted Identity (ADJ)

Applies MycoBLAST-style adjustments for:

Homopolymer length normalization
IUPAC ambiguity code handling
Repeat motif adjustment
End trimming for sequencing artifacts
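
Two of these adjustments can be illustrated with toy Python helpers. This sketch only conveys the idea; it is not the adjusted-identity library's API or algorithm, and both function names are hypothetical. The IUPAC table itself is the standard nucleotide code set.

```python
from itertools import groupby

# Standard IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def collapse_homopolymers(seq: str) -> str:
    """Reduce every run of one base to a single base, e.g. AAAT -> AT."""
    return "".join(base for base, _run in groupby(seq.upper()))


def bases_compatible(a: str, b: str) -> bool:
    """True when two IUPAC codes share at least one possible base."""
    return bool(set(IUPAC[a.upper()]) & set(IUPAC[b.upper()]))


# Reads that differ only in homopolymer length look identical once collapsed.
print(collapse_homopolymers("ACGTTTTA") == collapse_homopolymers("ACGTTA"))
# R (A or G) is compatible with A but not with C.
print(bases_compatible("R", "A"), bases_compatible("R", "C"))
```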

File Format

OTU definitions use FASTA format with structured headers:

>OTU-KMQ537FKVP Example description identity_cutoff=99.10 identity_algorithm=ADJ-0.1 format=OTUnify-0.1
ATGCGTACGATC...

See FORMAT_SPECIFICATION.md for complete technical details.
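
A header like the one above can be read with a few lines of Python. This is a sketch that assumes every `key=value` token is metadata and every other word belongs to the description; the authoritative rules live in FORMAT_SPECIFICATION.md, and `parse_otu_header` is a hypothetical helper.

```python
def parse_otu_header(header: str) -> dict:
    """Split an OTU FASTA header into its ID, description, and key=value fields."""
    text = header.lstrip(">").strip()
    otu_id, _, rest = text.partition(" ")
    fields = {"id": otu_id}
    description_words = []
    for token in rest.split():
        key, sep, value = token.partition("=")
        if sep:   # token looks like key=value
            fields[key] = value
        else:     # anything else is treated as free-text description
            description_words.append(token)
    fields["description"] = " ".join(description_words)
    # Assumed: identity_cutoff is numeric when present.
    if "identity_cutoff" in fields:
        fields["identity_cutoff"] = float(fields["identity_cutoff"])
    return fields


header = (">OTU-KMQ537FKVP Example description identity_cutoff=99.10 "
          "identity_algorithm=ADJ-0.1 format=OTUnify-0.1")
print(parse_otu_header(header))
```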

Scientific Background

The Temporary Code Problem

North American citizen science fungal barcoding has evolved a sophisticated system using "temporary code names" (Russell, 2025) to handle the gap between DNA sequence data and formal taxonomy:

Current process: MycoMap BLAST search → Human review → Expert decision about "close enough"
Scale challenge: >10,000 putative species with temporary codes, >70,000 barcoded observations
Consistency issues: Different reviewers make different decisions about sequence inclusion
Documentation gap: No record of why specific identity thresholds were chosen

The North American Ecosystem

Current Infrastructure:

MycoMap.com: Central database with ~100,000 validated sequences and temporary codes, includes BLAST search functionality
iNaturalist: >70,000 DNA-barcoded fungal observations (1% of all US fungal observations)
Community labs: Local DNA sequencing facilities supporting citizen science
Temporary codes: Can be created instantly by labeling new sequences

Temporary Code System:

Polyphasic delimitations: DNA + morphology + ecology + geography + phenology
Dynamic clustering: Based on barcode gaps, not arbitrary thresholds
Examples: Amanita "sp-IN01", Hygrocybe sp. 'conica-MI03'
Goal: Bridge data collection and formal taxonomy

The OTUnify Solution

OTUnify provides a two-layer system that transforms subjective temporary code decisions into explicit, reproducible methodology:

Layer 1: Molecular OTUs (mOTUs)

Formal circumscription: Precise sequence-based boundaries using anchor sequences and identity thresholds
Content-addressable: Hash-based identifiers ensure reproducible results
Conservative defaults: Start with tight boundaries, require explicit decisions to expand
Current implementation: Available now via otunify-create
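
The content-addressable idea can be sketched in Python: derive the identifier from the sequence itself, so the same anchor always yields the same ID. Everything specific here is an assumption made for illustration (SHA-256, Base32 encoding, a 10-character truncation, hashing only the anchor); what OTUnify actually hashes and how it encodes the result is defined in FORMAT_SPECIFICATION.md.

```python
import base64
import hashlib


def content_id(sequence: str, length: int = 10) -> str:
    """Derive a short, deterministic ID from a sequence (illustrative scheme only)."""
    # Normalize so whitespace and letter case do not change the ID.
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha256(normalized.encode("ascii")).digest()
    return base64.b32encode(digest).decode("ascii")[:length]


# The same anchor, formatted differently, maps to the same identifier.
print(content_id("acgtacgt") == content_id("ACGT\nACGT"))
```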

Layer 2: Operational Species Concepts (OSCs) (In Development)

Polyphasic integration: Map temporary codes/binomials to one or more mOTUs
Many-to-many relationships: Handle multiple barcode regions, haplotype variants, cryptic species
Structured metadata: Geographic distribution, synonyms, ecological data
Relationship types:
- discriminated-by: {mOTU} - Enables automatic temporary code assignment
- indicated-by: {mOTU} - Suggests species concept as strong candidate match
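
The two relationship types suggest a simple classification rule, sketched here in Python. The data structure and the `classify` function are hypothetical and do not reflect the OSC YAML schema; the second temporary code and its link to the mOTU are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SpeciesConcept:
    provisional_name: str
    discriminated_by: set = field(default_factory=set)  # mOTU IDs that assign the code
    indicated_by: set = field(default_factory=set)      # mOTU IDs that only suggest it


def classify(matched_otus, concepts):
    """Return (assigned, candidates) temporary codes for a set of matched mOTU IDs."""
    assigned = [c.provisional_name for c in concepts
                if c.discriminated_by & matched_otus]
    candidates = [c.provisional_name for c in concepts
                  if c.provisional_name not in assigned
                  and c.indicated_by & matched_otus]
    return assigned, candidates


concepts = [
    SpeciesConcept("Cortinarius sp. 'MI03'", discriminated_by={"Cortinarius-HASH123"}),
    SpeciesConcept("Cortinarius sp. 'OR42'", indicated_by={"Cortinarius-HASH123"}),
]
print(classify({"Cortinarius-HASH123"}, concepts))
```

Keeping the concepts as plain records over mOTU IDs mirrors the separation described here: the molecular definitions stay fixed while the mapping on top of them can be revised.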

This separation allows the molecular layer (formal circumscription) to remain stable while the species concept layer (empirical choices about useful taxonomic units) can evolve with new evidence and community consensus.

Relationship to Global Systems

While global databases like UNITE serve the academic community with Species Hypotheses for formal identification, OTUnify serves the North American citizen science community's immediate need for reproducible temporary code methodology. These systems address different scales, user communities, and operational requirements, with potential for future integration as the field matures.

Development

Setup

git clone https://github.com/joshuaowalker/otunify.git
cd otunify
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"

Installing Optional Dependencies

For high-performance BLAST mode:

# macOS
brew install blast

# Ubuntu/Debian
sudo apt-get install ncbi-blast+

# Windows: Download from NCBI BLAST+ website

Running Tests

# Test basic functionality
otunify-create test_data/sample_sequences.fasta output.fasta --vanity-prefix "TestOTU"

# Test overlap detection  
otunify-validate --check-overlaps simple test_data/

# Test BLAST mode
otunify-validate --check-overlaps blast --reference-set test_data/ validation/

Contributing

We welcome contributions! This is an alpha release focused on validating the conceptual model. We especially welcome:

Testing with your sequence data
Feedback on the temporary code workflow
Use case documentation
Bug reports and feature requests

Please submit issues and pull requests via GitHub.

Citation

If you use OTUnify in your research, please cite:

Walker, J.O. (2024). OTUnify: A version-controlled framework for fungal DNA barcode OTU management. 
GitHub: https://github.com/joshuaowalker/otunify

Please also cite the North American temporary code system and related work:

Russell, S.D., Birkebak, J., Burzynski, T., Canan, K., D'Elia, G., Geurin, Z., Hunt, B., Jacob, S., 
Mueller, G.M., Ospina, S., Ostuni, S., Peace, R., Quark, M., Reitan, A., Rockefeller, A., Singer, H., 
Walker, J., Williams, J. (2025). Approaching Full-Scale DNA Barcoding for North American Macrofungi: 
Highlights from the MycoMap Network. Inoculum 76(3):17-22. Newsletter of the Mycological Society of America.

Russell, S. (2025). Using Temporary Code Names for Documenting Macrofungi. 
Retrieved from https://mycotalab.substack.com/p/using-temporary-code-names-for-documenting

Russell, S. (2025). FAQ: Temporary Code Names for Macrofungi - How Temporary Code Names Help Us Map Fungal Biodiversity. 
Retrieved from https://mycotalab.substack.com/p/faq-temporary-code-names-for-macrofungi

When appropriate, also cite global fungal databases:

Abarenkov, K., Nilsson, R.H., Larsson, K.H., Taylor, A.F.S., May, T.W., Frøslev, T.G., Pawlowska, J., 
Lindahl, B., Põldmaa, K., Truong, C., Vu, D., Hosoya, T., Niskanen, T., Piirmann, T., Ivanov, F., 
Zirk, A., Peterson, M., Cheeke, T.E., Ishigami, Y., Jansson, A.T., Jeppesen, T.S., Kristiansson, E., 
Mikryukov, V., Miller, J.T., Oono, R., Ossandon, F.J., Paupério, J., Saar, I., Schigel, D., Suija, A., 
Tedersoo, L., Kõljalg, U. (2024). The UNITE database for molecular identification and taxonomic 
communication of fungi and other eukaryotes: sequences, taxa and classifications reconsidered. 
Nucleic Acids Research, 52(D1), D791-D797.

Support

GitHub Issues: https://github.com/joshuaowalker/otunify/issues
Discussions: https://github.com/joshuaowalker/otunify/discussions

License

BSD 2-Clause License. See LICENSE file for details.
