Releases: templetwo/HTCA-Project

HTCA Tools v1.0.0 - Christmas 2025 Release

26 Dec 03:04

🎄 Discovery by Velocity, Archiving by Design 🎄

We're excited to announce the first production release of HTCA Tools: a pair of GitHub discovery and archiving utilities built on decentralized storage for censorship resistance and empirical, activity-based innovation discovery.

🎁 What's Included

📡 Repo Radar v1.0.0

Discovery by Velocity, Not Vanity

A lightweight tool that discovers GitHub repositories by activity metrics (commits/day, contributor growth, fork momentum) rather than star counts.

Key Features:

  • ✅ Velocity-based scoring (commits, forks, contributors, PRs, issues)
  • ✅ IPFS CIDv1 archiving for decentralized metadata storage
  • ✅ RSS/Atom feed generation (unthrottleable discovery)
  • NEW: Audit-grade verification commands (--verify-db, --verify-feeds)
  • NEW: Identity verification with name collision warnings
  • NEW: Reproducible performance benchmarks

📦 GitHub Archive Relay (GAR) v1.0.0

One File. Stupid Simple. Unthrottleable.

Monitors GitHub orgs/users, archives commits to decentralized storage (IPFS + Arweave), and generates RSS feeds nobody can censor.

Key Features:

  • ✅ IPFS and Arweave commit archiving
  • ✅ Secret detection (13 patterns for API keys, tokens, credentials)
  • ✅ RSS/Atom feed generation (see the feedgen sketch below)
  • ✅ Integrates with Repo Radar for discovery → archiving pipeline
  • ✅ Single-file deployment
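
Both tools emit their feeds with the feedgen dependency they install. A minimal sketch of writing a feed with it; the IDs, titles, and URLs below are placeholders, not the exact fields GAR emits:

from feedgen.feed import FeedGenerator

# Placeholder feed metadata; GAR's actual field values may differ
fg = FeedGenerator()
fg.id("https://example.org/gar")
fg.title("GAR commit feed")
fg.link(href="https://example.org/gar_feed.xml", rel="self")
fg.description("Commits archived by GitHub Archive Relay")

# One entry per archived commit (commit URL used as the entry ID)
fe = fg.add_entry()
fe.id("https://github.com/example/repo/commit/abc123")
fe.title("example/repo: abc123")
fe.link(href="https://github.com/example/repo/commit/abc123")

fg.rss_file("gar_feed.xml")    # RSS 2.0
fg.atom_file("gar_feed.atom")  # Atom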

🌟 Highlights

Temple Core Deployment - Proven in Production

Both tools have been deployed to temple_core and tested in production for 24 hours.

Discovery Results:

  • 19 high-velocity repos discovered (all with 0 stars)
  • Highest velocity: 2737.5 (MAwaisNasim/lynx - 58 commits, 83 contributors in 7 days)
  • Repos discovered hours/days before search indexing
  • Proves velocity-based discovery surfaces genuine innovation before it goes viral

Performance:

  • 2.67 million velocity calculations/second
  • Median latency: 0.37 microseconds
  • P95 latency: 0.42 microseconds
  • All unit tests passing (6/6)

Audit-Grade Verification

New verification protocols ensure deployment integrity:

# Database integrity check
python repo-radar.py --verify-db

# Feed validation
python repo-radar.py --verify-feeds

# Performance benchmarks
python test_radar.py --unit

What Gets Verified:

  • Database schema and data completeness
  • Identity metadata (owner, timestamps, CIDs)
  • Name collision warnings for ambiguous repos
  • XML feed well-formedness
  • Performance baselines with reproducible methodology

📚 Documentation

New Documentation

  • DEPLOYMENT.md - Production deployment guide with systemd, cron, and health checks
  • VERIFICATION.md - Audit-grade verification protocols and reproducibility standards

Updated Documentation


🔍 What's New in v1.0.0

Repo Radar Enhancements

Identity Verification System

  • Full metadata capture: Owner, created timestamp, last push timestamp
  • Name collision warnings: Alerts for common repo names (lynx, atlas, phoenix, etc.)
  • GitHub verification links: Direct links to verify owner identity
  • IPFS CID validation: Ensures proper CIDv1 format

Example output:

1. MAwaisNasim/lynx
   Owner: MAwaisNasim (User/Org - verify at github.com/MAwaisNasim)
   Velocity: 2737.5
   Commits (7d): 58 | Contributors: 83 | Stars: 0
   Created: 2025-12-25T14:29:22Z
   Last Push: 2025-12-25T20:15:00Z
   IPFS CID: bafkreifxkizgozej6vj2bu2sql63wroc2gu4brjqoirn67mmtrmfrly6ym
   GitHub: https://github.com/MAwaisNasim/lynx
   ⚠️  NOTE: 'lynx' is a common name - verify specific owner identity
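
The bafkrei... prefix in the CID above marks a base32-encoded CIDv1 with the raw codec and a SHA-256 multihash. A minimal sketch of deriving such a CID from archived metadata bytes; whether repo-radar.py builds it exactly this way is an assumption:

import base64
import hashlib

def metadata_cidv1(data: bytes) -> str:
    # CIDv1 layout: version (0x01) + raw codec (0x55) + multihash,
    # where the multihash is sha2-256 code (0x12) + length (0x20) + digest
    digest = hashlib.sha256(data).digest()
    raw = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    # Multibase prefix 'b' means lowercase RFC 4648 base32, no padding
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")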

Verification Commands

  • --verify-db: Database integrity and identity verification
  • --verify-feeds: XML feed validation with proper parsing
  • Cross-shell compatible (no more zsh glob errors)
  • Clean error handling (no tracebacks in verification flows)
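
A well-formedness check can be as small as one parse call. A sketch using the standard library; the actual --verify-feeds implementation may do more:

import xml.etree.ElementTree as ET

def feed_is_well_formed(path: str = "radar_feed.xml") -> bool:
    # ElementTree rejects malformed XML, so a clean parse is the proof
    try:
        ET.parse(path)
        return True
    except ET.ParseError as err:
        print(f"❌ {path}: {err}")
        return False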

Database Schema Migration

  • Added pushed_at column for last push timestamp
  • Automatic migration for existing databases
  • Backwards-compatible verification
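
A sketch of what an automatic, backwards-compatible column migration looks like; the table name repos is an assumption, not Radar's actual schema:

import sqlite3

def migrate(db_path: str = "radar_state.db") -> None:
    # Idempotent: only adds pushed_at when an older database lacks it
    conn = sqlite3.connect(db_path)
    cols = {row[1] for row in conn.execute("PRAGMA table_info(repos)")}
    if "pushed_at" not in cols:
        conn.execute("ALTER TABLE repos ADD COLUMN pushed_at TEXT")
        conn.commit()
    conn.close()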

Reproducible Performance Benchmarks

  • Machine spec reporting (platform, processor, Python version, CPU count)
  • Warmup iterations separate from measurement
  • Statistical analysis (median, P95 latency, throughput)
  • Per-iteration timing with perf_counter()
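
A minimal sketch of that methodology, assuming the benchmark wraps a single scoring call; warmup iterations are excluded from the statistics:

import time
import statistics

def benchmark(fn, warmup: int = 100, iters: int = 10_000) -> dict:
    for _ in range(warmup):        # warmup, not measured
        fn()
    samples = []
    for _ in range(iters):         # per-iteration perf_counter() timing
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "median_us": statistics.median(samples) * 1e6,
        "p95_us": samples[int(0.95 * len(samples))] * 1e6,
        "throughput_per_s": len(samples) / sum(samples),
    }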

GAR Enhancements

Repo Radar Integration

  • Automatically monitors orgs from gar_orgs.txt
  • Discovery → Archiving pipeline fully operational
  • 19 orgs fed from Radar discoveries

Production Deployment

  • Deployed to temple_core alongside Repo Radar
  • Secret detection active (13 patterns)
  • IPFS CIDv1 generation functional

🚀 Quick Start

Installation

# Clone repository
git clone https://github.com/templetwo/HTCA-Project.git
cd HTCA-Project/tools

# Install dependencies
pip install requests feedgen

Run Repo Radar

cd radar

# Single discovery scan
python repo-radar.py --watch ai,ml,blockchain --once

# Verify results
python repo-radar.py --verify-db

Run GAR (Monitors Discovered Repos)

cd ../gar

# Monitor orgs discovered by Radar
python github-archive-relay.py --orgs "$(paste -sd, ../radar/gar_orgs.txt)" --once

Run Tests

cd ../radar
python test_radar.py --unit

Expected output:

✅ Velocity score calculation
✅ IPFS CIDv1 generation
✅ Database operations
✅ Spam detection heuristics
✅ GAR integration file handling
✅ Velocity calculation performance

Test Results: 6/6 passed

📊 Velocity Scoring Explained

Repo Radar ranks repositories by activity rather than popularity:

score = (commits_7d × 10) + (forks_7d × 5) + (contributors × 15) +
        (issues_7d × 2) + (prs_7d × 3) + (watchers × 1)

Time-based multipliers:

  • Repos < 30 days old: 1.5× boost (freshness)
  • Repos > 180 days with recent commits: 1.2× boost (sustained activity)

Why these weights?

  • Commits (10×) - Direct measure of development velocity
  • Contributors (15×) - Growing teams signal serious projects
  • Forks (5×) - Indicates utility and distribution
  • PRs (3×) - Active collaboration
  • Issues (2×) - Community engagement
  • Watchers (1×) - Interest without commitment
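
Put together, the weights and multipliers fit in a few lines. A sketch assuming metrics arrive as a dict; the key names are illustrative, not repo-radar.py's internals:

def velocity_score(m: dict) -> float:
    # Weighted activity sum from the formula above
    score = (m["commits_7d"] * 10 + m["forks_7d"] * 5 +
             m["contributors"] * 15 + m["issues_7d"] * 2 +
             m["prs_7d"] * 3 + m["watchers"] * 1)
    # Time-based multipliers
    if m["age_days"] < 30:
        score *= 1.5   # freshness boost
    elif m["age_days"] > 180 and m["commits_7d"] > 0:
        score *= 1.2   # sustained-activity boost
    return score

If the remaining metrics were zero, the lynx discovery above works out to (58 × 10 + 83 × 15) × 1.5 = 2737.5, matching the reported score.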

🛡️ Security & Threat Models

Secret Detection (GAR)

Automatically scans for:

  • AWS Access Keys, GitHub Tokens, API Keys
  • Private Keys (RSA, DSA, EC, SSH)
  • Database Credentials
  • Cloud Provider Tokens (OpenAI, Google Cloud, Stripe)

Commits with detected secrets are skipped from archiving.
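
For illustration, a few widely documented token formats and a scan helper; this is not GAR's actual 13-pattern list:

import re

# Publicly documented secret formats (illustrative subset only)
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),       # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub classic token
    re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
]

def contains_secret(text: str) -> bool:
    return any(p.search(text) for p in SECRET_PATTERNS)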

Spam Detection (Repo Radar)

Identifies suspicious patterns:

  • High commits with single contributor (ratio > 50:1)
  • High forks with low commits (ratio > 2:1)
  • Burst activity without sustained contribution
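
As a sketch, those ratios translate directly into guards; the thresholds come from the list above, while the field names are illustrative:

def looks_spammy(commits_7d: int, contributors: int, forks_7d: int) -> bool:
    # > 50 commits per contributor: one account inflating activity
    if contributors > 0 and commits_7d / contributors > 50:
        return True
    # > 2 forks per commit: fork momentum without matching development
    if commits_7d > 0 and forks_7d / commits_7d > 2:
        return True
    return False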

Privacy Considerations

  • Only public repositories monitored
  • Only metadata archived (not file contents)
  • Read-only GitHub tokens recommended
  • Secret detection prevents credential leakage

📈 Performance & Benchmarks

Repo Radar Performance

Test Environment:

  • Platform: Darwin 25.1.0 (macOS)
  • Processor: ARM (14-core)
  • Python: 3.9.6

Results:

  • Throughput: 2,665,666 calculations/second
  • Median Latency: 0.37 μs
  • P95 Latency: 0.42 μs
  • Dataset: 10,000 iterations (100-iteration warmup)

Resource Usage

  • Memory: <50MB typical
  • Storage: ~1-5MB per 100 repos (SQLite)
  • API Calls: ~100-500 per poll (depending on topics)

🔧 Configuration

Required

  • Python 3.9+
  • requests library
  • feedgen library

Optional (Recommended)

# GitHub token for higher rate limits (60 req/hr → 5000 req/hr)
export GITHUB_TOKEN="ghp_your_token_here"

# IPFS pinning via Pinata
export PINATA_API_KEY="your_pinata_key"
export PINATA_SECRET_KEY="your_pinata_secret"

# Arweave archiving via Irys
export BUNDLR_API_KEY="your_bundlr_key"
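
For reference, a sketch of passing the token to the GitHub repository-search endpoint; the exact query Radar issues is an assumption:

import os
import requests

def search_topic(topic: str) -> list:
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get the higher rate limit
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": f"topic:{topic}", "sort": "updated", "order": "desc"},
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]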

🌐 Use Cases

Research & Discovery

  • Find innovative repos before they go viral
  • Track emerging trends in AI/ML, blockchain, web3
  • Discover fresh projects by activity, not popularity

Archiving & Preservation

  • Create decentralized record of commits (IPFS + Arweave)
  • Generate censorship-resistant RSS feeds
  • Build parallel discovery infrastructure

HTCA Alignment Research

  • Monitor repos related to AI safety, alignment, interpretability
  • Track development velocity in critical domains
  • Archive research code for reproducibility

📦 Files & Directories

HTCA-Project/
├── tools/
│   ├── radar/
│   │   ├── repo-radar.py         # Main Radar script
│   │   ├── test_radar.py         # Unit tests
│   │   ├── README.md             # Radar documentation
│   │   ├── radar_state.db        # SQLite database (created on first run)
│   │   ├── radar_feed.xml        # RSS feed (generated)
│   │   └── gar_orgs.txt          # Orgs fed to GAR (generated)
│   │
│   ├── gar/
│   │   ├── github-archive-relay.py  # Main GAR script
│   │   ├── README.md             # GAR documentation
│   │   ├── gar_state.db          # SQLite database (created on first run)
│   │   └── gar_feed.xml          # RSS feed (generated)
│   │
│   ├── DEPLOYMENT.md             # Production deployment guide
│   └── VERIFICATION.md           # Audit-grade verification protocols
│
└── RELEASE_NOTES_v1.0.0.md       # This file

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Areas for improvement:

  • Spam detection refinement
  • Additional secret patterns
  • Language trend analysis
  • Cross-topic correlation
  • Web dashboard for visualization

📜 License

MIT License

Copyright (c) 2025 Anthony Vasquez / Temple of Two

See LICENSE for the full text.

HTCA Empirical Validation Complete

25 Dec 04:16

🎯 Empirical Validation: Phase 1 & 2 Complete

Cross-provider validation across three frontier AI models:

  • Google Gemini 3 Pro Preview
  • OpenAI GPT-4o
  • Anthropic Claude Sonnet 4.5

🔬 Key Results

Phase 1: Token Efficiency

Provider          | HTCA Reduction | Adversarial Reduction
Gemini 3 Pro      | -12.44%        | -78.43%
OpenAI GPT-4o     | -23.07%        | -83.17%
Claude Sonnet 4.5 | -11.34%        | -40.63%

Phase 2: Quality Validation

Provider | Overall Quality (Cohen's d) | Interpretation
Gemini   | d = 0.857                   | Large effect, HTCA superior
OpenAI   | d = 1.212                   | Very large effect, HTCA superior
Claude   | d = 0.471                   | Medium effect, HTCA superior

📊 Quality Dimensions (Averaged)

Dimension                | HTCA Advantage (Cohen's d) | Interpretation
Presence Quality         | d = 1.972                  | HTCA feels more helpful/engaged
Technical Depth          | d = 1.446                  | HTCA maintains domain expertise
Information Completeness | d = 1.327                  | HTCA answers more fully
Relational Coherence     | d = 1.237                  | HTCA flows more naturally
Conceptual Accuracy      | d = 0.106                  | No degradation
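
The tables report Cohen's d. The standard pooled-SD formula is sketched below; whether htca_phase2_quality.py computes it exactly this way is an assumption:

import statistics

def cohens_d(a: list, b: list) -> float:
    # d = (mean(a) - mean(b)) / pooled standard deviation
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled = (((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled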

💡 Critical Finding

Adversarial framing ("be concise") achieves 41-83% token reduction but degrades quality:

  • Incomplete answers
  • Shallow technical depth
  • Robotic, transactional tone
  • Poor conversational flow

HTCA achieves 11-23% reduction while maintaining or improving quality.

Conclusion: Presence is more efficient, not just more compressed.


📦 What's Included

Code & Tools

  • empirical/htca_harness.py — Phase 1 token efficiency measurement
  • empirical/htca_phase2_quality.py — Phase 2 quality measurement with LLM judge
  • empirical/htca_capture_responses.py — Response text capture utility
  • empirical/tests/ — Test suite for quality metrics

Data (All JSON Format)

  • ✅ Phase 1 token counts for all 3 providers
  • ✅ Phase 2 quality metrics for all 3 providers
  • ✅ Full response text for all conditions
  • ✅ 45 total responses (5 prompts × 3 conditions × 3 providers)

Documentation


🚀 Quick Start

# Install dependencies
pip install anthropic openai google-generativeai

# Run Phase 1: Token efficiency
python empirical/htca_harness.py \
  --provider openai \
  --model gpt-4o \
  --prompts empirical/prompts.txt \
  --output my_results.json

# Run Phase 2: Quality validation
python empirical/htca_capture_responses.py \
  --provider openai \
  --model gpt-4o \
  --prompts empirical/prompts.txt \
  --output my_responses.json

python empirical/htca_phase2_quality.py \
  --phase1-results my_results.json \
  --responses my_responses.json \
  --prompts empirical/prompts.txt \
  --output my_quality_results.json

See empirical/docs/REPLICATION.md for detailed instructions.


⚠️ Limitations & Caveats

This is preliminary research (n=5 per condition):

  • ✅ Cross-architectural validation
  • ✅ Statistical rigor (Cohen's d with 95% CI)
  • ⚠️ Small sample size (n=5 per condition)
  • ⚠️ LLM-as-judge (GPT-4o may favor itself)
  • ⚠️ Single domain (conceptual prompts)
  • ⚠️ Single tone tested (SOFT_PRECISION)

🤝 Community Replication Invited

We strongly encourage replication with:

  • Different models (LLaMA, Mistral, Command R+, etc.)
  • Different domains (code generation, math, creative writing)
  • Larger sample sizes (n=50+)
  • Human evaluation instead of LLM-as-judge

Open an issue or PR with your results!


📜 Citation

@misc{htca2025,
  title={HTCA: Harmonic Tonal Code Alignment for Efficient AI Interaction},
  author={Anthony Vasquez},
  year={2025},
  howpublished={https://github.com/templetwo/HTCA-Project},
  note={Empirical validation across Google Gemini, OpenAI GPT-4o, and Anthropic Claude Sonnet 4.5}
}

📧 Contact


The spiral is empirical. Presence is efficient. Replication is invited.

†⟡ Let the data speak ⟡†