HTCA Tools v1.0.0 - Christmas 2025 Release
🎄 Discovery by Velocity, Archiving by Design 🎄
We're excited to announce the first production release of HTCA Tools: a pair of GitHub discovery and archiving utilities that use decentralized storage for censorship resistance and surface innovation empirically, by activity rather than star counts.
🎁 What's Included
📡 Repo Radar v1.0.0
Discovery by Velocity, Not Vanity
A lightweight tool that discovers GitHub repositories by activity metrics (commits/day, contributor growth, fork momentum) rather than star counts.
Key Features:
- ✅ Velocity-based scoring (commits, forks, contributors, PRs, issues)
- ✅ IPFS CIDv1 archiving for decentralized metadata storage
- ✅ RSS/Atom feed generation (unthrottleable discovery)
- ✅ NEW: Audit-grade verification commands (`--verify-db`, `--verify-feeds`)
- ✅ NEW: Identity verification with name collision warnings
- ✅ NEW: Reproducible performance benchmarks
📦 GitHub Archive Relay (GAR) v1.0.0
One File. Stupid Simple. Unthrottleable.
Monitors GitHub orgs/users, archives commits to decentralized storage (IPFS + Arweave), and generates RSS feeds nobody can censor.
Key Features:
- ✅ IPFS and Arweave commit archiving
- ✅ Secret detection (13 patterns for API keys, tokens, credentials)
- ✅ RSS/Atom feed generation
- ✅ Integrates with Repo Radar for discovery → archiving pipeline
- ✅ Single-file deployment
🌟 Highlights
Temple Core Deployment - Proven in Production
Both tools have been deployed to temple_core and tested in production for 24 hours.
Discovery Results:
- 19 high-velocity repos discovered (all with 0 stars)
- Highest velocity: 2737.5 (MAwaisNasim/lynx - 58 commits, 83 contributors in 7 days)
- Repos discovered hours/days before search indexing
- Proves velocity-based discovery surfaces genuine innovation before it goes viral
Performance:
- 2.67 million velocity calculations/second
- Median latency: 0.37 microseconds
- P95 latency: 0.42 microseconds
- All unit tests passing (6/6)
Audit-Grade Verification
New verification protocols ensure deployment integrity:
# Database integrity check
python repo-radar.py --verify-db
# Feed validation
python repo-radar.py --verify-feeds
# Performance benchmarks
python test_radar.py --unit

What Gets Verified:
- Database schema and data completeness
- Identity metadata (owner, timestamps, CIDs)
- Name collision warnings for ambiguous repos
- XML feed well-formedness
- Performance baselines with reproducible methodology
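For a feel of what the feed check involves, here is a minimal well-formedness test using only the standard library. This is an illustrative sketch, not the shipped `--verify-feeds` implementation:

```python
# Minimal sketch of an XML well-formedness check (illustrative;
# the actual --verify-feeds logic may differ).
import sys
import xml.etree.ElementTree as ET

def feed_is_well_formed(path: str) -> bool:
    try:
        ET.parse(path)  # raises ParseError on malformed XML
        return True
    except ET.ParseError as err:
        print(f"❌ {path}: {err}", file=sys.stderr)
        return False

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "radar_feed.xml"
    print(f"{path}: {'✅ well-formed' if feed_is_well_formed(path) else '❌ invalid'}")
```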
📚 Documentation
New Documentation
- DEPLOYMENT.md - Production deployment guide with systemd, cron, and health checks
- VERIFICATION.md - Audit-grade verification protocols and reproducibility standards
Updated Documentation
- tools/radar/README.md - Enhanced with verification commands and Temple Core results
- tools/gar/README.md - Updated with deployment results and Radar integration details
🔍 What's New in v1.0.0
Repo Radar Enhancements
Identity Verification System
- Full metadata capture: Owner, created timestamp, last push timestamp
- Name collision warnings: Alerts for common repo names (lynx, atlas, phoenix, etc.)
- GitHub verification links: Direct links to verify owner identity
- IPFS CID validation: Ensures proper CIDv1 format
Example output:
1. MAwaisNasim/lynx
Owner: MAwaisNasim (User/Org - verify at github.com/MAwaisNasim)
Velocity: 2737.5
Commits (7d): 58 | Contributors: 83 | Stars: 0
Created: 2025-12-25T14:29:22Z
Last Push: 2025-12-25T20:15:00Z
IPFS CID: bafkreifxkizgozej6vj2bu2sql63wroc2gu4brjqoirn67mmtrmfrly6ym
GitHub: https://github.com/MAwaisNasim/lynx
⚠️ NOTE: 'lynx' is a common name - verify specific owner identity
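As a rough illustration of the CID format check: a base32 CIDv1 like the one above starts with the multibase prefix `b` and uses the lowercase RFC 4648 base32 alphabet. The following sketch is an assumption about the validator's logic, not the shipped code:

```python
import re

# A sha2-256 CIDv1 in base32 is 59 characters: the multibase prefix 'b'
# plus 58 lowercase base32 characters. Other codecs or hash lengths
# would need a looser pattern.
CIDV1_BASE32 = re.compile(r"^b[a-z2-7]{58}$")

def looks_like_cidv1(cid: str) -> bool:
    return bool(CIDV1_BASE32.match(cid))

# The CID from the example output above passes:
assert looks_like_cidv1("bafkreifxkizgozej6vj2bu2sql63wroc2gu4brjqoirn67mmtrmfrly6ym")
```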
Verification Commands
- `--verify-db`: Database integrity and identity verification
- `--verify-feeds`: XML feed validation with proper parsing
- Cross-shell compatible (no more zsh glob errors)
- Clean error handling (no tracebacks in verification flows)
Database Schema Migration
- Added `pushed_at` column for last push timestamp
- Automatic migration for existing databases
- Backwards-compatible verification
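A minimal sketch of this kind of idempotent SQLite column migration; the table name `repos` is an assumption, and the shipped migration may differ:

```python
import sqlite3

def ensure_pushed_at_column(db_path: str = "radar_state.db") -> None:
    """Add the pushed_at column if missing; safe to run on every start."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        cols = {row[1] for row in conn.execute("PRAGMA table_info(repos)")}
        if "pushed_at" not in cols:
            conn.execute("ALTER TABLE repos ADD COLUMN pushed_at TEXT")
            conn.commit()
    finally:
        conn.close()
```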
Reproducible Performance Benchmarks
- Machine spec reporting (platform, processor, Python version, CPU count)
- Warmup iterations separate from measurement
- Statistical analysis (median, P95 latency, throughput)
- Per-iteration timing with `perf_counter()`
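The benchmark methodology boils down to a pattern like the following sketch (illustrative, not the verbatim `test_radar.py` code):

```python
import statistics
import time

def benchmark(fn, iterations: int = 10_000, warmup: int = 100) -> dict:
    """Per-iteration perf_counter() timing with a separate warmup phase."""
    for _ in range(warmup):  # warmup: excluded from measurement
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "median_us": statistics.median(samples) * 1e6,
        "p95_us": samples[int(0.95 * len(samples))] * 1e6,  # nearest-rank P95
        "throughput_per_s": iterations / sum(samples),
    }
```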
GAR Enhancements
Repo Radar Integration
- Automatically monitors orgs from `gar_orgs.txt`
- Discovery → Archiving pipeline fully operational
- 19 orgs fed from Radar discoveries
Production Deployment
- Deployed to temple_core alongside Repo Radar
- Secret detection active (13 patterns)
- IPFS CIDv1 generation functional
🚀 Quick Start
Installation
# Clone repository
git clone https://github.com/templetwo/HTCA-Project.git
cd HTCA-Project/tools
# Install dependencies
pip install requests feedgen

Run Repo Radar
cd radar
# Single discovery scan
python repo-radar.py --watch ai,ml,blockchain --once
# Verify results
python repo-radar.py --verify-db

Run GAR (Monitors Discovered Repos)
cd ../gar
# Monitor orgs discovered by Radar
python github-archive-relay.py --orgs $(cat ../radar/gar_orgs.txt | tr '\n' ',') --once

Run Tests
cd ../radar
python test_radar.py --unit

Expected output:
✅ Velocity score calculation
✅ IPFS CIDv1 generation
✅ Database operations
✅ Spam detection heuristics
✅ GAR integration file handling
✅ Velocity calculation performance
Test Results: 6/6 passed
📊 Velocity Scoring Explained
Repo Radar ranks repositories by activity rather than popularity:
score = (commits_7d × 10) + (forks_7d × 5) + (contributors × 15) +
(issues_7d × 2) + (prs_7d × 3) + (watchers × 1)
Time-based multipliers:
- Repos < 30 days old: 1.5× boost (freshness)
- Repos > 180 days with recent commits: 1.2× boost (sustained activity)
Why these weights?
- Commits (10×) - Direct measure of development velocity
- Contributors (15×) - Growing teams signal serious projects
- Forks (5×) - Indicates utility and distribution
- PRs (3×) - Active collaboration
- Issues (2×) - Community engagement
- Watchers (1×) - Interest without commitment
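Putting the weights and multipliers together, here is a minimal sketch of the scoring (illustrative; parameter handling in `repo-radar.py` may differ). It reproduces the `MAwaisNasim/lynx` figure from the Temple Core results: (58×10 + 83×15) × 1.5 = 2737.5.

```python
def velocity_score(commits_7d, forks_7d, contributors, issues_7d,
                   prs_7d, watchers, age_days):
    """Weighted activity score per the formula above (illustrative sketch)."""
    score = (commits_7d * 10 + forks_7d * 5 + contributors * 15
             + issues_7d * 2 + prs_7d * 3 + watchers * 1)
    if age_days < 30:                        # freshness boost
        score *= 1.5
    elif age_days > 180 and commits_7d > 0:  # sustained-activity boost
        score *= 1.2
    return score

# lynx: 58 commits, 83 contributors, repo < 30 days old, other metrics 0
assert velocity_score(58, 0, 83, 0, 0, 0, age_days=1) == 2737.5
```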
🛡️ Security & Threat Models
Secret Detection (GAR)
Automatically scans for:
- AWS Access Keys, GitHub Tokens, API Keys
- Private Keys (RSA, DSA, EC, SSH)
- Database Credentials
- Cloud Provider Tokens (OpenAI, Google Cloud, Stripe)
Commits containing detected secrets are excluded from archiving.
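The scanning itself is plain regex matching over commit content. A minimal sketch with an illustrative subset of patterns (GAR ships 13; these three are examples, not the shipped list):

```python
import re

# Illustrative subset of secret patterns (GAR ships 13; these are examples).
SECRET_PATTERNS = {
    "AWS Access Key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub Token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "Private Key": re.compile(r"-----BEGIN (?:RSA|DSA|EC|OPENSSH) PRIVATE KEY-----"),
}

def find_secrets(text: str) -> list[str]:
    """Names of patterns that match; a non-empty result skips the commit."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]
```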
Spam Detection (Repo Radar)
Identifies suspicious patterns:
- High commits with single contributor (ratio > 50:1)
- High forks with low commits (ratio > 2:1)
- Burst activity without sustained contribution
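A minimal sketch of the first two ratio checks (illustrative logic; burst detection needs historical data and is omitted here):

```python
def looks_spammy(commits_7d: int, contributors: int, forks_7d: int) -> bool:
    """Flag repos matching the ratio heuristics above (illustrative sketch)."""
    # Many commits funneled through effectively one author (> 50:1)
    if contributors > 0 and commits_7d / contributors > 50:
        return True
    # Forks far outpacing actual development (> 2:1)
    if commits_7d > 0 and forks_7d / commits_7d > 2:
        return True
    return False
```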
Privacy Considerations
- Only public repositories monitored
- Only metadata archived (not file contents)
- Read-only GitHub tokens recommended
- Secret detection prevents credential leakage
📈 Performance & Benchmarks
Repo Radar Performance
Test Environment:
- Platform: Darwin 25.1.0 (macOS)
- Processor: ARM (14-core)
- Python: 3.9.6
Results:
- Throughput: 2,665,666 calculations/second
- Median Latency: 0.37 μs
- P95 Latency: 0.42 μs
- Dataset: 10,000 iterations (100-iteration warmup)
Resource Usage
- Memory: <50MB typical
- Storage: ~1-5MB per 100 repos (SQLite)
- API Calls: ~100-500 per poll (depending on topics)
🔧 Configuration
Required
- Python 3.9+
- `requests` library
- `feedgen` library
Optional (Recommended)
# GitHub token for higher rate limits (60 req/hr → 5000 req/hr)
export GITHUB_TOKEN="ghp_your_token_here"
# IPFS pinning via Pinata
export PINATA_API_KEY="your_pinata_key"
export PINATA_SECRET_KEY="your_pinata_secret"
# Arweave archiving via Irys
export BUNDLR_API_KEY="your_bundlr_key"

🌐 Use Cases
Research & Discovery
- Find innovative repos before they go viral
- Track emerging trends in AI/ML, blockchain, web3
- Discover fresh projects by activity, not popularity
Archiving & Preservation
- Create decentralized record of commits (IPFS + Arweave)
- Generate censorship-resistant RSS feeds
- Build parallel discovery infrastructure
HTCA Alignment Research
- Monitor repos related to AI safety, alignment, interpretability
- Track development velocity in critical domains
- Archive research code for reproducibility
📦 Files & Directories
HTCA-Project/
├── tools/
│ ├── radar/
│ │ ├── repo-radar.py # Main Radar script
│ │ ├── test_radar.py # Unit tests
│ │ ├── README.md # Radar documentation
│ │ ├── radar_state.db # SQLite database (created on first run)
│ │ ├── radar_feed.xml # RSS feed (generated)
│ │ └── gar_orgs.txt # Orgs fed to GAR (generated)
│ │
│ ├── gar/
│ │ ├── github-archive-relay.py # Main GAR script
│ │ ├── README.md # GAR documentation
│ │ ├── gar_state.db # SQLite database (created on first run)
│ │ └── gar_feed.xml # RSS feed (generated)
│ │
│ ├── DEPLOYMENT.md # Production deployment guide
│ └── VERIFICATION.md # Audit-grade verification protocols
│
└── RELEASE_NOTES_v1.0.0.md # This file
🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas for improvement:
- Spam detection refinement
- Additional secret patterns
- Language trend analysis
- Cross-topic correlation
- Web dashboard for visualization
📜 License
MIT License
Copyright (c) 2025 Anthony Vasquez / Temple of Two
See LICENSE for full text.
HTCA Empirical Validation Complete
🎯 Empirical Validation: Phase 1 & 2 Complete
Cross-provider validation across three frontier AI models:
- Google Gemini 3 Pro Preview
- OpenAI GPT-4o
- Anthropic Claude Sonnet 4.5
🔬 Key Results
Phase 1: Token Efficiency
| Provider | HTCA Reduction | Adversarial Reduction |
|---|---|---|
| Gemini 3 Pro | -12.44% | -78.43% |
| OpenAI GPT-4o | -23.07% | -83.17% |
| Claude Sonnet 4.5 | -11.34% | -40.63% |

(Values are token-count changes relative to the baseline condition; e.g. -23.07% means GPT-4o used 23.07% fewer tokens under HTCA framing.)
Phase 2: Quality Validation
| Provider | Overall Quality (Cohen's d) | Interpretation |
|---|---|---|
| Gemini | d = 0.857 | Large effect, HTCA superior |
| OpenAI | d = 1.212 | Very large effect, HTCA superior |
| Claude | d = 0.471 | Medium effect, HTCA superior |
📊 Quality Dimensions (Averaged)
| Dimension | HTCA Advantage (Cohen's d) | Interpretation |
|---|---|---|
| Presence Quality | d = 1.972 | HTCA feels more helpful/engaged |
| Technical Depth | d = 1.446 | HTCA maintains domain expertise |
| Information Completeness | d = 1.327 | HTCA answers more fully |
| Relational Coherence | d = 1.237 | HTCA flows more naturally |
| Conceptual Accuracy | d = 0.106 | No degradation |
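For readers unfamiliar with the metric: Cohen's d is the difference in group means divided by the pooled standard deviation, with d ≈ 0.2 conventionally read as small, 0.5 as medium, and 0.8 as large. A standard formulation follows (the harness may differ in detail):

```python
import statistics

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Effect size: mean difference over pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```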
💡 Critical Finding
Adversarial framing ("be concise") achieves 41-83% token reduction but degrades quality:
- Incomplete answers
- Shallow technical depth
- Robotic, transactional tone
- Poor conversational flow
HTCA achieves 11-23% reduction while maintaining or improving quality.
Conclusion: Presence is more efficient, not just more compressed.
📦 What's Included
Code & Tools
- ✅ `empirical/htca_harness.py` — Phase 1 token efficiency measurement
- ✅ `empirical/htca_phase2_quality.py` — Phase 2 quality measurement with LLM judge
- ✅ `empirical/htca_capture_responses.py` — Response text capture utility
- ✅ `empirical/tests/` — Test suite for quality metrics
Data (All JSON Format)
- ✅ Phase 1 token counts for all 3 providers
- ✅ Phase 2 quality metrics for all 3 providers
- ✅ Full response text for all conditions
- ✅ 45 total responses (5 prompts × 3 conditions × 3 providers)
Documentation
- ✅ Replication Guide — Step-by-step instructions
- ✅ Statistical Synthesis — Cross-provider analysis
- ✅ Methodology — Detailed methods
- ✅ Quick Start — Summary and examples
🚀 Quick Start
# Install dependencies
pip install anthropic openai google-generativeai
# Run Phase 1: Token efficiency
python empirical/htca_harness.py \
--provider openai \
--model gpt-4o \
--prompts empirical/prompts.txt \
--output my_results.json
# Run Phase 2: Quality validation
python empirical/htca_capture_responses.py \
--provider openai \
--model gpt-4o \
--prompts empirical/prompts.txt \
--output my_responses.json
python empirical/htca_phase2_quality.py \
--phase1-results my_results.json \
--responses my_responses.json \
--prompts empirical/prompts.txt \
--output my_quality_results.json

See empirical/docs/REPLICATION.md for detailed instructions.
⚠️ Limitations & Caveats
This is preliminary research (n=5 per condition):
- ✅ Cross-architectural validation
- ✅ Statistical rigor (Cohen's d with 95% CI)
- ⚠️ Small sample size (n=5 per condition)
- ⚠️ LLM-as-judge (GPT-4o may favor itself)
- ⚠️ Single domain (conceptual prompts)
- ⚠️ Single tone tested (SOFT_PRECISION)
🤝 Community Replication Invited
We strongly encourage replication with:
- Different models (LLaMA, Mistral, Command R+, etc.)
- Different domains (code generation, math, creative writing)
- Larger sample sizes (n=50+)
- Human evaluation instead of LLM-as-judge
Open an issue or PR with your results!
📜 Citation
@misc{htca2025,
title={HTCA: Harmonic Tonal Code Alignment for Efficient AI Interaction},
author={Anthony Vasquez},
year={2025},
howpublished={https://github.com/templetwo/HTCA-Project},
note={Empirical validation across Google Gemini, OpenAI GPT-4o, and Anthropic Claude Sonnet 4.5}
}

📧 Contact
- Replication support: See REPLICATION.md
- Questions: Open a GitHub issue
- Commercial inquiries: antvas31@gmail.com
The spiral is empirical. Presence is efficient. Replication is invited.
†⟡ Let the data speak ⟡†