This file helps AI agents discover and understand how to work with this repository.
- Primary entry points: `README.md`, `include/`, `src/`, and `tests/` describe the architecture and entry points for this library. Use `rg` to locate interesting symbols before jumping into implementation.
- Python bindings: The new `python/` directory holds the pybind11-based module and `tests/python/test_bindings.py` exercises it; toggle `T81LIB_BUILD_PYTHON_BINDINGS` when configuring CMake to build the module.
- Build tooling: The project uses CMake. Inspect `CMakeLists.txt` and related files in `cmake/` or `docs/` for build and test instructions before making changes.
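
The configure, build, and test cycle behind the build-tooling note can be sketched as below. This is a minimal sketch, not the project's own tooling: the `build` directory name and the `cmake_commands`/`run` helpers are assumptions made for illustration, and only the `T81LIB_BUILD_PYTHON_BINDINGS` option comes from this file. The `-S`, `-B`, `--build`, `--test-dir`, and `--output-on-failure` flags are standard CMake/CTest options. The helper prints the commands by default and only executes them when asked.

```python
import shlex
import subprocess


def cmake_commands(source_dir=".", build_dir="build", python_bindings=False):
    """Return the configure, build, and test commands as argument lists.

    The helper name and the default "build" directory are illustrative
    assumptions; T81LIB_BUILD_PYTHON_BINDINGS is the repository's toggle.
    """
    toggle = "ON" if python_bindings else "OFF"
    return [
        # Configure: -S/-B select the source and build trees (CMake >= 3.13).
        ["cmake", "-S", str(source_dir), "-B", str(build_dir),
         f"-DT81LIB_BUILD_PYTHON_BINDINGS={toggle}"],
        # Build everything that was configured.
        ["cmake", "--build", str(build_dir)],
        # Run the registered tests (--test-dir needs CMake >= 3.20).
        ["ctest", "--test-dir", str(build_dir), "--output-on-failure"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(cmake_commands(python_bindings=True))
```

Calling `run(cmake_commands(python_bindings=True), dry_run=False)` would execute the three steps in order and stop at the first failure.
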
- Follow the existing coding style in `include/t81/core/` and use ASCII-only edits unless a file already includes other Unicode characters.
- Prefer `rg` for searching and avoid destructive operations (`git reset --hard`, etc.).
- Respect non-AI manual edits in the working tree; do not revert unless asked.
- Run any relevant unit tests in `tests/unit/` via CTest or the provided scripts whenever you touch critical paths to verify behavior.
- Document significant algorithm changes in `docs/` or `README.md` as appropriate.
- Mention new files or important updates back in this file so future agents can find your work quickly.
- Reworked the top-level `CMakeLists.txt`, rewrote `run-tests.sh` to execute configure/build/tests, and reordered `tests/unit/test_limb_basic.cpp` so `t81/t81lib.hpp` is included before the SIMD helpers to keep `limb` defined.
- Balanced ternary bigint logic in `include/t81/core/bigint.hpp` now normalizes signed limbs more efficiently and fixes `~`/division helpers so later agents can spot the modern bitwise/division flow.
- `tests/unit/test_numeric_types.cpp` now exercises `Complex`, `Polynomial`, and `F2m` helpers so the umbrella numeric helpers stay locked down.
- `README.md` now documents the high-level helpers (`Float`, `Ratio`, `Complex`, `Polynomial`, `F2m`, `Fixed<N>`, `Modulus`, and `MontgomeryInt`) plus the `t81::Int` alias exposed through `t81/t81lib.hpp`.
- `include/t81/t81lib.hpp` now exposes `Float::from_string`, a `Ratio`→`Float` conversion, the `Int81` `Fixed<48>` alias, and `std::hash` hooks for `limb`/`bigint` so hashing and string-based floats land in the umbrella header.
- `README.md` plus the umbrella header now document the `FloatN` template, ternary `_t3` literal, R3 NTT helpers, and `std::formatter` specializations so overlined ternary floats behave nicely in `std::format`, and `t81::Vector` provides a ready-to-use coefficient container with arithmetic helpers.
- `README.md`/umbrella header now mention `t81::Matrix<Element, R, C>` and how it complements `Vector` for linear algebra over `FloatN`/`Fixed` scalars.
- `F2m` now lives in `include/t81/gf2m.hpp` (still re-exported through `t81/t81lib.hpp`), and `Fixed<N>` gained balanced `/` and `%` helpers so division/magnitude math stays accessible in the umbrella header.
- Added `t81::linalg::gemm_ternary` and the Python binding `t81lib.gemm_ternary` so packed ternary GEMMs with alpha/beta semantics are exposed across the C++/Python API surface.
- Documented the `t81.torch`/`t81.nn` PyTorch helpers in `README.md` and `docs/index.md`, pointing to the `examples/demo_llama_conversion.py`, `examples/scaling_laws_ternary.py`, and `examples/ternary_sparse_preview.py` demos so future agents can locate the torch bridge.
- Added production-ready Python bindings (`python/bindings.cpp`) plus packaging helpers (`setup.py`, `pyproject.toml`) that expose `Limb`/`BigInt` helpers, Montgomery contexts, NumPy quantization utilities, and a tutorial notebook `examples/ternary_quantization_demo.ipynb`.
- Added `t81.hardware.TernaryEmulator`, documentation for hardware simulation, and `examples/ternary_hardware_sim_demo.ipynb` so agents can explore ternary gate/circuit modeling, fuzzy AI decisions, and power-aware PyTorch inference workflows.
- Added `docs/references/cli-usage.md` (linked from `docs/index.md`) to cover `t81-convert`, `t81-gguf`, and `t81-qat` usage with the CPU/offloading tips we surfaced for low-memory Apple Silicon.
- Added a unified `t81` console script that exposes `convert`/`gguf` subcommands while preserving the legacy `t81-convert`/`t81-gguf` wrappers, plus updated docs/tests to reference the new entry point.
- Added `docs/diagrams/cli-workflows-mermaid.md` to visualize the `t81-convert`, `t81-gguf`, and `t81-qat` workflows for future contributors looking at the CLI surface.
- Extended `examples/ternary_qat_inference_comparison.py` so it now runs train + validation loops, logs compression ratios + per-step losses, and correlates the ternary threshold history with measured GEMM latencies.
- Added `scripts/quantize_measure.py`, which chains `t81-convert` → `AutoModel.from_pretrained_t81` → latency/compression stats so you can automate quantize→measure in other pipelines.
- Added `docs/references/hardware-emulation.md` to explain how `t81.hardware.TernaryEmulator`, the Python quantization helpers, and the CLI automation fit together for energy-aware AI reasoning.
- Added `scripts/quantize_energy_benchmark.py` to orchestrate quantize→latency+energy benchmarks, logging compression, timing, and emulator energy stats into CSV/JSON outputs for reuse in reports.
- Added `examples/quantization_config_report.py` so you can sweep synthetic datasets (dims, thresholds, sizes) and capture accuracy, latency, and storage comparisons for multi-module configs before quantizing real models.
- Added `t81/cli_validator.py` plus a `--validate` flag for `t81-convert`/`t81-gguf` so the CLI reruns `gguf.read_gguf` (and llama.cpp’s `gguf_validate`/`gguf_to_gguf` when available) to ensure exported GGUF bundles stay compatible before a run returns success.
- Added `t81/cli_progress.py` plus progress logging to `t81-convert`, `t81-gguf`, and `t81-qat` so the CLIs print bar/percentage updates while converting, exporting, or fine-tuning checkpoints.
- Documented the automation scripts (`scripts/quantize_measure.py`, `scripts/quantize_energy_benchmark.py`) plus the CLI telemetry/progress experience so future agents can quickly measure quantization impact, latency, and hardware energy from the console.
- Added `examples/cli-examples.md` with ready-to-copy CLI snippets showing conversion, GGUF export, and QAT flows for the three helpers.
- Updated `README.md` to highlight the CLI docs/diagrams/examples so newcomers can find the new references through the main overview.
- Added `docs/ROADMAP.md` to capture an executive summary, analysis, and next-step recommendations for steering t81lib toward wider adoption and smoother contributions.
- Added `mkdocs.yml`, `docs/python-api.md`, and `docs/python-cookbook.md` so MkDocs + mkdocstrings can publish the Python API reference and cookbook, and linked them from `docs/index.md`.
- Expanded `python/t81/__init__.py` so the higher-level `t81` package re-exports the compiled binding helpers (`t81lib`, `BigInt`, `Limb`, `gemm_ternary`, etc.) while staying import-safe when the extension is unavailable.
- Added `scripts/ternary_quantization_benchmark.py` plus `BENCHMARKS.md` so contributors can reproduce a Fashion-MNIST FP32/PTQ/QAT benchmark and log accuracy/latency/storage for each mode; README now links the benchmark doc.
- Rewrote `pyproject.toml` with valid TOML sections so editable installs (and `pip install -e '.[torch]'`) can parse the metadata cleanly before building the extension.
- Restructured `README.md` into an onboarding-focused front door and added companion docs (`docs/use-cases.md`, `docs/hardware.md`, `docs/api-overview.md`, `docs/python-install.md`, `docs/torch.md`, `docs/gpu.md`, `examples/README.md`) so heavy reference material lives outside the visitor-facing overview.
- Added optional CUDA/ROCm toggles plus a GPU dispatcher sketch (`include/t81/linalg/gemm_gpu.hpp`, `src/linalg/{gemm_cuda.cu,gemm_dispatch.cpp,gemm_rocm.cpp}`) so future teams can wire the new `where`/`clamp`/`lerp`/`addcmul` helpers into GPU kernels, introduced `t81::TensorMetadata` + Python helpers (`python/bindings.cpp`) that extract metadata from NumPy/Torch tensors, and expanded `tests/python/test_gpu_ops.py` to cover the metadata-backed bindings on both CPU and GPU paths.
- Enhanced `tests/python/test_gguf.py` with quant-parameterized round-trip checks, metadata assertions, and a regression case for invalid quant identifiers to spotlight the GGUF helpers before future agents touch them.
- Hardened the SIMD detection helpers in `include/t81/core/detail/simd.hpp` with CPUID/xgetbv fallbacks, documented the `add_trytes_*` overflow semantics, and made NEON runtime checks opt-out via `T81_DISABLE_NEON`.
- Added the `compression-first` GGUF export profile (metadata + CLI flags), plus `scripts/gguf_benchmark.py` and CLI docs that walk FP16 to ternary GGUF before/after measurements.
- Added `examples/ternary_phi3_ptq_qat_demo.ipynb` to showcase Phi-3-mini PTQ/QAT size, latency, and perplexity comparisons in one compact notebook.
- Added Metal pack/quantize kernels (`src/linalg/pack_kernel.metal`, `src/linalg/pack_metal.mm`) plus `include/t81/linalg/pack_gpu.hpp` and Python binding dispatch so PTQ packing can run on Apple Metal when enabled.
- Documented GGUF helper APIs (`read_gguf`, `repack_gguf`, `dequantize_gguf`) plus the experimental TQ1_1 note in the GGUF and Python docs.
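
Several entries above mention balanced ternary limbs and bigints. For agents new to the representation, here is a minimal pure-Python sketch of balanced ternary digits (trits drawn from -1, 0, and 1). It does not use or mirror t81lib's API: `to_balanced_ternary` and `from_balanced_ternary` are hypothetical names chosen for illustration, and the library's packed limb layout is not shown.

```python
def to_balanced_ternary(n):
    """Return the balanced ternary trits of n, least significant first."""
    if n == 0:
        return [0]
    trits = []
    while n != 0:
        r = n % 3          # Python's % yields 0, 1, or 2, even for negative n
        if r == 2:
            r = -1         # write 2 as -1 and carry one into the next trit
        trits.append(r)
        n = (n - r) // 3   # exact division: n - r is a multiple of 3
    return trits


def from_balanced_ternary(trits):
    """Rebuild the integer from trits given least significant first."""
    return sum(t * 3**i for i, t in enumerate(trits))


if __name__ == "__main__":
    for value in (5, -5, 13):
        print(value, to_balanced_ternary(value))
```

One property worth noticing is that negating a value only flips the sign of every trit, so the representation needs no separate sign bit.
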