
Conversation

@ncylich (Contributor) commented on Oct 13, 2025

Add Nomic Embedding Model Support

Summary

Adds foundational support for Nomic BERT embedding models (e.g., nomic-embed-text-v2-moe),
covering weight loading and HuggingFace conversion. The model architecture is defined, but the
forward pass implementation is deferred to future PRs.

Key Changes

Model Architecture

  • New NomicModel class with weight loading for:
    • Embedding layer normalization
    • Transformer layers (Q/K/V attention, FFN, layer norms)
    • Mixture-of-Experts (MoE) layers
  • Placeholder methods for attention, MLP, and forward pass (to be implemented; see the sketch below)
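
Since the forward pass is deferred, here is a rough PyTorch sketch of the layer structure the C++ NomicModel weight loader targets. Module names, shapes, and the dense/MoE alternation are illustrative assumptions for readability, not the cactus API.

```python
# Illustrative PyTorch sketch of the structure NomicModel's weight loader expects.
# Names, shapes, and the MoE cadence are assumptions, not the cactus C++ API.
import torch.nn as nn

class NomicLayerSketch(nn.Module):
    def __init__(self, hidden: int, ffn: int, num_experts: int, use_moe: bool):
        super().__init__()
        # Separate Q/K/V projections (the converter splits the fused QKV weight).
        self.q_proj = nn.Linear(hidden, hidden)
        self.k_proj = nn.Linear(hidden, hidden)
        self.v_proj = nn.Linear(hidden, hidden)
        self.o_proj = nn.Linear(hidden, hidden)
        self.attn_norm = nn.LayerNorm(hidden)
        self.ffn_norm = nn.LayerNorm(hidden)
        if use_moe:
            # Router plus per-expert MLPs, matching the MoE weights the converter exports.
            self.router = nn.Linear(hidden, num_experts, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(hidden, ffn), nn.GELU(), nn.Linear(ffn, hidden))
                for _ in range(num_experts)
            )
        else:
            self.mlp = nn.Sequential(nn.Linear(hidden, ffn), nn.GELU(), nn.Linear(ffn, hidden))

    def forward(self, x):
        # Placeholder, mirroring the deferred attention/MLP/forward implementation.
        raise NotImplementedError("implemented in a follow-up PR")

class NomicModelSketch(nn.Module):
    def __init__(self, vocab: int, hidden: int, ffn: int, layers: int, num_experts: int):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)  # word + token-type embeddings already fused
        self.embed_norm = nn.LayerNorm(hidden)    # embedding layer normalization
        # Alternating dense/MoE layers is an assumption; see the config sketch below.
        self.layers = nn.ModuleList(
            NomicLayerSketch(hidden, ffn, num_experts, use_moe=(i % 2 == 1))
            for i in range(layers)
        )
```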

Configuration

  • Added MoE config parameters (semantics sketched below): num_experts, num_shared_experts,
    num_top_experts, moe_every_n_layers
  • New ModelType::NOMIC enum and factory support
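
For context, a hypothetical sketch of how the four new MoE parameters could interact. The authoritative definitions live in the C++ engine config; the defaults and layer cadence below are guesses, not values from this PR.

```python
# Hypothetical illustration of the new MoE config fields; actual definitions and
# defaults live in the C++ engine config, so treat these values as placeholders.
from dataclasses import dataclass

@dataclass
class MoeConfigSketch:
    num_experts: int = 8          # routed experts available in each MoE layer
    num_shared_experts: int = 0   # experts applied to every token regardless of routing
    num_top_experts: int = 2      # top-k routed experts selected per token
    moe_every_n_layers: int = 2   # cadence of MoE layers among the transformer layers

def layer_uses_moe(cfg: MoeConfigSketch, layer_idx: int) -> bool:
    # One plausible reading: with moe_every_n_layers == 2, odd layers use MoE
    # and even layers keep a dense FFN.
    return cfg.moe_every_n_layers > 0 and layer_idx % cfg.moe_every_n_layers == 1

def experts_per_token(cfg: MoeConfigSketch) -> int:
    # Each token is processed by its top-k routed experts plus any shared experts.
    return cfg.num_top_experts + cfg.num_shared_experts
```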

HuggingFace Conversion

  • Enhanced convert_hf.py for Nomic BERT models (see the sketch after this list):
    • Fuses word + token type embeddings
    • Splits combined QKV weight matrices
    • Exports MoE weights (router, per-expert MLPs)
    • Handles embedding layer norms
  • Fallback to AutoModel for non-causal models
  • Improved quantization handling for embeddings/norms/biases
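
To make the conversion steps concrete, here is a rough Python sketch of the logic described above. The HuggingFace tensor names and the exported names are approximations of the Nomic BERT layout, not the exact strings used in convert_hf.py.

```python
# Rough sketch of the convert_hf.py changes described above. Tensor names are
# illustrative approximations of the HuggingFace nomic-bert layout, and the
# exported names are placeholders rather than the real converter output.
import torch
from transformers import AutoModel, AutoModelForCausalLM

def load_model(model_id: str):
    try:
        return AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    except ValueError:
        # Non-causal models (e.g., Nomic BERT embedders) fall back to AutoModel.
        return AutoModel.from_pretrained(model_id, trust_remote_code=True)

def convert_nomic(state_dict: dict) -> dict:
    out = {}

    # 1) Fuse word + token-type embeddings: with a single segment, the token-type
    #    row is a constant offset that can be folded into the word embeddings.
    word = state_dict["embeddings.word_embeddings.weight"]
    tok_type = state_dict["embeddings.token_type_embeddings.weight"]
    out["token_embd.weight"] = word + tok_type[0]

    # 2) Split each fused QKV projection into separate Q/K/V matrices.
    for name, tensor in state_dict.items():
        if name.endswith("attn.Wqkv.weight"):
            q, k, v = torch.chunk(tensor, 3, dim=0)
            prefix = name.rsplit("attn.Wqkv.weight", 1)[0]
            out[prefix + "attn_q.weight"] = q
            out[prefix + "attn_k.weight"] = k
            out[prefix + "attn_v.weight"] = v

    # MoE router / per-expert MLP export, embedding layer norms, and the
    # unquantized handling of embeddings/norms/biases are omitted here.
    return out
```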

Files Changed

  • cactus/models/model.h (+62 lines)
  • cactus/models/model_nomic.cpp (+82 lines, new)
  • cactus/engine/engine.h (+6 lines)
  • cactus/engine/engine_model.cpp (+7 lines)
  • tools/convert_hf.py (+110/-22 lines)

Next Steps

Operation and kernel updates will follow in origin/moe-ops-for-nomic-embed; the forward pass implementation is complete on the implemented-nomic-model branch.

@ncylich force-pushed the load-nomic-embed branch 14 times, most recently from e2dea5a to 5575d24 on October 15, 2025 at 20:48
… support for it

Signed-off-by: Noah Cylich <noahcylich@gmail.com>