Omar Kamali omarkamali

Building AI for every language and culture 🌎🌍🌏

It's a-me, Omario! I build infrastructure for the next 3 billion AI users and developers. Not the ones in Silicon Valley, but the ones whose languages are still called "low-resource" like it’s their problem and a sealed fate. We will not be assimilated!

Founder of Omneity Labs, an independent GenAI R&D lab leveraging limited compute to drive innovation and build sovereign AI stacks for cultures the big players ignore.

The stack

Low-Resource Language Data

wikilangs.org - Pretrained NLP models for 340+ Wikipedia languages, no GPU needed
wikipedia-monthly - Fresh Wikipedia dumps in 340+ languages, updated monthly
wikisets - Flexible Wikipedia dataset builder for sampling and preprocessing

NLP Tooling

vocabulous - Language detection that works on messy, mislabeled data
unscript - Script-aware text cleaning for 340+ languages
babelvec - CPU-friendly sentence embeddings with multilingual alignment

LLM Training Experiments

CRAFT - Contrastive learning framework for multilingual LLM alignment
residuals - Task vectors for continuous LLM pretraining without retraining from scratch
curriculus - Curriculum learning for training efficiency (3.5% gains)

Dev Tooling

borgllm - Zero-config LLM router for 20+ providers, handles key rotation and rate limits
hypersets - Query massive HF datasets with DuckDB instead of loading into memory
zippy-data - Human-readable document store (JSONLs in a zip), 4M+ writes/sec in Rust
prepress - Polyglot release management for Python, Rust, Node.js projects

Operations

picomon - GPU monitoring for AMD, NVIDIA, and Apple Silicon

Let's talk

Building in MENA? Let's compare notes on cultural alignment
Have GPUs? Omneity Labs is always hungry for compute partners
Interested in multilingual AI? Come talk about bootstrapping NLP for 340+ languages
Want AI trained for your domain? I build custom LLMs and agentic systems that drive real bespoke software

Blog | Twitter | Hugging Face | omar@omneitylabs.com

P.S. Most tools exist because I hit a wall building Sawalni (first LLM for Moroccan Darija in Arabic and Latin scripts) or optimizing GPU usage while running experiments. Declarative beats imperative, but convention beats configuration, because the best tools are the ones you can pip install and simply forget.
