feat: LFM2-24B +7% decode, foundry tracking, repo cleanup by Hmbown · Pull Request #10 · Hmbown/ZMLX

Hmbown · 2026-02-11T15:45:12Z

Summary

LFM2-24B-A2B-MLX-4bit: +7% decode speedup (new)

+7% decode speedup on LFM2-24B-A2B-MLX-4bit (stock MLX, M4 Max 36GB)
500/500 greedy token-identical fidelity confirmed over 5 runs
Works automatically with patch(model) — no configuration needed
pip install "zmlx[lm]" is all you need

import mlx_lm
from zmlx.patch import patch

model, tokenizer = mlx_lm.load("LiquidAI/LFM2-24B-A2B-MLX-4bit")
patch(model)  # auto-detects 64 experts, applies D-SIMD gate kernel
text = mlx_lm.generate(model, tokenizer, prompt="Hello", max_tokens=200)

Verify: python -m zmlx.validate LiquidAI/LFM2-24B-A2B-MLX-4bit --max-tokens 200 --runs 3

Technical details:

New D-SIMD Metal kernel for 64-expert MoE gating (2 SIMD groups, ascending value order output matching argpartition)
Smart K-based defaults: LFM2-8B (K=2) keeps fused SwiGLU (+12%), LFM2-24B (K=4) uses D-SIMD gate + native combine
Auto-disables gather_qmm_swiglu for K>=3 (causes 0.77x regression at K=4)

Repo cleanup (from earlier commits)

Track foundry module (src/zmlx/foundry/, 48 files): kernel template evaluation and SFT dataset export
Track key docs (CLAUDE.md, UPSTREAM_PLAN.md): makes AI agent context and upstream plan available
Clean .gitignore: add ephemeral output dirs, remove entries hiding tracked-worthy files
Create docs/FOUNDRY.md: documents foundry CLI, SFT export workflow, and module layout

Test plan

969/969 full test suite pass (0 failures, 75 skipped, 3 xfailed)
200/200 fidelity PASS at 200 tokens (3 runs, 1.072x)
500/500 fidelity PASS at 500 tokens (5 runs, 1.060x)
Default patch(model) works without env vars
Kernel microbench: 0/100 expert mismatches, 0/100 ordering mismatches
Community validation on other M-series chips (M1, M2, M3)

🤖 Generated with Claude Code

- Un-gitignore and track CLAUDE.md, AGENTS.md, UPSTREAM_PLAN.md (fixes broken README link to UPSTREAM_PLAN.md; makes AI agent context available to all cloners) - Un-gitignore and track src/zmlx/foundry/ (48-file kernel template evaluation module that was previously local-only) - Track configs/qwen3_1p7b_kernel_sft_lora.yaml (LoRA SFT config) - Add sessions/, runs/, training_data/, discover_sessions/ to .gitignore (ephemeral output directories) - Rename docs/AGENTS.md -> docs/DEVELOPMENT.md (content is a dev guide/backlog, not agent instructions) - Create docs/FOUNDRY.md documenting the foundry module CLI - Update CLAUDE.md: add foundry/discover module sections, CLI entry points table - Update AGENTS.md: add foundry/discover to file layout tree - Update README.md docs table: add FOUNDRY.md link - Fix ruff lint issues in foundry module (UP006, UP035, I001, B905) - Move 7 stale root prompt files to sessions/prompts/ (local only) Validation: ruff check . clean, pytest 920 passed / 75 skipped / 3 xfailed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- ARCHITECTURE.md: fix `docs/UPSTREAMING.md` -> `UPSTREAM_PLAN.md` - ROADMAP.md: fix `benchmarks/results/TEST_SUMMARY.md` -> `BENCHMARKS.md` Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist · 2026-02-11T15:46:48Z

Summary of Changes

Hello @Hmbown, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the project's structure by integrating a new foundry module for Metal kernel development and evaluation, enhancing documentation accessibility, and refining the .gitignore to accurately reflect tracked and untracked files. The changes aim to streamline kernel optimization workflows and improve clarity for developers and AI agents interacting with the codebase.

Highlights

Foundry Module Integration: The src/zmlx/foundry/ module, previously gitignored, has been integrated to manage kernel template evaluation and SFT dataset export, including ruff lint fixes.
Documentation Visibility: Key documentation files (CLAUDE.md, AGENTS.md, UPSTREAM_PLAN.md) are now tracked, making AI agent context and upstream plans accessible to all cloners. A broken README link to UPSTREAM_PLAN.md was also fixed.
.gitignore Cleanup: The .gitignore file was updated to include ephemeral output directories (sessions/, runs/, training_data/, discover_sessions/) and remove incorrect entries that were hiding trackable files.
Documentation Restructuring: docs/AGENTS.md was renamed to docs/DEVELOPMENT.md to better reflect its content as a development guide, and a new docs/FOUNDRY.md was created to document the foundry CLI and workflow.
CLAUDE.md Update: CLAUDE.md was updated to include sections for the foundry/discover modules and a table of CLI entry points.
Version Bump: The project version was updated to 0.8.5 across pyproject.toml and src/zmlx/__init__.py.
Benchmark Updates: New benchmark-vs-baseline truth-set artifacts and reproducibility capsules were added, and the README was updated to reflect the latest benchmark results and recommendations for GLM and Qwen models.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

CHANGELOG.md
- Added benchmark-vs-baseline truth-set artifacts for current kernel candidates.
- Added follow-up reproducibility capsules for replicated isolation sweeps across GLM and Qwen suites.
- Updated README benchmark sections to reflect the 2026-02-11 benchmark-vs-baseline truth set.
- Clarified Qwen candidate variants remain non-promoted; GLM glm_combine_fp32_no_fma remains active default.
pyproject.toml
- Updated project version to 0.8.5.
src/zmlx/init.py
- Updated __version__ string to 0.8.5.

Activity

No human activity has been recorded on this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This is a substantial pull request that introduces the new foundry module for kernel template evaluation and dataset generation, along with significant documentation updates and .gitignore cleanup. The new module is well-structured and the code quality is high. I have a few suggestions to improve consistency and maintainability within the new module.

gemini-code-assist · 2026-02-11T15:50:45Z

src/zmlx/foundry/harness/evaluate.py

+
+
+def _iso_now() -> str:
+    return _dt.datetime.utcnow().replace(microsecond=0).isoformat() + "Z"


datetime.utcnow() is deprecated since Python 3.12 and should be avoided. Please use the timezone-aware datetime.now(datetime.timezone.utc) instead. This also helps with the timestamp format consistency across the module.

Suggested change

return _dt.datetime.utcnow().replace(microsecond=0).isoformat() + "Z"

return _dt.datetime.now(_dt.timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")

gemini-code-assist · 2026-02-11T15:50:46Z

src/zmlx/foundry/ops/base.py

+    # Bridge methods: satisfy the taxonomy.KernelOp Protocol interface
+    # so that harness/evaluate.py can use ops directly.
+    # ------------------------------------------------------------------


The use of 'bridge methods' to adapt this KernelOp ABC to the KernelOp Protocol defined in taxonomy.py can be confusing for new contributors. It suggests a slight design divergence between the op implementations and the harness that consumes them. For better long-term maintainability, consider refactoring to a single, unified KernelOp interface. This would make the contract for kernel operations clearer and the system easier to extend.

gemini-code-assist · 2026-02-11T15:50:46Z

src/zmlx/foundry/session.py

+
+def _utc_now_iso() -> str:
+    """ISO-8601 UTC timestamp string."""
+    return _dt.datetime.now(_dt.timezone.utc).isoformat()


There's an inconsistency in timestamp formatting. This function produces timestamps with a +00:00 timezone offset, while _iso_now in harness/evaluate.py uses the Z suffix. It would be best to standardize on one format for all timestamps. I'd recommend using the Z suffix, as it's more common for UTC timestamps in many systems. Also, removing microseconds can lead to cleaner timestamps if they are not required.

Suggested change

return _dt.datetime.now(_dt.timezone.utc).isoformat()

return _dt.datetime.now(_dt.timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")

Copilot

Pull request overview

This PR brings the previously gitignored src/zmlx/foundry/ module into the repo and updates supporting docs/config to enable kernel-template evaluation, dataset generation, and export workflows, alongside housekeeping changes (.gitignore cleanup, docs navigation, and version bump).

Changes:

Adds the Foundry module (templates, harness, sampling, reports, plugins, ops registry, export utilities) for Metal kernel variant generation/evaluation.
Updates documentation and repo metadata (new docs pages, README + UPSTREAM_PLAN link updates, AGENTS guidance, CHANGELOG entry).
Cleans .gitignore, adds benchmark repro capsules, and bumps version to 0.8.5.

Reviewed changes

Copilot reviewed 89 out of 91 changed files in this pull request and generated 14 comments.

Show a summary per file

File	Description
src/zmlx/foundry/templates/swiglu/t1_unrolled.metal	Adds a SwiGLU Metal template variant with vector/unroll knobs and fault-injection placeholders.
src/zmlx/foundry/templates/swiglu/t0_basic.metal	Adds a basic SwiGLU Metal template with fast-math toggle and fault injection.
src/zmlx/foundry/templates/rmsnorm/t1_tgmem.metal	Adds a tgmem-staging RMSNorm template variant (contains a correctness bug noted in comments).
src/zmlx/foundry/templates/rmsnorm/t0_basic.metal	Adds a basic RMSNorm template with TG/UNROLL/VEC knobs (VEC currently unused; noted).
src/zmlx/foundry/templates/moe_combine/t2_row_tile.metal	Adds a MoE combine template using threadgroup tiling for packed assignments.
src/zmlx/foundry/templates/moe_combine/t1_k8_unrolled.metal	Adds a MoE combine template with fixed k-unroll inner loop.
src/zmlx/foundry/templates/moe_combine/t0_basic.metal	Adds a baseline MoE combine template.
src/zmlx/foundry/templates/render.py	Adds a small mustache-style renderer splitting header/body for `mx.fast.metal_kernel`.
src/zmlx/foundry/templates/init.py	Adds template discovery/loading helpers.
src/zmlx/foundry/taxonomy.py	Introduces Foundry core dataclasses/protocols (candidates, results, backend/op protocols).
src/zmlx/foundry/session.py	Adds session directory/log management and a simple compile marker cache.
src/zmlx/foundry/scheduler.py	Adds curriculum scheduler for staged op unlocking (curriculum naming mismatch noted).
src/zmlx/foundry/sampling/sampler.py	Adds candidate sampling (random/coverage/mutation/mix) + elite loading.
src/zmlx/foundry/sampling/mutate.py	Adds knob mutation logic with optional error injection.
src/zmlx/foundry/sampling/coverage.py	Adds deterministic shape/layout coverage generators.
src/zmlx/foundry/sampling/init.py	Exposes sampling API surface.
src/zmlx/foundry/reports/pareto.py	Adds Pareto/best-by-p50 utilities across multiple record layouts.
src/zmlx/foundry/reports/coverage.py	Adds coverage report generation (shape/layout handling issues noted).
src/zmlx/foundry/reports/init.py	Exposes reports API surface.
src/zmlx/foundry/plugins/registry.py	Adds plugin discovery/loaders via entry points or local module paths.
src/zmlx/foundry/plugins/protocols.py	Defines plugin Protocol contracts and context dataclasses.
src/zmlx/foundry/plugins/init.py	Exposes plugin API surface.
src/zmlx/foundry/ops/topk.py	Adds `topk` reference op for Foundry registry.
src/zmlx/foundry/ops/swiglu.py	Adds `swiglu` op spec/knobs/reference for Foundry.
src/zmlx/foundry/ops/softmax.py	Adds `softmax` reference op.
src/zmlx/foundry/ops/scatter.py	Adds `scatter` reference op.
src/zmlx/foundry/ops/rope.py	Adds `rope` reference op.
src/zmlx/foundry/ops/rmsnorm.py	Adds `rmsnorm` op spec/knobs/reference for Foundry.
src/zmlx/foundry/ops/quantize.py	Adds `quantize` reference op (dtype/spec mismatch noted).
src/zmlx/foundry/ops/moe_topk.py	Adds `moe_topk` reference op.
src/zmlx/foundry/ops/moe_pack.py	Adds `moe_pack` reference op for packing assignments.
src/zmlx/foundry/ops/moe_dispatch.py	Adds `moe_dispatch` reference op for dispatch gather.
src/zmlx/foundry/ops/moe_combine.py	Adds `moe_combine` op + Metal templates + bytes/flops estimation.
src/zmlx/foundry/ops/layernorm.py	Adds `layernorm` reference op.
src/zmlx/foundry/ops/kv_append.py	Adds `kv_append` reference op.
src/zmlx/foundry/ops/grouped_gemm.py	Adds `grouped_gemm` reference hook (NumPy per-expert matmul).
src/zmlx/foundry/ops/gather.py	Adds `gather` reference op.
src/zmlx/foundry/ops/dequantize.py	Adds `dequantize` reference op.
src/zmlx/foundry/ops/init.py	Registers all ops into a name→instance registry.
src/zmlx/foundry/ndjson.py	Adds append-only NDJSON utilities (determinism claim mismatch noted).
src/zmlx/foundry/ids.py	Adds stable attempt/cache key hashing and shape class helper.
src/zmlx/foundry/harness/correctness.py	Adds correctness metrics + tolerance gating (MLX ULP omission noted).
src/zmlx/foundry/harness/compile.py	Adds compile wrapper with caching + error capture.
src/zmlx/foundry/harness/cache.py	Adds in-process compile cache container.
src/zmlx/foundry/harness/bench.py	Adds adaptive benchmark loop with percentiles/timeouts.
src/zmlx/foundry/harness/backend.py	Adds MLX + Mock backends for compilation/execution.
src/zmlx/foundry/harness/init.py	Exposes harness API surface.
src/zmlx/foundry/export/training.py	Adds training JSONL export for successful attempts.
src/zmlx/foundry/export/init.py	Exposes export API surface.
src/zmlx/foundry/init.py	Adds module-level doc + `__all__` surface and CLI pointers.
src/zmlx/init.py	Bumps library version to `0.8.5`.
pyproject.toml	Bumps package version to `0.8.5`.
docs/FOUNDRY.md	Adds Foundry usage/workflow docs (op list inaccuracies noted).
docs/DEVELOPMENT.md	Adds/renames development guide content.
configs/qwen3_1p7b_kernel_sft_lora.yaml	Adds LoRA config for kernel SFT training dataset.
benchmarks/repro_capsules/qwen3_isolation_ordered_t200_r5_repB_20260211_summary.json	Adds benchmark capsule summary artifact.
benchmarks/repro_capsules/qwen3_isolation_ordered_t200_r5_repA_20260211_summary.json	Adds benchmark capsule summary artifact.
benchmarks/repro_capsules/qwen3_isolation_ordered_t1024_r5_repB_20260211_summary.json	Adds benchmark capsule summary artifact.
benchmarks/repro_capsules/qwen3_isolation_ordered_t1024_r5_repA_20260211_summary.json	Adds benchmark capsule summary artifact.
benchmarks/repro_capsules/qwen3_benchmark_vs_baseline_t200_r3_repB_20260210_summary.json	Adds benchmark capsule summary artifact.
benchmarks/repro_capsules/qwen3_benchmark_vs_baseline_t200_r3_repA_20260210_summary.json	Adds benchmark capsule summary artifact.
benchmarks/repro_capsules/qwen3_benchmark_vs_baseline_t1024_r3_repB_20260210_summary.json	Adds benchmark capsule summary artifact.
benchmarks/repro_capsules/qwen3_benchmark_vs_baseline_t1024_r3_repA_20260210_summary.json	Adds benchmark capsule summary artifact.
benchmarks/repro_capsules/glm47_final_longconfirm_t1024_r5_20260211_summary.json	Adds GLM long-context confirmation capsule summary.
benchmarks/repro_capsules/glm47_final_longconfirm_t1024_r5_20260211_glm_combine_fp32_no_fma.json	Adds detailed GLM capsule artifact.
benchmarks/repro_capsules/glm47_final_longconfirm_t1024_r5_20260211_control_swiglu_moe.json	Adds detailed GLM control capsule artifact.
benchmarks/repro_capsules/glm47_consistency_abba_t200_r5_repB_20260211_summary.json	Adds GLM consistency capsule summary.
benchmarks/repro_capsules/glm47_consistency_abba_t200_r5_repA_20260211_summary.json	Adds GLM consistency capsule summary.
benchmarks/repro_capsules/glm47_consistency_abba_t1024_r5_repB_20260211_summary.json	Adds GLM consistency capsule summary.
benchmarks/repro_capsules/glm47_consistency_abba_t1024_r5_repA_20260211_summary.json	Adds GLM consistency capsule summary.
UPSTREAM_PLAN.md	Adds tracked upstream plan doc.
README.md	Updates benchmark guidance + adds Foundry docs link.
CHANGELOG.md	Adds 0.8.5 entry documenting benchmark artifacts and README update.
AGENTS.md	Adds AI agent guidance doc for repo conventions and safety constraints.
.gitignore	Cleans ignore patterns; adds ephemeral dirs and stops ignoring Foundry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-11T15:54:11Z

src/zmlx/foundry/templates/rmsnorm/t1_tgmem.metal

+        w_tile[tid] = (float)w[col];
+    }
+    threadgroup_barrier(mem_flags::mem_threadgroup);
+
+    #pragma unroll
+    for (uint u = 0; u < UNROLL; ++u) {
+        uint colu = colBase + tid + u * TG_SIZE;
+        if (colu < H) {
+            uint elem = base + colu;
+            uint loc = elem_to_loc(elem, x_shape, x_strides, x_ndim);
+            float xv = (float)x[loc];
+            float wv = w_tile[tid]; // simplistic; same tid
+            float outv = xv * inv * wv;


This template caches only w[colBase + tid] into w_tile[tid] but then reuses it for colu = colBase + tid + u*TG_SIZE in the UNROLL loop. For UNROLL > 1 this applies the wrong weight to most columns and will produce incorrect RMSNorm outputs. Either remove the UNROLL loop for this template or stage/load the correct weights for each colu (e.g., load additional tiles or index weights directly by colu).

Copilot · 2026-02-11T15:54:11Z

src/zmlx/foundry/reports/coverage.py

+def _infer_shape_class(op: str, shape: dict[str, int]) -> str:
+    """Lightweight shape class string when the op registry is unavailable."""
+    b = shape.get("batch", 1)
+    s = shape.get("seq", shape.get("tokens", 1))
+    h = shape.get("hidden", 0)
+    parts = [f"b{b}", f"s{s}", f"h{h}"]
+    if not bool(shape.get("contiguous", True)):
+        parts.append("strided")
+    else:
+        parts.append("contig")
+    return "_".join(parts)


_infer_shape_class checks shape.get('contiguous', True), but contiguity lives in the layout dict (and the function is called with shape only). This will mislabel strided cases as contig in reports. Consider passing layout into _infer_shape_class and using layout.get('contiguous', True) (or read att['layout'] when present).

Copilot · 2026-02-11T15:54:12Z

src/zmlx/foundry/ndjson.py

+def dumps(record: dict[str, Any]) -> str:
+    """Compact, deterministic-order JSON serialisation."""
+    return json.dumps(
+        record,
+        sort_keys=False,
+        ensure_ascii=False,
+        separators=(",", ":"),
+        allow_nan=False,
+    )


The dumps() docstring claims "deterministic-order JSON" but json.dumps(..., sort_keys=False) does not guarantee key order unless every caller builds dicts in a consistent insertion order. Either set sort_keys=True or adjust the docstring to avoid promising determinism.

Copilot · 2026-02-11T15:54:12Z

src/zmlx/foundry/harness/correctness.py

+def compute_metrics_mlx(
+    mx: Any, y: Any, y_ref: Any, dtype: str
+) -> tuple[float, float, float | None]:
+    """Compute error metrics on-device via MLX and return python scalars."""
+    diff = mx.abs(y.astype(mx.float32) - y_ref.astype(mx.float32))
+    max_abs = float(mx.max(diff).item())
+    denom = mx.maximum(mx.abs(y_ref.astype(mx.float32)), 1e-8)
+    max_rel = float(mx.max(diff / denom).item())
+    ulp: float | None = None
+    return max_abs, max_rel, ulp


compute_metrics_mlx() currently never computes ULP for float32 (always returns ulp=None), unlike the numpy path. If ULP is intended to be reported for float32 correctness, consider adding an MLX-based ULP computation (bitcast to int32) or explicitly document that ULP is only available in numpy mode.

Copilot · 2026-02-11T15:54:12Z

src/zmlx/foundry/ops/quantize.py

+    def spec(self) -> OpSpec:
+        return OpSpec(
+            name=self.name,
+            kernel_class=KernelClass.QUANT,
+            summary="Quantize fp16/bf16 -> int8 or packed int4 (symmetric)",
+            inputs=["x[tokens,hidden]"],
+            outputs=["q[tokens,hidden] (int8) OR q_packed[tokens,hidden/2] (int4 packed)"],
+            op_params_schema={
+                "q_dtype": {"type": "str", "enum": ["int8", "int4"], "default": "int8"},
+                "scale": {"type": "float", "default": 0.02},
+            },
+            shape_hints={"tokens": TOKENS_LADDER, "hidden": HIDDEN_LADDER},
+            dtype_hints=["float16", "bfloat16"],
+            templates=["ref"],
+        )
+
+    def supported_dtypes(self) -> list[str]:
+        return ["float16", "bfloat16", "float32"]
+


QuantizeOp.spec() advertises input dtypes as fp16/bf16 only, but supported_dtypes() includes float32. This mismatch can lead the sampler/harness to generate float32 quantize attempts that contradict the op spec and dataset expectations. Either include float32 in the spec/dtype_hints or remove it from supported_dtypes().

Copilot · 2026-02-11T15:54:14Z

src/zmlx/foundry/reports/coverage.py

+        # Determine success -- support both Foundry and DataFoundry record layouts
+        res = att.get("result", {})
+        status = res.get("status")
+        correctness_ok = att.get("correctness", {}).get("ok", False)
+        bench_ok = att.get("bench", {}).get("ok", False)
+        is_ok = (status == "ok") or (correctness_ok and bench_ok)
+


Success detection for DataFoundry-style records should also require build.ok (as done in reports/pareto.py and export/training.py). Currently is_ok treats (correctness.ok and bench.ok) as success even if build info is missing/false, which can skew coverage stats. Align this condition with _is_successful logic (include build.ok).

Copilot · 2026-02-11T15:54:14Z

src/zmlx/foundry/session.py

+    def try_claim(self, attempt_id: str) -> bool:
+        """Atomically claim an attempt ID.  Returns True if newly claimed."""
+        if attempt_id in self._existing_ids:
+            return False
+        self._existing_ids.add(attempt_id)
+        return True


try_claim() is documented as "Atomically claim" but it only updates an in-memory set; across multi-process workers it's not atomic and duplicates can still be written (each worker has its own Session). Either clarify the docstring (process-local claim) or implement a real cross-process claim mechanism (e.g., file lock / atomic marker file).

Copilot · 2026-02-11T15:54:14Z

src/zmlx/foundry/export/sft.py

+
+    n_test = n - n_train - n_valid
+    if n_test < 0:
+        n_test = 0


Variable n_test is not used.

Copilot · 2026-02-11T15:54:14Z

src/zmlx/foundry/harness/backend.py

+            from mlx.core import metal
+            di = metal.device_info()
+            info.update(di)
+        except Exception:


'except' clause does nothing but pass and there is no explanatory comment.

Copilot · 2026-02-11T15:54:15Z

src/zmlx/foundry/plugins/registry.py

+            except Exception:
+                # Log at debug level in production; here we silently skip.
+                continue
+    except Exception:


'except' clause does nothing but pass and there is no explanatory comment.

Add a new D-SIMD (dual SIMD group) Metal kernel for MoE gating with 64 experts (D=64, K=4). The kernel fuses softmax + bias + top-K selection into a single GPU dispatch using 2 SIMD groups (64 threads), replacing ~10 separate Metal dispatches per layer. Key insight: MLX's argpartition returns top-K in ascending value order. Matching this ordering in the kernel output is critical for token-identical fidelity — different accumulation order in the combine step causes cascading divergence across 40 layers. Results on M4 Max 36GB (stock MLX, 4-bit quantized): - 200 tokens: 152.6 → 163.1 tok/s (+7.2%), 200/200 fidelity PASS - 500 tokens: 152.0 → 161.1 tok/s (+6.0%), 500/500 fidelity PASS Smart defaults differentiate by K (num_experts_per_tok): - K<=2 (LFM2-8B): fused SwiGLU + kernel combine (existing +12% win) - K>=3 (LFM2-24B): D-SIMD gate + native combine, fused SwiGLU disabled (gather_qmm_swiglu causes 0.77x regression at K=4) patch(model) works automatically — no env vars needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merged 6 conflicted files with strategic resolution strategy: - CHANGELOG.md: kept theirs (GLM combine-mode changelog entry) - README.md: merged both sections (kept both LFM2-24B and GLM/Qwen3 benchmarks) - LAB_NOTEBOOK.md: kept HEAD (experimental benchmark sections) - pyproject.toml: merged description to mention both LFM2-8B and LFM2-24B - moe_mlp.py: merged both variants and experimental GLM combine modes - test_moe_fused_swiglu_gate.py: merged both test sets for GLM combine modes All changes are complementary: - Our LFM2-24B work + their GLM combine-mode fixes - Our experimental variants + their default behavior fixes Fidelity: all files clean (git diff --check passes)

- Add src/zmlx/fusion/ to git (was untracked, caused ModuleNotFoundError in CI macOS Metal tests) - Exclude foundry/fusion from mypy strict checks (pre-existing type issues in newly tracked code) - Fix union-attr mypy error in moe_mlp.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Hmbown and others added 3 commits February 10, 2026 22:50

docs: publish benchmark-vs-baseline truth set and bump 0.8.5

0faa380

docs: simplify benchmark tables to actionable winners

5e4c181

Copilot AI review requested due to automatic review settings February 11, 2026 15:45

Copilot started reviewing on behalf of Hmbown February 11, 2026 15:45 View session

docs: fix broken links in ARCHITECTURE.md and ROADMAP.md

3824ede

- ARCHITECTURE.md: fix `docs/UPSTREAMING.md` -> `UPSTREAM_PLAN.md` - ROADMAP.md: fix `benchmarks/results/TEST_SUMMARY.md` -> `BENCHMARKS.md` Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist bot reviewed Feb 11, 2026

View reviewed changes

Copilot AI reviewed Feb 11, 2026

View reviewed changes

Hmbown changed the title ~~repo: track foundry module, clean .gitignore, improve navigation~~ feat: LFM2-24B +7% decode, foundry tracking, repo cleanup Feb 24, 2026

Hmbown and others added 2 commits February 24, 2026 10:39

Hmbown merged commit 00dfdc3 into main Feb 24, 2026
12 checks passed

Hmbown deleted the repo-polish/navigation-cleanup branch February 24, 2026 16:47



		def _iso_now() -> str:
		return _dt.datetime.utcnow().replace(microsecond=0).isoformat() + "Z"

	return _dt.datetime.utcnow().replace(microsecond=0).isoformat() + "Z"
	return _dt.datetime.now(_dt.timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")

Conversation

Hmbown commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

LFM2-24B-A2B-MLX-4bit: +7% decode speedup (new)

Repo cleanup (from earlier commits)

Test plan

Uh oh!

gemini-code-assist bot commented Feb 11, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Hmbown commented Feb 11, 2026 •

edited

Loading