
feat: LFM2-24B +7% decode, foundry tracking, repo cleanup#10

Merged
Hmbown merged 7 commits into main from repo-polish/navigation-cleanup on Feb 24, 2026

Conversation

Hmbown (Owner) commented Feb 11, 2026

Summary

LFM2-24B-A2B-MLX-4bit: +7% decode speedup (new)

  • +7% decode speedup on LFM2-24B-A2B-MLX-4bit (stock MLX, M4 Max 36GB)
  • 500/500 greedy token-identical fidelity confirmed over 5 runs
  • Works automatically with patch(model) — no configuration needed
  • pip install "zmlx[lm]" is all you need
```python
import mlx_lm
from zmlx.patch import patch

model, tokenizer = mlx_lm.load("LiquidAI/LFM2-24B-A2B-MLX-4bit")
patch(model)  # auto-detects 64 experts, applies D-SIMD gate kernel
text = mlx_lm.generate(model, tokenizer, prompt="Hello", max_tokens=200)
```

Verify: `python -m zmlx.validate LiquidAI/LFM2-24B-A2B-MLX-4bit --max-tokens 200 --runs 3`

Technical details:

  • New D-SIMD Metal kernel for 64-expert MoE gating (2 SIMD groups, ascending value order output matching argpartition)
  • Smart K-based defaults: LFM2-8B (K=2) keeps fused SwiGLU (+12%), LFM2-24B (K=4) uses D-SIMD gate + native combine
  • Auto-disables gather_qmm_swiglu for K>=3 (causes 0.77x regression at K=4)
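The K-based default selection described above can be sketched as follows. This is an illustrative sketch only: the function name and the returned config keys are hypothetical, not zmlx's actual API.

```python
# Hypothetical sketch of the K-based kernel defaults described above.
# Names are illustrative; only the thresholds and outcomes follow the PR text.
def select_moe_kernels(num_experts_per_tok: int) -> dict:
    """Pick MoE kernel defaults based on K (experts routed per token)."""
    if num_experts_per_tok <= 2:
        # LFM2-8B-style models (K=2): fused SwiGLU path wins (~+12%)
        return {"gate": "native", "swiglu": "fused", "combine": "kernel"}
    # K >= 3 (e.g. LFM2-24B, K=4): D-SIMD gate + native combine; fused
    # SwiGLU disabled since gather_qmm_swiglu regresses to ~0.77x at K=4
    return {"gate": "d_simd", "swiglu": "native", "combine": "native"}
```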

Repo cleanup (from earlier commits)

  • Track foundry module (src/zmlx/foundry/, 48 files): kernel template evaluation and SFT dataset export
  • Track key docs (CLAUDE.md, UPSTREAM_PLAN.md): makes AI agent context and upstream plan available
  • Clean .gitignore: add ephemeral output dirs, remove entries hiding tracked-worthy files
  • Create docs/FOUNDRY.md: documents foundry CLI, SFT export workflow, and module layout

Test plan

  • 969/969 full test suite pass (0 failures, 75 skipped, 3 xfailed)
  • 200/200 fidelity PASS at 200 tokens (3 runs, 1.072x)
  • 500/500 fidelity PASS at 500 tokens (5 runs, 1.060x)
  • Default patch(model) works without env vars
  • Kernel microbench: 0/100 expert mismatches, 0/100 ordering mismatches
  • Community validation on other M-series chips (M1, M2, M3)

🤖 Generated with Claude Code

Hmbown and others added 3 commits February 10, 2026 22:50
- Un-gitignore and track CLAUDE.md, AGENTS.md, UPSTREAM_PLAN.md
  (fixes broken README link to UPSTREAM_PLAN.md; makes AI agent
  context available to all cloners)
- Un-gitignore and track src/zmlx/foundry/ (48-file kernel template
  evaluation module that was previously local-only)
- Track configs/qwen3_1p7b_kernel_sft_lora.yaml (LoRA SFT config)
- Add sessions/, runs/, training_data/, discover_sessions/ to
  .gitignore (ephemeral output directories)
- Rename docs/AGENTS.md -> docs/DEVELOPMENT.md (content is a dev
  guide/backlog, not agent instructions)
- Create docs/FOUNDRY.md documenting the foundry module CLI
- Update CLAUDE.md: add foundry/discover module sections, CLI
  entry points table
- Update AGENTS.md: add foundry/discover to file layout tree
- Update README.md docs table: add FOUNDRY.md link
- Fix ruff lint issues in foundry module (UP006, UP035, I001, B905)
- Move 7 stale root prompt files to sessions/prompts/ (local only)

Validation: ruff check . clean, pytest 920 passed / 75 skipped / 3 xfailed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 11, 2026 15:45
- ARCHITECTURE.md: fix `docs/UPSTREAMING.md` -> `UPSTREAM_PLAN.md`
- ROADMAP.md: fix `benchmarks/results/TEST_SUMMARY.md` -> `BENCHMARKS.md`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello @Hmbown, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the project's structure by integrating a new foundry module for Metal kernel development and evaluation, enhancing documentation accessibility, and refining the .gitignore to accurately reflect tracked and untracked files. The changes aim to streamline kernel optimization workflows and improve clarity for developers and AI agents interacting with the codebase.

Highlights

  • Foundry Module Integration: The src/zmlx/foundry/ module, previously gitignored, has been integrated to manage kernel template evaluation and SFT dataset export, including ruff lint fixes.
  • Documentation Visibility: Key documentation files (CLAUDE.md, AGENTS.md, UPSTREAM_PLAN.md) are now tracked, making AI agent context and upstream plans accessible to all cloners. A broken README link to UPSTREAM_PLAN.md was also fixed.
  • .gitignore Cleanup: The .gitignore file was updated to include ephemeral output directories (sessions/, runs/, training_data/, discover_sessions/) and remove incorrect entries that were hiding trackable files.
  • Documentation Restructuring: docs/AGENTS.md was renamed to docs/DEVELOPMENT.md to better reflect its content as a development guide, and a new docs/FOUNDRY.md was created to document the foundry CLI and workflow.
  • CLAUDE.md Update: CLAUDE.md was updated to include sections for the foundry/discover modules and a table of CLI entry points.
  • Version Bump: The project version was updated to 0.8.5 across pyproject.toml and src/zmlx/__init__.py.
  • Benchmark Updates: New benchmark-vs-baseline truth-set artifacts and reproducibility capsules were added, and the README was updated to reflect the latest benchmark results and recommendations for GLM and Qwen models.


Changelog
  • CHANGELOG.md
    • Added benchmark-vs-baseline truth-set artifacts for current kernel candidates.
    • Added follow-up reproducibility capsules for replicated isolation sweeps across GLM and Qwen suites.
    • Updated README benchmark sections to reflect the 2026-02-11 benchmark-vs-baseline truth set.
    • Clarified Qwen candidate variants remain non-promoted; GLM glm_combine_fp32_no_fma remains active default.
  • pyproject.toml
    • Updated project version to 0.8.5.
  • `src/zmlx/__init__.py`
    • Updated __version__ string to 0.8.5.

@gemini-code-assist bot left a comment

Code Review

This is a substantial pull request that introduces the new foundry module for kernel template evaluation and dataset generation, along with significant documentation updates and .gitignore cleanup. The new module is well-structured and the code quality is high. I have a few suggestions to improve consistency and maintainability within the new module.



```python
def _iso_now() -> str:
    return _dt.datetime.utcnow().replace(microsecond=0).isoformat() + "Z"
```

**medium** — `datetime.utcnow()` is deprecated since Python 3.12 and should be avoided. Use the timezone-aware `datetime.now(datetime.timezone.utc)` instead. This also helps with timestamp format consistency across the module.

Suggested change:

```diff
-    return _dt.datetime.utcnow().replace(microsecond=0).isoformat() + "Z"
+    return _dt.datetime.now(_dt.timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")
```
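As a runnable check of this timezone-aware pattern (helper name illustrative, not the module's actual function):

```python
from datetime import datetime, timezone

def iso_now_z() -> str:
    """Timezone-aware UTC timestamp with a trailing 'Z' and no microseconds."""
    return (
        datetime.now(timezone.utc)
        .replace(microsecond=0)
        .isoformat()
        .replace("+00:00", "Z")
    )
```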

Comment on lines +160 to +162:

```python
    # Bridge methods: satisfy the taxonomy.KernelOp Protocol interface
    # so that harness/evaluate.py can use ops directly.
    # ------------------------------------------------------------------
```

**medium** — The use of "bridge methods" to adapt this `KernelOp` ABC to the `KernelOp` Protocol defined in `taxonomy.py` can be confusing for new contributors. It suggests a slight design divergence between the op implementations and the harness that consumes them. For better long-term maintainability, consider refactoring to a single, unified `KernelOp` interface. This would make the contract for kernel operations clearer and the system easier to extend.


```python
def _utc_now_iso() -> str:
    """ISO-8601 UTC timestamp string."""
    return _dt.datetime.now(_dt.timezone.utc).isoformat()
```

**medium** — There's an inconsistency in timestamp formatting: this function produces timestamps with a `+00:00` timezone offset, while `_iso_now` in `harness/evaluate.py` uses the `Z` suffix. It would be best to standardize on one format for all timestamps; the `Z` suffix is more common for UTC timestamps in many systems. Removing microseconds also yields cleaner timestamps when they are not required.

Suggested change:

```diff
-    return _dt.datetime.now(_dt.timezone.utc).isoformat()
+    return _dt.datetime.now(_dt.timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")
```

Copilot AI left a comment

Pull request overview

This PR brings the previously gitignored src/zmlx/foundry/ module into the repo and updates supporting docs/config to enable kernel-template evaluation, dataset generation, and export workflows, alongside housekeeping changes (.gitignore cleanup, docs navigation, and version bump).

Changes:

  • Adds the Foundry module (templates, harness, sampling, reports, plugins, ops registry, export utilities) for Metal kernel variant generation/evaluation.
  • Updates documentation and repo metadata (new docs pages, README + UPSTREAM_PLAN link updates, AGENTS guidance, CHANGELOG entry).
  • Cleans .gitignore, adds benchmark repro capsules, and bumps version to 0.8.5.

Reviewed changes

Copilot reviewed 89 out of 91 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| `src/zmlx/foundry/templates/swiglu/t1_unrolled.metal` | Adds a SwiGLU Metal template variant with vector/unroll knobs and fault-injection placeholders. |
| `src/zmlx/foundry/templates/swiglu/t0_basic.metal` | Adds a basic SwiGLU Metal template with fast-math toggle and fault injection. |
| `src/zmlx/foundry/templates/rmsnorm/t1_tgmem.metal` | Adds a tgmem-staging RMSNorm template variant (contains a correctness bug noted in comments). |
| `src/zmlx/foundry/templates/rmsnorm/t0_basic.metal` | Adds a basic RMSNorm template with TG/UNROLL/VEC knobs (VEC currently unused; noted). |
| `src/zmlx/foundry/templates/moe_combine/t2_row_tile.metal` | Adds a MoE combine template using threadgroup tiling for packed assignments. |
| `src/zmlx/foundry/templates/moe_combine/t1_k8_unrolled.metal` | Adds a MoE combine template with fixed k-unroll inner loop. |
| `src/zmlx/foundry/templates/moe_combine/t0_basic.metal` | Adds a baseline MoE combine template. |
| `src/zmlx/foundry/templates/render.py` | Adds a small mustache-style renderer splitting header/body for `mx.fast.metal_kernel`. |
| `src/zmlx/foundry/templates/__init__.py` | Adds template discovery/loading helpers. |
| `src/zmlx/foundry/taxonomy.py` | Introduces Foundry core dataclasses/protocols (candidates, results, backend/op protocols). |
| `src/zmlx/foundry/session.py` | Adds session directory/log management and a simple compile marker cache. |
| `src/zmlx/foundry/scheduler.py` | Adds curriculum scheduler for staged op unlocking (curriculum naming mismatch noted). |
| `src/zmlx/foundry/sampling/sampler.py` | Adds candidate sampling (random/coverage/mutation/mix) + elite loading. |
| `src/zmlx/foundry/sampling/mutate.py` | Adds knob mutation logic with optional error injection. |
| `src/zmlx/foundry/sampling/coverage.py` | Adds deterministic shape/layout coverage generators. |
| `src/zmlx/foundry/sampling/__init__.py` | Exposes sampling API surface. |
| `src/zmlx/foundry/reports/pareto.py` | Adds Pareto/best-by-p50 utilities across multiple record layouts. |
| `src/zmlx/foundry/reports/coverage.py` | Adds coverage report generation (shape/layout handling issues noted). |
| `src/zmlx/foundry/reports/__init__.py` | Exposes reports API surface. |
| `src/zmlx/foundry/plugins/registry.py` | Adds plugin discovery/loaders via entry points or local module paths. |
| `src/zmlx/foundry/plugins/protocols.py` | Defines plugin Protocol contracts and context dataclasses. |
| `src/zmlx/foundry/plugins/__init__.py` | Exposes plugin API surface. |
| `src/zmlx/foundry/ops/topk.py` | Adds topk reference op for Foundry registry. |
| `src/zmlx/foundry/ops/swiglu.py` | Adds swiglu op spec/knobs/reference for Foundry. |
| `src/zmlx/foundry/ops/softmax.py` | Adds softmax reference op. |
| `src/zmlx/foundry/ops/scatter.py` | Adds scatter reference op. |
| `src/zmlx/foundry/ops/rope.py` | Adds rope reference op. |
| `src/zmlx/foundry/ops/rmsnorm.py` | Adds rmsnorm op spec/knobs/reference for Foundry. |
| `src/zmlx/foundry/ops/quantize.py` | Adds quantize reference op (dtype/spec mismatch noted). |
| `src/zmlx/foundry/ops/moe_topk.py` | Adds moe_topk reference op. |
| `src/zmlx/foundry/ops/moe_pack.py` | Adds moe_pack reference op for packing assignments. |
| `src/zmlx/foundry/ops/moe_dispatch.py` | Adds moe_dispatch reference op for dispatch gather. |
| `src/zmlx/foundry/ops/moe_combine.py` | Adds moe_combine op + Metal templates + bytes/flops estimation. |
| `src/zmlx/foundry/ops/layernorm.py` | Adds layernorm reference op. |
| `src/zmlx/foundry/ops/kv_append.py` | Adds kv_append reference op. |
| `src/zmlx/foundry/ops/grouped_gemm.py` | Adds grouped_gemm reference hook (NumPy per-expert matmul). |
| `src/zmlx/foundry/ops/gather.py` | Adds gather reference op. |
| `src/zmlx/foundry/ops/dequantize.py` | Adds dequantize reference op. |
| `src/zmlx/foundry/ops/__init__.py` | Registers all ops into a name→instance registry. |
| `src/zmlx/foundry/ndjson.py` | Adds append-only NDJSON utilities (determinism claim mismatch noted). |
| `src/zmlx/foundry/ids.py` | Adds stable attempt/cache key hashing and shape class helper. |
| `src/zmlx/foundry/harness/correctness.py` | Adds correctness metrics + tolerance gating (MLX ULP omission noted). |
| `src/zmlx/foundry/harness/compile.py` | Adds compile wrapper with caching + error capture. |
| `src/zmlx/foundry/harness/cache.py` | Adds in-process compile cache container. |
| `src/zmlx/foundry/harness/bench.py` | Adds adaptive benchmark loop with percentiles/timeouts. |
| `src/zmlx/foundry/harness/backend.py` | Adds MLX + Mock backends for compilation/execution. |
| `src/zmlx/foundry/harness/__init__.py` | Exposes harness API surface. |
| `src/zmlx/foundry/export/training.py` | Adds training JSONL export for successful attempts. |
| `src/zmlx/foundry/export/__init__.py` | Exposes export API surface. |
| `src/zmlx/foundry/__init__.py` | Adds module-level doc + `__all__` surface and CLI pointers. |
| `src/zmlx/__init__.py` | Bumps library version to 0.8.5. |
| `pyproject.toml` | Bumps package version to 0.8.5. |
| `docs/FOUNDRY.md` | Adds Foundry usage/workflow docs (op list inaccuracies noted). |
| `docs/DEVELOPMENT.md` | Adds/renames development guide content. |
| `configs/qwen3_1p7b_kernel_sft_lora.yaml` | Adds LoRA config for kernel SFT training dataset. |
| `benchmarks/repro_capsules/qwen3_isolation_ordered_t200_r5_repB_20260211_summary.json` | Adds benchmark capsule summary artifact. |
| `benchmarks/repro_capsules/qwen3_isolation_ordered_t200_r5_repA_20260211_summary.json` | Adds benchmark capsule summary artifact. |
| `benchmarks/repro_capsules/qwen3_isolation_ordered_t1024_r5_repB_20260211_summary.json` | Adds benchmark capsule summary artifact. |
| `benchmarks/repro_capsules/qwen3_isolation_ordered_t1024_r5_repA_20260211_summary.json` | Adds benchmark capsule summary artifact. |
| `benchmarks/repro_capsules/qwen3_benchmark_vs_baseline_t200_r3_repB_20260210_summary.json` | Adds benchmark capsule summary artifact. |
| `benchmarks/repro_capsules/qwen3_benchmark_vs_baseline_t200_r3_repA_20260210_summary.json` | Adds benchmark capsule summary artifact. |
| `benchmarks/repro_capsules/qwen3_benchmark_vs_baseline_t1024_r3_repB_20260210_summary.json` | Adds benchmark capsule summary artifact. |
| `benchmarks/repro_capsules/qwen3_benchmark_vs_baseline_t1024_r3_repA_20260210_summary.json` | Adds benchmark capsule summary artifact. |
| `benchmarks/repro_capsules/glm47_final_longconfirm_t1024_r5_20260211_summary.json` | Adds GLM long-context confirmation capsule summary. |
| `benchmarks/repro_capsules/glm47_final_longconfirm_t1024_r5_20260211_glm_combine_fp32_no_fma.json` | Adds detailed GLM capsule artifact. |
| `benchmarks/repro_capsules/glm47_final_longconfirm_t1024_r5_20260211_control_swiglu_moe.json` | Adds detailed GLM control capsule artifact. |
| `benchmarks/repro_capsules/glm47_consistency_abba_t200_r5_repB_20260211_summary.json` | Adds GLM consistency capsule summary. |
| `benchmarks/repro_capsules/glm47_consistency_abba_t200_r5_repA_20260211_summary.json` | Adds GLM consistency capsule summary. |
| `benchmarks/repro_capsules/glm47_consistency_abba_t1024_r5_repB_20260211_summary.json` | Adds GLM consistency capsule summary. |
| `benchmarks/repro_capsules/glm47_consistency_abba_t1024_r5_repA_20260211_summary.json` | Adds GLM consistency capsule summary. |
| `UPSTREAM_PLAN.md` | Adds tracked upstream plan doc. |
| `README.md` | Updates benchmark guidance + adds Foundry docs link. |
| `CHANGELOG.md` | Adds 0.8.5 entry documenting benchmark artifacts and README update. |
| `AGENTS.md` | Adds AI agent guidance doc for repo conventions and safety constraints. |
| `.gitignore` | Cleans ignore patterns; adds ephemeral dirs and stops ignoring Foundry. |


Comment on lines +56 to +68:

```metal
        w_tile[tid] = (float)w[col];
    }
    threadgroup_barrier(mem_flags::mem_threadgroup);

    #pragma unroll
    for (uint u = 0; u < UNROLL; ++u) {
        uint colu = colBase + tid + u * TG_SIZE;
        if (colu < H) {
            uint elem = base + colu;
            uint loc = elem_to_loc(elem, x_shape, x_strides, x_ndim);
            float xv = (float)x[loc];
            float wv = w_tile[tid]; // simplistic; same tid
            float outv = xv * inv * wv;
```

**Copilot AI, Feb 11, 2026** — This template caches only `w[colBase + tid]` into `w_tile[tid]` but then reuses it for `colu = colBase + tid + u*TG_SIZE` in the UNROLL loop. For `UNROLL > 1` this applies the wrong weight to most columns and will produce incorrect RMSNorm outputs. Either remove the UNROLL loop for this template or stage/load the correct weights for each `colu` (e.g., load additional tiles or index weights directly by `colu`).
Comment on lines +42 to +52:

```python
def _infer_shape_class(op: str, shape: dict[str, int]) -> str:
    """Lightweight shape class string when the op registry is unavailable."""
    b = shape.get("batch", 1)
    s = shape.get("seq", shape.get("tokens", 1))
    h = shape.get("hidden", 0)
    parts = [f"b{b}", f"s{s}", f"h{h}"]
    if not bool(shape.get("contiguous", True)):
        parts.append("strided")
    else:
        parts.append("contig")
    return "_".join(parts)
```

**Copilot AI, Feb 11, 2026** — `_infer_shape_class` checks `shape.get('contiguous', True)`, but contiguity lives in the `layout` dict (and the function is called with `shape` only). This will mislabel strided cases as `contig` in reports. Consider passing `layout` into `_infer_shape_class` and using `layout.get('contiguous', True)` (or read `att['layout']` when present).
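A minimal sketch of one way to apply this fix, reading contiguity from an explicit `layout` dict. The signature and names are illustrative, not the repo's actual code:

```python
def infer_shape_class(shape: dict, layout: dict) -> str:
    """Sketch: shape class string with contiguity read from `layout`,
    not `shape` (illustrative names; dropped unused `op` parameter)."""
    b = shape.get("batch", 1)
    s = shape.get("seq", shape.get("tokens", 1))
    h = shape.get("hidden", 0)
    parts = [f"b{b}", f"s{s}", f"h{h}"]
    # the fix: contiguity lives in the layout dict
    parts.append("contig" if layout.get("contiguous", True) else "strided")
    return "_".join(parts)
```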
Comment on lines +16 to +24:

```python
def dumps(record: dict[str, Any]) -> str:
    """Compact, deterministic-order JSON serialisation."""
    return json.dumps(
        record,
        sort_keys=False,
        ensure_ascii=False,
        separators=(",", ":"),
        allow_nan=False,
    )
```

**Copilot AI, Feb 11, 2026** — The `dumps()` docstring claims "deterministic-order JSON" but `json.dumps(..., sort_keys=False)` does not guarantee key order unless every caller builds dicts in a consistent insertion order. Either set `sort_keys=True` or adjust the docstring to avoid promising determinism.
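A small sketch of the `sort_keys=True` option, showing that serialisation becomes independent of insertion order (function name illustrative):

```python
import json

def dumps_deterministic(record: dict) -> str:
    """Sketch: compact JSON whose output does not depend on dict
    insertion order, because keys are sorted."""
    return json.dumps(
        record,
        sort_keys=True,       # the one-line change suggested above
        ensure_ascii=False,
        separators=(",", ":"),
        allow_nan=False,
    )
```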
Comment on lines +53 to +62:

```python
def compute_metrics_mlx(
    mx: Any, y: Any, y_ref: Any, dtype: str
) -> tuple[float, float, float | None]:
    """Compute error metrics on-device via MLX and return python scalars."""
    diff = mx.abs(y.astype(mx.float32) - y_ref.astype(mx.float32))
    max_abs = float(mx.max(diff).item())
    denom = mx.maximum(mx.abs(y_ref.astype(mx.float32)), 1e-8)
    max_rel = float(mx.max(diff / denom).item())
    ulp: float | None = None
    return max_abs, max_rel, ulp
```

**Copilot AI, Feb 11, 2026** — `compute_metrics_mlx()` currently never computes ULP for float32 (always returns `ulp=None`), unlike the numpy path. If ULP is intended to be reported for float32 correctness, consider adding an MLX-based ULP computation (bitcast to int32) or explicitly document that ULP is only available in numpy mode.
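For reference, here is a NumPy-based sketch of the bitcast-to-int32 ULP distance the review alludes to (not the repo's numpy path; names and the sign-handling mapping are illustrative):

```python
import numpy as np

def max_ulp_f32(y: np.ndarray, y_ref: np.ndarray) -> int:
    """Sketch: max ULP distance between float32 arrays via int32 bit
    patterns. Sign-magnitude bits are remapped so ordered floats map to
    ordered integers, making integer differences count representable
    floats between values."""
    def ordered(a: np.ndarray) -> np.ndarray:
        bits = a.astype(np.float32).view(np.int32).astype(np.int64)
        # negative floats: map to negative integers by magnitude
        return np.where(bits < 0, -(bits & 0x7FFFFFFF), bits)
    return int(np.max(np.abs(ordered(y) - ordered(y_ref))))
```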
Comment on lines +30 to +48:

```python
    def spec(self) -> OpSpec:
        return OpSpec(
            name=self.name,
            kernel_class=KernelClass.QUANT,
            summary="Quantize fp16/bf16 -> int8 or packed int4 (symmetric)",
            inputs=["x[tokens,hidden]"],
            outputs=["q[tokens,hidden] (int8) OR q_packed[tokens,hidden/2] (int4 packed)"],
            op_params_schema={
                "q_dtype": {"type": "str", "enum": ["int8", "int4"], "default": "int8"},
                "scale": {"type": "float", "default": 0.02},
            },
            shape_hints={"tokens": TOKENS_LADDER, "hidden": HIDDEN_LADDER},
            dtype_hints=["float16", "bfloat16"],
            templates=["ref"],
        )

    def supported_dtypes(self) -> list[str]:
        return ["float16", "bfloat16", "float32"]
```

**Copilot AI, Feb 11, 2026** — `QuantizeOp.spec()` advertises input dtypes as fp16/bf16 only, but `supported_dtypes()` includes float32. This mismatch can lead the sampler/harness to generate float32 quantize attempts that contradict the op spec and dataset expectations. Either include float32 in the spec/`dtype_hints` or remove it from `supported_dtypes()`.
Comment on lines +101 to +107:

```python
    # Determine success -- support both Foundry and DataFoundry record layouts
    res = att.get("result", {})
    status = res.get("status")
    correctness_ok = att.get("correctness", {}).get("ok", False)
    bench_ok = att.get("bench", {}).get("ok", False)
    is_ok = (status == "ok") or (correctness_ok and bench_ok)
```

**Copilot AI, Feb 11, 2026** — Success detection for DataFoundry-style records should also require `build.ok` (as done in `reports/pareto.py` and `export/training.py`). Currently `is_ok` treats `(correctness.ok and bench.ok)` as success even if build info is missing/false, which can skew coverage stats. Align this condition with the `_is_successful` logic (include `build.ok`).
Comment on lines +97 to +102:

```python
    def try_claim(self, attempt_id: str) -> bool:
        """Atomically claim an attempt ID. Returns True if newly claimed."""
        if attempt_id in self._existing_ids:
            return False
        self._existing_ids.add(attempt_id)
        return True
```

**Copilot AI, Feb 11, 2026** — `try_claim()` is documented as "Atomically claim" but it only updates an in-memory set; across multi-process workers it's not atomic and duplicates can still be written (each worker has its own `Session`). Either clarify the docstring (process-local claim) or implement a real cross-process claim mechanism (e.g., file lock / atomic marker file).
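One way to get a genuinely cross-process claim is an `O_CREAT | O_EXCL` marker file, which the kernel creates atomically. A standalone sketch (names and path layout illustrative, not the repo's `Session` API):

```python
import os

def try_claim(session_dir: str, attempt_id: str) -> bool:
    """Sketch: cross-process claim via an exclusive marker file.
    O_CREAT|O_EXCL makes file creation atomic, so exactly one process
    can create the marker for a given attempt_id."""
    marker = os.path.join(session_dir, f"{attempt_id}.claim")
    try:
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True        # this process created the marker first
    except FileExistsError:
        return False       # already claimed by some worker
```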

```python
    n_test = n - n_train - n_valid
    if n_test < 0:
        n_test = 0
```

**Copilot AI, Feb 11, 2026** — Variable `n_test` is not used.
```python
        from mlx.core import metal
        di = metal.device_info()
        info.update(di)
    except Exception:
```

**Copilot AI, Feb 11, 2026** — `except` clause does nothing but pass and there is no explanatory comment.
```python
        except Exception:
            # Log at debug level in production; here we silently skip.
            continue
    except Exception:
```

**Copilot AI, Feb 11, 2026** — `except` clause does nothing but pass and there is no explanatory comment.
Add a new D-SIMD (dual SIMD group) Metal kernel for MoE gating with
64 experts (D=64, K=4). The kernel fuses softmax + bias + top-K
selection into a single GPU dispatch using 2 SIMD groups (64 threads),
replacing ~10 separate Metal dispatches per layer.

Key insight: MLX's argpartition returns top-K in ascending value order.
Matching this ordering in the kernel output is critical for
token-identical fidelity — different accumulation order in the combine
step causes cascading divergence across 40 layers.

Results on M4 Max 36GB (stock MLX, 4-bit quantized):
- 200 tokens: 152.6 → 163.1 tok/s (+7.2%), 200/200 fidelity PASS
- 500 tokens: 152.0 → 161.1 tok/s (+6.0%), 500/500 fidelity PASS

Smart defaults differentiate by K (num_experts_per_tok):
- K<=2 (LFM2-8B): fused SwiGLU + kernel combine (existing +12% win)
- K>=3 (LFM2-24B): D-SIMD gate + native combine, fused SwiGLU disabled
  (gather_qmm_swiglu causes 0.77x regression at K=4)

patch(model) works automatically — no env vars needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
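The commit's key insight is an ordering contract: the kernel's top-K output must list selected experts in ascending gate-value order to stay token-identical with MLX's reference path. A NumPy sketch of that reference ordering (independent of MLX's actual `argpartition` internals; names illustrative):

```python
import numpy as np

def topk_ascending(scores: np.ndarray, k: int) -> np.ndarray:
    """Sketch: the K selected expert indices, returned in ascending
    gate-value order -- the ordering the D-SIMD kernel must reproduce
    so the combine step accumulates in the same order."""
    # argsort is ascending, so the last k indices are the top-k,
    # already ordered from smallest to largest selected value
    return np.argsort(scores)[-k:]
```

Matching this order matters because floating-point accumulation in the combine step is not associative; a different summation order diverges slightly, and the divergence cascades across 40 layers of decoding.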
@Hmbown Hmbown changed the title repo: track foundry module, clean .gitignore, improve navigation feat: LFM2-24B +7% decode, foundry tracking, repo cleanup Feb 24, 2026
Hmbown and others added 2 commits February 24, 2026 10:39
Merged 6 conflicted files with the following resolution strategy:
- CHANGELOG.md: kept theirs (GLM combine-mode changelog entry)
- README.md: merged both sections (kept both LFM2-24B and GLM/Qwen3 benchmarks)
- LAB_NOTEBOOK.md: kept HEAD (experimental benchmark sections)
- pyproject.toml: merged description to mention both LFM2-8B and LFM2-24B
- moe_mlp.py: merged both variants and experimental GLM combine modes
- test_moe_fused_swiglu_gate.py: merged both test sets for GLM combine modes

All changes are complementary:
- Our LFM2-24B work + their GLM combine-mode fixes
- Our experimental variants + their default behavior fixes

Verification: all files clean (`git diff --check` passes)
- Add src/zmlx/fusion/ to git (was untracked, caused ModuleNotFoundError
  in CI macOS Metal tests)
- Exclude foundry/fusion from mypy strict checks (pre-existing type
  issues in newly tracked code)
- Fix union-attr mypy error in moe_mlp.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Hmbown Hmbown merged commit 00dfdc3 into main Feb 24, 2026
12 checks passed
@Hmbown Hmbown deleted the repo-polish/navigation-cleanup branch February 24, 2026 16:47