The PR title does not conform to the '[<Project>] Title' format. Please update the PR title.
Typical [<Project>] values include:
- `[stdlib]` — indicates a change to the Mojo standard library code
- `[docs]` — indicates a change to the documentation
It's okay to include multiple labels on a PR that affects multiple areas of work.
Thank you for contributing to Mojo! 🔥
You can also use a tool like www.regex101.com to see why your PR title fails to conform. Use `` ^(Revert ")?(\[\S.*\]\s?)+\s+[a-zA-Z`].* `` as the regex to test and `Initialize JetBrains Junie 🚀` as the test string.
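As a minimal local sketch of the same check (using Python's `re` module rather than regex101; the titles tested are the one from this PR, without the emoji, and a hypothetical labeled variant):

```python
import re

# The PR-title pattern quoted above, exactly as given.
TITLE_RE = re.compile(r'^(Revert ")?(\[\S.*\]\s?)+\s+[a-zA-Z`].*')

# The failing title from this PR has no [<Project>] label, so it does not match.
print(TITLE_RE.match("Initialize JetBrains Junie"))  # None

# Adding a [stdlib] label makes the title conform.
print(bool(TITLE_RE.match("[stdlib] Initialize JetBrains Junie")))  # True
```

Note that `re.match` anchors at the start of the string, matching the `^` in the pattern, so the `[<Project>]` label must come first.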
dayanruben pushed a commit that referenced this pull request on Jun 26, 2025
… (#59160)

Failing on `main`:

```bash
mo-opt GenericML/gpu-integration-test/GPUUnit/split.mlir --mo-to-mgp="default-device-label=gpu constant-fold=false" -o GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir # RUN: at line 1
+ mo-opt GenericML/gpu-integration-test/GPUUnit/split.mlir '--mo-to-mgp=default-device-label=gpu constant-fold=false' -o GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir
mt --execute --result-output-style=full GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir | FileCheck GenericML/gpu-integration-test/GPUUnit/split.mlir # RUN: at line 2
+ mt --execute --result-output-style=full GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir
+ FileCheck GenericML/gpu-integration-test/GPUUnit/split.mlir
PLEASE submit a bug report to https://github.com/modular/max/issues and include the crash backtrace.
Stack dump:
0.  Program arguments: mt --execute --result-output-style=full GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir
#0 [Internal link] llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) Signals.cpp:0:0
#1 0x000064e81f2e9c59 llvm::sys::RunSignalHandlers() Signals.cpp:0:0
#2 0x000064e81f2ec75a SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
#3 0x000072c1c7819520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#4 0x000072c1c786d9fc pthread_kill (/usr/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
#5 0x000072c1c7819476 gsignal (/usr/lib/x86_64-linux-gnu/libc.so.6+0x42476)
#6 0x000072c16bb496b2 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
#7 0x000072c1c7819520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#8 0x000072c0e401559a
GenericML/gpu-integration-test/GPUUnit/split.mlir:23:17: error: CHECK-LABEL: expected string not found in input
// CHECK-LABEL: Running 'split_inner_axis':
                ^
<stdin>:1:32: note: scanning from here
--- Running 'split_outer_axis':
                               ^
<stdin>:2:1: note: possible intended match here
'split_outer_axis' returned tensor<1x5xsi32> [0, 1, 2, 3, 4]
^
Input file: <stdin>
Check file: GenericML/gpu-integration-test/GPUUnit/split.mlir

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: --- Running 'split_outer_axis':
label:23'0     X error: no match found
           2: 'split_outer_axis' returned tensor<1x5xsi32> [0, 1, 2, 3, 4]
label:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
label:23'1     ? possible intended match
           3: , tensor<2x5xsi32> [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
label:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           4: , tensor<1x5xsi32> [15, 16, 17, 18, 19]
label:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           5:
label:23'0     ~
>>>>>>
--
********************
********************

Failed Tests (1):
  //GenericML/gpu-integration-test :: GPUUnit/split.mlir
```

MAX_GRAPH_API_ORIG_REV_ID: 112ab6e2db7a2e3216c863846f7fc956805e0f6a
dayanruben pushed a commit that referenced this pull request on Sep 2, 2025
layout code
This fixes printing of parameter expression calls, which can happen in complicated type expressions, to include the parameter /values/ for the call and strip off mangling information. On the simple testcase we would get something like:
```
invalid call to 'takes4': argument #0 cannot be converted from 'HasSize[get_int[::Int]()]' to 'HasSize[4]'
```
Now we get:
```
error: invalid call to 'takes4': argument #0 cannot be converted from 'HasSize[get_int[42]()]' to 'HasSize[4]'
takes4(HasSize[get_int[42]()]())
^
```
Notice that it tells us the parameter value (`42`) instead of the type in a verbose form (`::Int`). While this is a minor win for this testcase, it comes up a lot in layout code, where one might be confronted with something useless like:
```
invalid call to '_mha_sm90_max_prompt_len': argument #1 cannot be converted from 'TMATensorTile[KVType.dtype, tile_layout_k_major[::DType,::Int,::Int,::TensorMapSwizzle](), _tma_desc_tile_layout[::DType,::Int,::IndexList[$1, ::DType()]' to 'TMATensorTile[KVType.dtype, tile_layout_k_major[::DType,::Int,::Int,::TensorMapSwizzle](), _tma_desc_tile_layout[::DType,::Int,::IndexList[$1, ::DType()]'
```
The problem here is that the compiler is telling us exactly the wrong thing about `tile_layout_k_major` and `_tma_desc_tile_layout`, in a form that is both verbose and useless. This patch fixes that.
MODULAR_ORIG_COMMIT_REV_ID: 99284e0f32d5b2596f64be5dbcc27356deab99e8
dayanruben pushed a commit that referenced this pull request on Oct 17, 2025
This reverts commit c911f2f48908a87f6a1db8df75d877d1d33b0880.
The PR broke logit verification ([Internal link]) for Graviton devices. To reproduce, trigger the logit verification workflow on the Graviton runners:
```
max-engine crashed!
Signal Information:
Signal: 4 (SIGILL)
Description: Illegal instruction
Signal Code: 1 (Illegal opcode)
Sending PID: -1259941636
Sending UID: 65535
Fault Address: 0xffffb4e6d0fc
Process ID: 10029
Thread ID: 281469722489088
Timestamp: Thu Oct 16 06:52:46 2025
C++ stack trace:
#0 0x0000ffffad0cdf18 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) Signals.cpp:0:0
#1 0x0000ffffa9b26428 developmentSignalHandler(void*) DevelopmentSignalHandler.cpp:0:0
#2 0x0000ffffad0cb824 llvm::sys::RunSignalHandlers() Signals.cpp:0:0
#3 0x0000ffffa9b26d80 captureSignalInformation(int, siginfo_t*, void*) DevelopmentSignalHandler.cpp:0:0
#4 0x0000ffffb61ec850 (linux-vdso.so.1+0x850)
#5 0x0000ffffb4e6d0fc create_weights_registry (/github/home/.cache/bazel/_bazel_root/991c1318309cea4e3284840cbcc05428/execroot/_main/bazel-out/aarch64-opt/bin/SDK/integration-test/pipelines/python/verify_pipelines.runfiles/_main/SDK/lib/API/python/max/_core.cpython-312-aarch64-linux-gnu.so+0xd4d0fc)
#6 0x0000ffffac52a960 M::WeightsRegistry::create(llvm::ArrayRef<char const*>, llvm::ArrayRef<std::byte const*>) WeightsRegistry.cpp:0:0
#7 0x0000ffffa84b64b4 void llvm::detail::UniqueFunctionBase<void>::CallImpl<M_weightsRegistry::$_0>(void*) weights.cpp:0:0
#8 0x0000ffffa9b2d910 void (anonymous namespace)::WorkQueueThread::runItemsImpl<(anonymous namespace)::WorkQueueThread::runOnThread()::$_0, (anonymous namespace)::WorkQueueThread::runOnThread()::$_1>((anonymous namespace)::WorkQueueThread::runOnThread()::$_0, (anonymous namespace)::WorkQueueThread::runOnThread()::$_1, bool, llvm::StringLiteral, llvm::StringLiteral) ThreadPoolWorkQueue.cpp:0:0
#9 0x0000ffffa9b2d690 (anonymous namespace)::WorkQueueThread::runOnThread() ThreadPoolWorkQueue.cpp:0:0
#10 0x0000ffffb3fc29cc (/lib/aarch64-linux-gnu/libstdc++.so.6+0xd29cc)
#11 0x0000ffffb5f70398 (/lib/aarch64-linux-gnu/libc.so.6+0x80398)
#12 0x0000ffffb5fd9e9c (/lib/aarch64-linux-gnu/libc.so.6+0xe9e9c)
Host machine info:
target-triple: aarch64-unknown-linux-gnu
os: linux
arch: neoverse-n1
cpu-model:
simd-bitwidth: 128
features: aes, crc, dotprod, fp-armv8, fullfp16, lse, neon, perfmon, ras, rcpc, rdm, sha2, spe, ssbs
core-count: 16
l1-cache-size: 65536
l2-cache-size: 1048576
l3-cache-size: 33554432
l4-cache-size: 0
affinities: none
```
MODULAR_ORIG_COMMIT_REV_ID: b02819fb75d0831116f19e17072c5547668f2644
dayanruben pushed a commit that referenced this pull request on Nov 27, 2025
When you specify `--data-parallel-degree 8 --max-batch-size 32` there are two ways to interpret this:

1. Each of the 8 DP replicas has a cap on batch size of 32. Across all replicas the aggregate max batch size is thus 8 * 32 = 256.
2. Each of the 8 DP replicas has a cap on batch size of 4. Across all replicas the aggregate max batch size is thus 32.

Currently our code uses interpretation #2. However, I think we should switch to #1 for the following reasons:

- Maintaining `max_batch_size` and `max_batch_size_per_replica` is very error prone. It is very easy to mix them up.
- This interpretation is consistent with what vLLM does.

MODULAR_ORIG_COMMIT_REV_ID: e11df40525f02153819194e90562a816c427d716
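The arithmetic behind the two interpretations can be sketched as follows (a minimal illustration; the function names are hypothetical, not from the MAX codebase):

```python
def aggregate_cap(dp_degree: int, max_batch_size: int) -> int:
    """Interpretation 1: --max-batch-size caps each replica,
    so the aggregate cap is the product."""
    return dp_degree * max_batch_size

def per_replica_cap(dp_degree: int, max_batch_size: int) -> int:
    """Interpretation 2: --max-batch-size is the aggregate cap,
    so each replica gets an equal share."""
    return max_batch_size // dp_degree

# --data-parallel-degree 8 --max-batch-size 32
print(aggregate_cap(8, 32))    # 256 total under interpretation 1
print(per_replica_cap(8, 32))  # 4 per replica under interpretation 2
```

Interpretation 1 keeps a single meaning for `max_batch_size` (always per replica), which is what makes the `max_batch_size` / `max_batch_size_per_replica` mix-ups go away.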
This PR initializes JetBrains Junie 🚀 by adding essential configuration files.
Includes:
Generated automatically by Junie. Review and customize as needed.