The PR title does not conform to the '[<Project>] Title' format. Please update the PR title.
Typical [<Project>] values include:
- `[stdlib]` — indicates a change to the Mojo standard library code
- `[docs]` — indicates a change to the documentation
It's okay to include multiple labels on a PR that affects multiple areas of work.
Thank you for contributing to Mojo! 🔥
You can also use a tool like www.regex101.com to see why your PR title fails to conform. Use `` ^(Revert ")?(\[\S.*\]\s?)+\s+[a-zA-Z`].* `` as the regex to test and `Initialize JetBrains Junie 🚀` as the test string.
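As a minimal local sketch of the same check (using Python's `re` module rather than regex101; the titles tested are the one from this PR, without the emoji, and a hypothetical labeled variant):

```python
import re

# The PR-title pattern quoted above, exactly as given.
TITLE_RE = re.compile(r'^(Revert ")?(\[\S.*\]\s?)+\s+[a-zA-Z`].*')

# The failing title from this PR has no [<Project>] label, so it does not match.
print(TITLE_RE.match("Initialize JetBrains Junie"))  # None

# Adding a [stdlib] label makes the title conform.
print(bool(TITLE_RE.match("[stdlib] Initialize JetBrains Junie")))  # True
```

Note that `re.match` anchors at the start of the string, matching the `^` in the pattern, so the `[<Project>]` label must come first.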
dayanruben pushed a commit that referenced this pull request on Jun 26, 2025
… (#59160)

Failing on `main`:

```bash
mo-opt GenericML/gpu-integration-test/GPUUnit/split.mlir --mo-to-mgp="default-device-label=gpu constant-fold=false" -o GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir # RUN: at line 1
+ mo-opt GenericML/gpu-integration-test/GPUUnit/split.mlir '--mo-to-mgp=default-device-label=gpu constant-fold=false' -o GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir
mt --execute --result-output-style=full GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir | FileCheck GenericML/gpu-integration-test/GPUUnit/split.mlir # RUN: at line 2
+ mt --execute --result-output-style=full GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir
+ FileCheck GenericML/gpu-integration-test/GPUUnit/split.mlir
PLEASE submit a bug report to https://github.com/modular/max/issues and include the crash backtrace.
Stack dump:
0.  Program arguments: mt --execute --result-output-style=full GenericML/gpu-integration-test/GPUUnit/Output/split.mlir.tmp.mlir
#0 [Internal link] llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) Signals.cpp:0:0
#1 0x000064e81f2e9c59 llvm::sys::RunSignalHandlers() Signals.cpp:0:0
#2 0x000064e81f2ec75a SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
#3 0x000072c1c7819520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#4 0x000072c1c786d9fc pthread_kill (/usr/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
#5 0x000072c1c7819476 gsignal (/usr/lib/x86_64-linux-gnu/libc.so.6+0x42476)
#6 0x000072c16bb496b2 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
#7 0x000072c1c7819520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#8 0x000072c0e401559a
GenericML/gpu-integration-test/GPUUnit/split.mlir:23:17: error: CHECK-LABEL: expected string not found in input
// CHECK-LABEL: Running 'split_inner_axis':
                ^
<stdin>:1:32: note: scanning from here
--- Running 'split_outer_axis':
                               ^
<stdin>:2:1: note: possible intended match here
'split_outer_axis' returned tensor<1x5xsi32> [0, 1, 2, 3, 4]
^
Input file: <stdin>
Check file: GenericML/gpu-integration-test/GPUUnit/split.mlir

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: --- Running 'split_outer_axis':
label:23'0     X error: no match found
           2: 'split_outer_axis' returned tensor<1x5xsi32> [0, 1, 2, 3, 4]
label:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
label:23'1     ? possible intended match
           3: , tensor<2x5xsi32> [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
label:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           4: , tensor<1x5xsi32> [15, 16, 17, 18, 19]
label:23'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           5:
label:23'0     ~
>>>>>>
--
********************
********************

Failed Tests (1):
  //GenericML/gpu-integration-test :: GPUUnit/split.mlir
```

MAX_GRAPH_API_ORIG_REV_ID: 112ab6e2db7a2e3216c863846f7fc956805e0f6a
dayanruben pushed a commit that referenced this pull request on Sep 2, 2025
layout code
This fixes printing of parameter expression calls, which can happen in complicated type expressions, to include the parameter /values/ for the call and strip off mangling information. On the simple testcase we would get something like:
```
invalid call to 'takes4': argument #0 cannot be converted from 'HasSize[get_int[::Int]()]' to 'HasSize[4]'
```
Now we get:
```
error: invalid call to 'takes4': argument #0 cannot be converted from 'HasSize[get_int[42]()]' to 'HasSize[4]'
takes4(HasSize[get_int[42]()]())
^
```
Notice that it tells us the parameter value (`42`) instead of the type in a verbose form (`::Int`). While this is a minor win for this testcase, it comes up a lot in layout code, where one might be confronted with something useless like:
```
invalid call to '_mha_sm90_max_prompt_len': argument #1 cannot be converted from 'TMATensorTile[KVType.dtype, tile_layout_k_major[::DType,::Int,::Int,::TensorMapSwizzle](), _tma_desc_tile_layout[::DType,::Int,::IndexList[$1, ::DType()]' to 'TMATensorTile[KVType.dtype, tile_layout_k_major[::DType,::Int,::Int,::TensorMapSwizzle](), _tma_desc_tile_layout[::DType,::Int,::IndexList[$1, ::DType()]'
```
The problem here is that the compiler is telling us exactly the wrong thing about `tile_layout_k_major` and `_tma_desc_tile_layout`, in a form that is both verbose and useless. This patch fixes that.
MODULAR_ORIG_COMMIT_REV_ID: 99284e0f32d5b2596f64be5dbcc27356deab99e8
dayanruben pushed a commit that referenced this pull request on Oct 17, 2025
This reverts commit c911f2f48908a87f6a1db8df75d877d1d33b0880.
The PR broke logit verification ([Internal link]) for Graviton devices. To reproduce, trigger the logit verification workflow on the Graviton runners:
```
max-engine crashed!
Signal Information:
Signal: 4 (SIGILL)
Description: Illegal instruction
Signal Code: 1 (Illegal opcode)
Sending PID: -1259941636
Sending UID: 65535
Fault Address: 0xffffb4e6d0fc
Process ID: 10029
Thread ID: 281469722489088
Timestamp: Thu Oct 16 06:52:46 2025
C++ stack trace:
#0 0x0000ffffad0cdf18 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) Signals.cpp:0:0
#1 0x0000ffffa9b26428 developmentSignalHandler(void*) DevelopmentSignalHandler.cpp:0:0
#2 0x0000ffffad0cb824 llvm::sys::RunSignalHandlers() Signals.cpp:0:0
#3 0x0000ffffa9b26d80 captureSignalInformation(int, siginfo_t*, void*) DevelopmentSignalHandler.cpp:0:0
#4 0x0000ffffb61ec850 (linux-vdso.so.1+0x850)
#5 0x0000ffffb4e6d0fc create_weights_registry (/github/home/.cache/bazel/_bazel_root/991c1318309cea4e3284840cbcc05428/execroot/_main/bazel-out/aarch64-opt/bin/SDK/integration-test/pipelines/python/verify_pipelines.runfiles/_main/SDK/lib/API/python/max/_core.cpython-312-aarch64-linux-gnu.so+0xd4d0fc)
#6 0x0000ffffac52a960 M::WeightsRegistry::create(llvm::ArrayRef<char const*>, llvm::ArrayRef<std::byte const*>) WeightsRegistry.cpp:0:0
#7 0x0000ffffa84b64b4 void llvm::detail::UniqueFunctionBase<void>::CallImpl<M_weightsRegistry::$_0>(void*) weights.cpp:0:0
#8 0x0000ffffa9b2d910 void (anonymous namespace)::WorkQueueThread::runItemsImpl<(anonymous namespace)::WorkQueueThread::runOnThread()::$_0, (anonymous namespace)::WorkQueueThread::runOnThread()::$_1>((anonymous namespace)::WorkQueueThread::runOnThread()::$_0, (anonymous namespace)::WorkQueueThread::runOnThread()::$_1, bool, llvm::StringLiteral, llvm::StringLiteral) ThreadPoolWorkQueue.cpp:0:0
#9 0x0000ffffa9b2d690 (anonymous namespace)::WorkQueueThread::runOnThread() ThreadPoolWorkQueue.cpp:0:0
#10 0x0000ffffb3fc29cc (/lib/aarch64-linux-gnu/libstdc++.so.6+0xd29cc)
#11 0x0000ffffb5f70398 (/lib/aarch64-linux-gnu/libc.so.6+0x80398)
#12 0x0000ffffb5fd9e9c (/lib/aarch64-linux-gnu/libc.so.6+0xe9e9c)
Host machine info:
target-triple: aarch64-unknown-linux-gnu
os: linux
arch: neoverse-n1
cpu-model:
simd-bitwidth: 128
features: aes, crc, dotprod, fp-armv8, fullfp16, lse, neon, perfmon, ras, rcpc, rdm, sha2, spe, ssbs
core-count: 16
l1-cache-size: 65536
l2-cache-size: 1048576
l3-cache-size: 33554432
l4-cache-size: 0
affinities: none
```
MODULAR_ORIG_COMMIT_REV_ID: b02819fb75d0831116f19e17072c5547668f2644
dayanruben pushed a commit that referenced this pull request on Nov 27, 2025
When you specify `--data-parallel-degree 8 --max-batch-size 32` there are two ways to interpret this:

1. Each of the 8 DP replicas has a cap on batch size of 32. Across all replicas the aggregate max batch size is thus 8 * 32 = 256.
2. Each of the 8 DP replicas has a cap on batch size of 4. Across all replicas the aggregate max batch size is thus 32.

Currently our code uses interpretation #2. However, I think we should switch to #1 for the following reasons:

- Maintaining `max_batch_size` and `max_batch_size_per_replica` is very error prone. It is very easy to mix them up.
- This interpretation is consistent with what vLLM does.

MODULAR_ORIG_COMMIT_REV_ID: e11df40525f02153819194e90562a816c427d716
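The arithmetic behind the two interpretations can be sketched as follows (a minimal illustration; the function names are hypothetical, not from the MAX codebase):

```python
def aggregate_cap(dp_degree: int, max_batch_size: int) -> int:
    """Interpretation 1: --max-batch-size caps each replica,
    so the aggregate cap is the product."""
    return dp_degree * max_batch_size

def per_replica_cap(dp_degree: int, max_batch_size: int) -> int:
    """Interpretation 2: --max-batch-size is the aggregate cap,
    so each replica gets an equal share."""
    return max_batch_size // dp_degree

# --data-parallel-degree 8 --max-batch-size 32
print(aggregate_cap(8, 32))    # 256 total under interpretation 1
print(per_replica_cap(8, 32))  # 4 per replica under interpretation 2
```

Interpretation 1 keeps a single meaning for `max_batch_size` (always per replica), which is what makes the `max_batch_size` / `max_batch_size_per_replica` mix-ups go away.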
This PR initializes JetBrains Junie 🚀 by adding essential configuration files.
Includes:
Generated automatically by Junie. Review and customize as needed.