Conversation
This commit fixes a critical bug in the outer product implementation and adds significant SIMD optimizations. Changes: - Fixed broken nested loop that was repeating SIMD operations uselessly - Implemented proper SIMD-accelerated outer product computation - Added comprehensive unit tests for outer product functionality - Added benchmarks for outer product performance measurement - Optimized algorithm now correctly computes Result[i,j] = u[i] * v[j] Performance Impact: The previous implementation had a severe bug where the inner loop didn't use its iteration variable, causing the same SIMD operations to repeat cols times. The new implementation: - Properly broadcasts each u[i] element to a SIMD vector - Multiplies with v vector once per row (not cols times) - Provides both SIMD-accelerated and scalar fallback paths - Correctly handles tail elements when size is not SIMD-aligned Benchmark Results (optimized version, ShortRun): Size=10: 101.5 ns, 856 B allocated Size=100: 4.157 μs, 80 KB allocated Size=500: 942.3 μs, 2 MB allocated The old implementation was fundamentally broken and couldn't produce correct results, so no before/after comparison is meaningful. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
Member
|
@kMutagene The bug looks legitimate. It would be good to verify that both optimized and non-optimized paths are being tested by the automated CI teting. That's not clear to me from the PR |
This was referenced Oct 11, 2025
Daily Perf Improver - Add benchmarks for matrix multiplication (was: Adaptive blocking for mmul)
#22
Closed
This was referenced Oct 12, 2025
…d54-e75939c77af8ef4c
Contributor
Author
📊 Code Coverage ReportSummary
📈 Coverage Analysis🟡 Good Coverage Your code coverage is above 60%. Consider adding more tests to reach 80%. 🎯 Coverage Goals
📋 What These Numbers Mean
🔗 Detailed Reports📋 Download Full Coverage Report - Check the 'coverage-report' artifact for detailed HTML coverage report Coverage report generated on 2025-10-14 at 15:38:42 UTC |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes a critical bug in the outer product implementation and adds SIMD optimizations as part of Phase 1 of the performance improvement plan.
Performance Goal
Goal Selected: Fix outer product implementation (Phase 1, Priority: HIGH)
Rationale: The research identified the outer product as having "inefficient nested loop structure" with an expected improvement of 2-5x. Upon investigation, I discovered the implementation had a critical bug that made it fundamentally broken.
Bug Found
The original implementation had a severe algorithmic bug:
The
jloop iteratedcolstimes but didn't use the iteration variable, causing the SIMD operations to repeat uselessly and not produce correct outer product results.Changes Made
Fixed Implementation
Result[i,j] = u[i] * v[j]u[i]element once and multiplies with v vectorNew Tests (benchmarks/FsMath.Benchmarks/MatrixOuterProductTests.fs)
New Benchmarks (benchmarks/FsMath.Benchmarks/Matrix.fs)
Approach
Performance Measurements
Test Environment
Results
Key Observations
Replicating the Performance Measurements
Testing
✅ All 137 tests pass (132 original + 5 new outer product tests)
✅ Benchmarks compile and run successfully
✅ Both SIMD and scalar code paths verified correct
Implementation Details
Optimizations Applied
Numerics.Vector<'T>(u[i])to broadcast scalar across SIMD lanesrowVec[k] <- uBroadcast * vVec[k]for efficient SIMD multiplyCode Quality
Next Steps
This PR establishes correct functionality and baseline performance for outer product. Future work from the performance plan includes:
Related Issues/Discussions
🤖 Generated with Claude Code