Feature: Traditional GEMM / MatMul API #312

### Describe what you are looking for

## Summary

NumKong has `dot(a, b)` for vector–vector and **matrix–vector** products (e.g. `A` (n, d) @ `b` (d,) → (n,)), and `dots_pack` / `dots_packed` for the GEMM-style case where one side is pre-packed and the result is `A @ B.T`. There is no direct **general matrix multiplication** for two 2D arrays: `A` (m, k) @ `B` (k, n) → (m, n), i.e. a single API equivalent to NumPy's `np.matmul` / `@` or OpenCV's `cv2.gemm`.

## Analogous API: OpenCV `cv2.gemm`

OpenCV provides **cv2.gemm**, a generalized matrix multiply that computes `alpha * (A @ B) + beta * C`. With `alpha=1`, `beta=0`, and `C=None`, this is the standard matrix product `A @ B`. It accepts 2D matrices and returns a 2D matrix. We use it in an image-augmentation codebase both for small matrices and for operations that involve image-sized data (see below).
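For reference, the semantics described above can be written in plain NumPy. This is only a sketch of the expected behavior, not OpenCV's or NumKong's implementation; the name `gemm_reference` and its keyword defaults are made up for illustration.

```python
import numpy as np


def gemm_reference(A, B, alpha=1.0, C=None, beta=0.0):
    """Hypothetical reference: alpha * (A @ B) + beta * C for 2D arrays."""
    A = np.asarray(A)
    B = np.asarray(B)
    if A.ndim != 2 or B.ndim != 2:
        raise ValueError("gemm_reference expects two 2D arrays")
    if A.shape[1] != B.shape[0]:
        raise ValueError(f"inner dimensions differ: {A.shape} @ {B.shape}")
    out = alpha * (A @ B)
    # The additive term is skipped when C is absent or beta is zero.
    if C is not None and beta != 0:
        out = out + beta * np.asarray(C)
    return out


A = np.arange(6, dtype=np.float32).reshape(2, 3)   # (m, k) = (2, 3)
B = np.arange(12, dtype=np.float32).reshape(3, 4)  # (k, n) = (3, 4)
product = gemm_reference(A, B)                     # plain A @ B, shape (2, 4)
scaled = gemm_reference(A, B, alpha=2.0, C=np.ones((2, 4), dtype=np.float32), beta=3.0)
```

With the defaults, the call reduces to `A @ B`, which is the only case this request strictly needs.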

## How we use it today

### 1. Small matrices (direct `cv2.gemm`)

- **Thin Plate Spline (TPS):** The dot-product term of the pairwise distances, `cv2.gemm(points1, points2.T, 1, None, 0)`: (N, 2) @ (2, M) → (N, M). The transform itself: `cv2.gemm(kernel_matrix, nonlinear_weights, 1, None, 0)` and `cv2.gemm(affine_terms, affine_weights, 1, None, 0)` (small matrices).
- **Stain normalization (Macenko):** `cv2.gemm(angle_to_vector, principal_eigenvectors_t, 1, None, 0)`: (2, 2) @ (2, 3) → (2, 3).

So we need general matmul for **small 2D × 2D** matrices, same semantics as `cv2.gemm(A, B, 1, None, 0)`.
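To show where a small 2D × 2D product fits in the TPS case, here is a minimal NumPy sketch of pairwise squared distances built around the (N, 2) @ (2, M) cross term. How the product is combined with the squared norms is an assumption about a typical implementation, and `pairwise_squared_distances` is a hypothetical name; the `@` is the call a NumKong matmul would replace.

```python
import numpy as np


def pairwise_squared_distances(points1, points2):
    """Squared Euclidean distances between (N, 2) and (M, 2) point sets."""
    sq1 = (points1 ** 2).sum(axis=1)[:, None]  # (N, 1) squared norms
    sq2 = (points2 ** 2).sum(axis=1)[None, :]  # (1, M) squared norms
    cross = points1 @ points2.T                # (N, 2) @ (2, M) -> (N, M)
    # ||p - q||^2 = ||p||^2 + ||q||^2 - 2 p.q; clamp tiny negatives from rounding.
    return np.maximum(sq1 + sq2 - 2.0 * cross, 0.0)


points1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # N = 3
points2 = np.array([[0.0, 0.0], [3.0, 4.0]])              # M = 2
d2 = pairwise_squared_distances(points1, points2)         # shape (3, 2)
```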

### 2. "Image × small matrix" (NumPy `@` today)

In stain normalization we have `optical_density` with shape **(num_pixels, 3)** (a flattened image) and small matrices like `stain_colors.T` (3, 2). We do e.g. `optical_density @ stain_colors.T` → (num_pixels, 2). So the first operand can be **image-sized** and the second small. This is still general matmul `A @ B`, not an element-wise image multiply.
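The large × small case can be sketched as follows. The tiny random image, the optical-density formula, and the values in `stain_matrix` are illustrative assumptions, not taken from our codebase; only the shapes match the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny stand-in for an RGB image; values start at 1 to keep the log finite.
image = rng.integers(1, 256, size=(4, 5, 3))

# Flatten to (num_pixels, 3) and convert to optical density (assumed formula).
pixels = image.reshape(-1, 3).astype(np.float64)
optical_density = -np.log(pixels / 255.0)          # (20, 3)

# Illustrative (3, 2) matrix: one column per stain.
stain_matrix = np.array([[0.65, 0.07],
                         [0.70, 0.99],
                         [0.29, 0.11]])

projected = optical_density @ stain_matrix         # (20, 3) @ (3, 2) -> (20, 2)
```

For a real image, `num_pixels` is height × width, so the left operand has millions of rows while the right operand stays 3 × 2.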

## Request

Provide a **general matrix multiplication** API that:

- accepts two 2D arrays `A` (m, k) and `B` (k, n),
- returns `C` (m, n) = `A @ B`,
- works for both small × small and large × small operands.

This could be either a dedicated `matmul(A, B)` (or `gemm(A, B, alpha, C, beta)`), or a generalized `dot` that performs a full matmul when both arguments are 2D with compatible shapes, instead of failing with "Vector dimensions don't match".
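As a rough illustration of the second option, the dispatch could look like the sketch below. It uses NumPy as the backend and a hypothetical name, `dot_generalized`; it says nothing about how NumKong's `dot` is actually implemented, only which rank combinations would be accepted.

```python
import numpy as np


def dot_generalized(a, b):
    """Hypothetical rank-based dispatch: 1D.1D, 2D.1D, and 2D.2D."""
    a, b = np.asarray(a), np.asarray(b)
    if (a.ndim, b.ndim) not in {(1, 1), (2, 1), (2, 2)}:
        raise ValueError(f"unsupported ranks: {a.ndim}D and {b.ndim}D")
    # In all three supported cases the contraction runs over a's last axis
    # and b's first axis.
    if a.shape[-1] != b.shape[0]:
        raise ValueError(f"inner dimensions differ: {a.shape} vs {b.shape}")
    return a @ b


vec = np.ones(3)
mat_a = np.arange(6, dtype=np.float64).reshape(2, 3)
mat_b = np.arange(12, dtype=np.float64).reshape(3, 4)

scalar = dot_generalized(vec, vec)        # vector . vector -> scalar
mat_vec = dot_generalized(mat_a, vec)     # (2, 3) @ (3,)   -> (2,)
mat_mat = dot_generalized(mat_a, mat_b)   # (2, 3) @ (3, 4) -> (2, 4)
```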

## Why not only `dots_packed`?

`dots_packed(A, dots_pack(B))` gives `A @ B.T`. To get `A @ B` we have to pack `B.T` and call `dots_packed(A, dots_pack(B.T))`, which is awkward. A direct `matmul(A, B)` would match NumPy/OpenCV usage and our existing `cv2.gemm` and `optical_density @ stain_matrix` patterns.
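The workaround can be spelled out with NumPy stand-ins. `dots_pack_standin` and `dots_packed_standin` are hypothetical helpers that only mimic the `A @ B.T` behavior described above; they are not NumKong's functions and do not reflect its packing format.

```python
import numpy as np


def dots_pack_standin(B):
    """Stand-in for packing: here just a contiguous copy of B."""
    return np.ascontiguousarray(B)


def dots_packed_standin(A, packed_B):
    """Stand-in that mimics the described result, A @ B.T."""
    return A @ packed_B.T


def matmul_via_packed(A, B):
    """A @ B expressed through the packed API by packing B.T first."""
    return dots_packed_standin(A, dots_pack_standin(B.T))


A = np.arange(6, dtype=np.float64).reshape(2, 3)   # (m, k)
B = np.arange(12, dtype=np.float64).reshape(3, 4)  # (k, n)
result = matmul_via_packed(A, B)                   # equals A @ B, shape (2, 4)
```

Every call site has to remember the extra transpose, which is the friction a direct `matmul(A, B)` would remove.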

### Can you contribute to the implementation?

- [ ] I can contribute

### Is your feature request specific to a certain interface?

It applies to everything

### Contact Details

_No response_

### Is there an existing issue for this?

- [x] I have searched the existing issues

### Code of Conduct

- [x] I agree to follow this project's Code of Conduct
