`QuantizationMode` should be algebraic instead of scalar #285

When MLX supported only affine quantization, it was fine to have `groupSize` and `bits` as separate arguments of the quantization methods. However, the newly supported `mxfp4` mode only allows `groupSize = 32` and `bits = 4`, so it would be better to constrain quantization modes with a full sum type, e.g.

```swift
enum QuantizationMode: Equatable {
  case affine(groupSize: Int, bits: Int)
  case mxfp4
}
``` 
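To illustrate how the sum type rules out invalid combinations, here is a minimal sketch of a call site recovering the effective parameters. `effectiveParameters` is a hypothetical helper introduced only for this example, not an existing MLX API; the fixed values for `mxfp4` are the ones stated above.

```swift
enum QuantizationMode: Equatable {
  case affine(groupSize: Int, bits: Int)
  case mxfp4
}

extension QuantizationMode {
  /// Hypothetical helper (not part of MLX): the group size and bit width a mode implies.
  var effectiveParameters: (groupSize: Int, bits: Int) {
    switch self {
    case .affine(let groupSize, let bits):
      // Affine quantization carries its own parameters.
      return (groupSize, bits)
    case .mxfp4:
      // Fixed by the format, so a caller has no way to pass anything else.
      return (32, 4)
    }
  }
}

let affine = QuantizationMode.affine(groupSize: 64, bits: 8)
let mx = QuantizationMode.mxfp4
print(affine.effectiveParameters)
print(mx.effectiveParameters)
```

With separate `groupSize` and `bits` arguments, a call such as "mxfp4 with 8 bits" would have to be rejected at runtime; with the enum it cannot be written at all.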

If MLX later adds support for more quantization modes (e.g. `mxfpN`), `mxfp4` can be converted into a static convenience member to maintain source compatibility, i.e.

```swift
enum QuantizationMode: Equatable {
  case affine(groupSize: Int, bits: Int)
  case mxfpN(n: Int)
}

extension QuantizationMode {
  static let mxfp4: Self = .mxfpN(n: 4)
}

let mode: QuantizationMode = .mxfpN(n: 4)
print(mode == .mxfp4) // still works because of the Equatable conformance
```

This would be a source-breaking change for all use sites of `QuantizationMode`, but I believe using a sum type here will make the API more extensible and fool-proof.
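One way to soften the break, sketched here under assumptions, is to keep the old argument shape as a deprecated shim that forwards to the enum-based entry point. `describeQuantization` is a hypothetical stand-in for a real quantization method, and neither signature is taken from MLX.

```swift
enum QuantizationMode: Equatable {
  case affine(groupSize: Int, bits: Int)
  case mxfp4
}

// Hypothetical new-style entry point: the mode carries all of its parameters.
func describeQuantization(mode: QuantizationMode) -> String {
  switch mode {
  case .affine(let groupSize, let bits):
    return "affine(groupSize: \(groupSize), bits: \(bits))"
  case .mxfp4:
    return "mxfp4"
  }
}

// Hypothetical old-style signature, kept only so existing affine call sites
// keep compiling (with a deprecation warning) during migration.
@available(*, deprecated, message: "Pass a QuantizationMode instead")
func describeQuantization(groupSize: Int, bits: Int) -> String {
  describeQuantization(mode: .affine(groupSize: groupSize, bits: bits))
}

print(describeQuantization(mode: .mxfp4))
print(describeQuantization(mode: .affine(groupSize: 64, bits: 4)))
```

This only covers call sites that pass affine parameters; code that stores or switches over `QuantizationMode` itself would still need to be updated.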
