codelion committed on
Commit cfac466 · verified · 1 Parent(s): 6d66b75

card: link blog post for calibration mix instead of bare optiq.jsonl reference

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -19,7 +19,7 @@ tags:
 
  A 4-bit mixed-precision MLX quant produced by [mlx-optiq](https://mlx-optiq.com/), the sensitivity-aware quantization toolkit for Apple Silicon.
 
- A 4-bit mixed-precision MLX quant of [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it). Per-layer bit-widths come from a KL-divergence sensitivity pass on the bundled [`optiq.jsonl`](https://mlx-optiq.com/blog/calibration-mix) six-domain calibration mix (prose · reasoning · code · agent · tool-call · constraint-bearing instructions). Sensitive layers go to 8-bit; robust ones stay at 4-bit. The on-disk size is within ~5 % of a stock uniform 4-bit MLX quant.
+ A 4-bit mixed-precision MLX quant of [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it). Per-layer bit-widths come from a KL-divergence sensitivity pass on a [six-domain calibration mix](https://mlx-optiq.com/blog/calibration-mix) (prose · reasoning · code · agent · tool-call · constraint-bearing instructions). Sensitive layers go to 8-bit; robust ones stay at 4-bit. The on-disk size is within ~5 % of a stock uniform 4-bit MLX quant.
 
  ## Quantization details
 
@@ -30,7 +30,7 @@ A 4-bit mixed-precision MLX quant of [google/gemma-4-e4b-it](https://huggingface
  | Layers at 4-bit (robust) | 224 |
  | Total quantized layers | 379 |
  | Group size | 64 |
- | Calibration mix | `optiq.jsonl` (40 samples × 6 domains) |
+ | Calibration mix | [six-domain mix](https://mlx-optiq.com/blog/calibration-mix) (40 samples × 6 domains) |
  | Reference for sensitivity | bf16 (auto-resolved; falls back to uniform-4-bit if bf16 doesn't fit) |
 
  We follow the same naming convention `llama.cpp` uses for Q4_K_M and similar mixed-precision quants: the "4-bit" label is for the predominant precision, not the weighted average. The mixed allocation is what lets this build beat stock uniform-4-bit at the same disk size. Benchmark deltas are below.
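
For readers unfamiliar with the allocation the card describes (a KL-divergence sensitivity pass that sends sensitive layers to 8-bit and leaves robust ones at 4-bit), the idea can be sketched roughly as follows. This is not mlx-optiq's implementation: the function names `kl_divergence` and `allocate_bits`, the "promote the top `n_high` layers" budget rule, and the toy distributions are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def allocate_bits(sensitivity, n_high, low_bits=4, high_bits=8):
    """Give high_bits to the n_high most sensitive layers and low_bits to the rest.

    Hypothetical budget rule; the real toolkit may choose the split differently.
    """
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    promoted = set(ranked[:n_high])
    return {
        name: (high_bits if name in promoted else low_bits)
        for name in sensitivity
    }


# Toy data (made up): the reference model's output distribution, and the
# distribution observed when only the named layer is quantized to 4-bit.
reference = [0.7, 0.2, 0.1]
perturbed = {
    "layer.0": [0.69, 0.21, 0.10],
    "layer.1": [0.40, 0.35, 0.25],
    "layer.2": [0.70, 0.20, 0.10],
}

# A layer's sensitivity is how far quantizing it moves the output distribution.
sensitivity = {name: kl_divergence(reference, q) for name, q in perturbed.items()}
bits = allocate_bits(sensitivity, n_high=1)

for name in perturbed:
    print(f"{name}: KL={sensitivity[name]:.5f} -> {bits[name]}-bit")
```

In this toy run only `layer.1` shifts the output distribution noticeably, so it is the one promoted to 8-bit while the other two stay at 4-bit.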