codelion committed
Commit 13a5b72 · verified · 1 Parent(s): e60cd68

v0.1.0 sweep: requant with 5.0-BPW ceiling, fresh benchmark numbers

README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  library_name: mlx
3
- license: apache-2.0
4
  pipeline_tag: text-generation
5
  base_model: google/gemma-4-e4b-it
6
  tags:
@@ -12,39 +12,32 @@ tags:
12
  - optiq
13
  - apple-silicon
14
  - text-generation
15
- - gemma4
16
  ---
17
 
18
- # gemma-4-e4b-it-OptiQ-4bit
19
 
20
- > Optimized for Apple Silicon with [mlx-optiq](https://mlx-optiq.pages.dev/) sensitivity-aware mixed-precision quantization, reusable at inference, fine-tuning, and serving time.
21
 
22
- This is a mixed-precision quantized version of [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it) in MLX format. Instead of uniform 4-bit across every layer, optiq measures each layer's sensitivity via KL divergence on calibration data and assigns **per-layer** bit-widths (some layers at 8-bit, the rest at 4-bit) at the same average bits-per-weight. Same size, higher quality.
23
 
24
- The `optiq_metadata.json` sidecar ships in the repo; it's what `mlx-optiq` reads to drive sensitivity-aware LoRA fine-tuning, mixed-precision KV serving, and hot-swap adapter routing.
25
-
26
- **Multimodal stripping (default):** the base is published as a multimodal foundation model, but this optiq release ships the language stack only. Vision/audio config + token metadata are dropped at conversion time so the artifact is smaller, leaves more RAM for KV cache + LoRA, and runs cleanly on lower-spec Apple Silicon with longer contexts. If you need the vision tower, re-convert from the base with:
27
-
28
- ```bash
29
- optiq convert google/gemma-4-e4b-it --target-bpw 4.5 --keep-unused-modalities -o ./mmodel
30
- ```
31
-
32
-
33
- ## Quantization Details
34
 
35
  | Property | Value |
36
  |---|---|
37
- | Target BPW | 4.5 |
38
- | Achieved BPW | 4.50 |
39
- | Layers at 8-bit (sensitive) | 149 |
40
- | Layers at 4-bit (robust) | 444 |
41
- | Total quantized layers | 593 |
42
  | Group size | 64 |
43
- | `model_type` | `gemma4_text` |
44
 
45
  ## Usage
46
 
47
- ### Basic (works with stock `mlx-lm`)
48
 
49
  ```bash
50
  pip install mlx-lm
@@ -59,73 +52,43 @@ response = generate(
59
  prompt="Explain quantum computing in simple terms.",
60
  max_tokens=200,
61
  )
62
- print(response)
63
  ```
64
 
65
- ### Unlock the full stack with `mlx-optiq`
66
-
67
- Installing [mlx-optiq](https://pypi.org/project/mlx-optiq/) turns this model from a static checkpoint into a deployment-ready base:
68
 
69
  ```bash
70
  pip install mlx-optiq
71
  ```
72
 
73
- **Mixed-precision KV-cache serving** (+40–62% decode speedup at 64k context on Qwen3.5 2B/4B/9B vs fp16 KV on M3 Max):
74
-
75
- ```bash
76
- # One-time per-layer KV sensitivity pass
77
- optiq kv-cache mlx-community/gemma-4-e4b-it-OptiQ-4bit --target-bits 4.5 -o ./kv_cache
78
-
79
- # OpenAI-compatible server on :8080
80
- optiq serve \
81
- --kv-config ./kv_cache/kv_config.json \
82
- --model mlx-community/gemma-4-e4b-it-OptiQ-4bit \
83
- --max-tokens 32768 --temp 0.6 --top-p 0.95
84
- ```
85
-
86
- **Sensitivity-aware LoRA fine-tuning** — layers optiq kept at 8-bit (more sensitive) get 2× the adapter rank of layers at 4-bit, at the same base budget:
87
-
88
- ```bash
89
- optiq lora train mlx-community/gemma-4-e4b-it-OptiQ-4bit \
90
- --data ./my_data \
91
- --rank 8 --rank-scaling by_bits \
92
- --iters 1000 -o ./my_adapter
93
- ```
94
-
95
- **Hot-swap adapters** — mount N adapters on one base, switch per request without reloading the model (adapter id via HF repo or local path, auto-downloaded):
96
-
97
- ```bash
98
- optiq serve \
99
- --model mlx-community/gemma-4-e4b-it-OptiQ-4bit \
100
- --adapter ./my_adapter
101
- ```
102
-
103
- Full documentation: [mlx-optiq.pages.dev](https://mlx-optiq.pages.dev/)
104
 
105
  ## Benchmarks
106
 
107
- **GSM8K** (200 samples, 3-shot chain-of-thought):
108
 
109
- | Model | GSM8K Accuracy |
110
  |---|---:|
111
- | **This (optiq mixed 4.5 BPW)** | **see [Results page](https://mlx-optiq.pages.dev/results.html)** |
112
- | Uniform 4-bit baseline | (documented on Results) |
113
-
114
- See [mlx-optiq.pages.dev/results](https://mlx-optiq.pages.dev/results.html) for full methodology and per-model numbers.
115
 
116
  ## Links
117
 
118
- - **Documentation:** https://mlx-optiq.pages.dev/
119
- - **PyPI:** https://pypi.org/project/mlx-optiq/
120
- - **Article:** [Not All Layers Are Equal](https://x.com/latent_node/status/2028412948167942334?s=20)
121
- - **Base model:** [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it)
122
-
123
- ## Credits
124
-
125
- - **Quantization method:** [mlx-optiq](https://pypi.org/project/mlx-optiq/)
126
  - **Base model:** [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it)
127
- - **Runtime:** [MLX](https://github.com/ml-explore/mlx)
128
 
129
  ## License
130
 
131
- Apache 2.0 (inherits from base model).
 
1
  ---
2
  library_name: mlx
3
+ license: gemma
4
  pipeline_tag: text-generation
5
  base_model: google/gemma-4-e4b-it
6
  tags:
 
12
  - optiq
13
  - apple-silicon
14
  - text-generation
15
+ - gemma-4
16
  ---
17
 
18
+ # mlx-community/gemma-4-e4b-it-OptiQ-4bit
19
 
20
+ A 4-bit mixed-precision MLX quant produced by [mlx-optiq](https://mlx-optiq.com/), the sensitivity-aware quantization toolkit for Apple Silicon.
21
 
22
+ A 4-bit mixed-precision MLX quant of [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it). Per-layer bit-widths come from a KL-divergence sensitivity pass on the bundled [`optiq.jsonl`](https://mlx-optiq.com/blog/calibration-mix) five-domain calibration mix (prose · reasoning · code · agent · tool-call). Sensitive layers go to 8-bit; robust ones stay at 4-bit. The on-disk size is within ~5 % of a stock uniform 4-bit MLX quant.
23
 
24
+ ## Quantization details
25
 
26
  | Property | Value |
27
  |---|---|
28
+ | Predominant precision | 4-bit |
29
+ | Layers at 8-bit (sensitive) | 155 |
30
+ | Layers at 4-bit (robust) | 224 |
31
+ | Total quantized layers | 379 |
 
32
  | Group size | 64 |
33
+ | Calibration mix | `optiq.jsonl` (32 samples × 5 domains) |
34
+ | Reference for sensitivity | bf16 (auto-resolved; falls back to uniform-4-bit if bf16 doesn't fit) |
35
+
36
+ We follow the same naming convention `llama.cpp` uses for Q4_K_M and similar mixed-precision quants: the "4-bit" label is for the predominant precision, not the weighted average. The mixed allocation is what lets this build beat stock uniform-4-bit at the same disk size. Benchmark deltas are below.
37
 
38
  ## Usage
39
 
40
+ Load it with `mlx-lm` and use it as usual:
41
 
42
  ```bash
43
  pip install mlx-lm
 
52
  prompt="Explain quantum computing in simple terms.",
53
  max_tokens=200,
54
  )
 
55
  ```
56
 
57
+ For more (mixed-precision KV-cache serving, sensitivity-aware LoRA fine-tuning, OpenAI + Anthropic-compatible inference server, hot-swap mounted adapters, sandboxed Python execution for agent workflows), install [`mlx-optiq`](https://mlx-optiq.com/):
58
 
59
  ```bash
60
  pip install mlx-optiq
61
  ```
62
 
63
+ See the [Gemma-4 family guide](https://mlx-optiq.com/docs/gemma-4) on [mlx-optiq.com](https://mlx-optiq.com/) for sampling defaults, training recipes, and family-specific caveats.
64
 
65
  ## Benchmarks
66
 
67
+ Five-metric suite that drives the [Capability Score](https://mlx-optiq.com/blog/eval-framework):
68
 
69
+ | Metric | Score |
70
  |---|---:|
71
+ | MMLU (5-shot, 1000 samples) | 58.8% |
72
+ | GSM8K (1000 samples, 3-shot CoT) | 77.8% |
73
+ | IFEval (full set, strict) | 70.6% |
74
+ | IFEval (full set, loose) | 70.8% |
75
+ | BFCL-V3 simple (200 single-turn calls) | 69.0% |
76
+ | HumanEval (164 problems, pass@1) | 76.8% |
77
+ | **Capability Score** (mean of the 5 benchmarks above) | **70.6** |
78
+ | KL vs bf16 reference (mean / p95) | 0.2755 / 1.3460 |
79
+ | On-disk size | 6.1 GB |
80
+
81
+ The Capability Score is the simple unweighted mean of the five benchmarks. Every metric gets one equal vote. Disk size is reported next to it as an honest second axis instead of being folded into the score. See the [eval-framework writeup](https://mlx-optiq.com/blog/eval-framework) for the full methodology.
82
 
83
  ## Links
84
 
85
+ - **Project website:** [mlx-optiq.com](https://mlx-optiq.com/)
86
+ - **Gemma-4 family guide:** [mlx-optiq.com/docs/gemma-4](https://mlx-optiq.com/docs/gemma-4)
87
+ - **PyPI:** [pypi.org/project/mlx-optiq](https://pypi.org/project/mlx-optiq/)
88
+ - **Calibration mix:** [mlx-optiq.com/blog/calibration-mix](https://mlx-optiq.com/blog/calibration-mix)
89
+ - **Eval framework:** [mlx-optiq.com/blog/eval-framework](https://mlx-optiq.com/blog/eval-framework)
90
  - **Base model:** [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it)
 
91
 
92
  ## License
93
 
94
+ Gemma license (inherits from base model). See https://ai.google.dev/gemma/terms for the terms of use.
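Reviewer note: the figures in the new README can be cross-checked with a few lines of arithmetic. The sketch below assumes the Capability Score averages MMLU, GSM8K, IFEval, BFCL-V3 simple, and HumanEval, with the strict IFEval row standing in for IFEval. The table lists both strict and loose, and the README does not say which one enters the mean. The variable names are illustrative only.

```python
# Figures copied from the new README in this commit.
benchmarks = {
    "MMLU": 58.8,
    "GSM8K": 77.8,
    "IFEval (strict)": 70.6,  # assumption: strict, not loose, enters the mean
    "BFCL-V3 simple": 69.0,
    "HumanEval": 76.8,
}

# Unweighted mean, rounded to one decimal as in the README table.
capability_score = round(sum(benchmarks.values()) / len(benchmarks), 1)

# Layer counts from the "Quantization details" table.
layers_8bit = 155
layers_4bit = 224
total_layers = layers_8bit + layers_4bit
share_8bit = layers_8bit / total_layers

print(f"Capability Score: {capability_score}")
print(f"Quantized layers: {total_layers} ({share_8bit:.1%} at 8-bit)")
```

Under that assumption the mean reproduces the reported 70.6, and the two layer counts add up to the reported 379. Swapping in the loose IFEval figure (70.8) gives 70.64, which rounds to the same score. The 8-bit share here is a count of layers, not a weight-weighted average, so it says nothing direct about bits per weight.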
chat_template.jinja CHANGED
@@ -1,9 +1,9 @@
1
- {%- macro format_parameters(properties, required) -%}
2
  {%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
3
  {%- set ns = namespace(found_first=false) -%}
4
  {%- for key, value in properties | dictsort -%}
5
  {%- set add_comma = false -%}
6
- {%- if key not in standard_keys -%}
7
  {%- if ns.found_first %},{% endif -%}
8
  {%- set ns.found_first = true -%}
9
  {{ key }}:{
@@ -65,7 +65,7 @@
65
  {%- elif value is mapping -%}
66
  {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
67
  properties:{
68
- {{- format_parameters(value, value['required'] | default([])) -}}
69
  }
70
  {%- endif -%}
71
  {%- if value['required'] -%}
@@ -178,18 +178,21 @@
178
  {#- Handle System/Tool Definitions Block -#}
179
  {%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
180
  {{- '<|turn>system\n' -}}
181
-
182
  {#- Inject Thinking token at the very top of the FIRST system turn -#}
183
  {%- if enable_thinking is defined and enable_thinking -%}
184
  {{- '<|think|>\n' -}}
185
  {%- set ns.prev_message_type = 'think' -%}
186
  {%- endif -%}
187
-
188
  {%- if messages[0]['role'] in ['system', 'developer'] -%}
189
- {{- messages[0]['content'] | trim -}}
190
  {%- set loop_messages = messages[1:] -%}
191
  {%- endif -%}
192
-
193
  {%- if tools -%}
194
  {%- for tool in tools %}
195
  {{- '<|tool>' -}}
@@ -198,7 +201,6 @@
198
  {%- endfor %}
199
  {%- set ns.prev_message_type = 'tool' -%}
200
  {%- endif -%}
201
-
202
  {{- '<turn|>\n' -}}
203
  {%- endif %}
204
 
@@ -302,6 +304,7 @@
302
  {%- endfor -%}
303
  {%- endif -%}
304
 
 
305
  {%- if message['content'] is string -%}
306
  {%- if role == 'model' -%}
307
  {{- strip_thinking(message['content']) -}}
@@ -328,10 +331,14 @@
328
  {%- endif -%}
329
  {%- endfor -%}
330
  {%- endif -%}
 
 
 
 
331
 
332
  {%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%}
333
  {{- '<|tool_response>' -}}
334
- {%- elif not (ns_tr_out.flag and not message.get('content')) -%}
335
  {{- '<turn|>\n' -}}
336
  {%- endif -%}
337
  {%- endif -%}
 
1
+ {%- macro format_parameters(properties, required, filter_keys=false) -%}
2
  {%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
3
  {%- set ns = namespace(found_first=false) -%}
4
  {%- for key, value in properties | dictsort -%}
5
  {%- set add_comma = false -%}
6
+ {%- if not filter_keys or key not in standard_keys -%}
7
  {%- if ns.found_first %},{% endif -%}
8
  {%- set ns.found_first = true -%}
9
  {{ key }}:{
 
65
  {%- elif value is mapping -%}
66
  {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
67
  properties:{
68
+ {{- format_parameters(value, value['required'] | default([]), filter_keys=true) -}}
69
  }
70
  {%- endif -%}
71
  {%- if value['required'] -%}
 
178
  {#- Handle System/Tool Definitions Block -#}
179
  {%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
180
  {{- '<|turn>system\n' -}}
 
181
  {#- Inject Thinking token at the very top of the FIRST system turn -#}
182
  {%- if enable_thinking is defined and enable_thinking -%}
183
  {{- '<|think|>\n' -}}
184
  {%- set ns.prev_message_type = 'think' -%}
185
  {%- endif -%}
 
186
  {%- if messages[0]['role'] in ['system', 'developer'] -%}
187
+ {%- if messages[0]['content'] is string -%}
188
+ {{- messages[0]['content'] | trim -}}
189
+ {%- elif messages[0]['content'] is sequence -%}
190
+ {%- for item in messages[0]['content'] -%}
191
+ {{- item['text'] | trim + ' '-}}
192
+ {%- endfor -%}
193
+ {%- endif -%}
194
  {%- set loop_messages = messages[1:] -%}
195
  {%- endif -%}
 
196
  {%- if tools -%}
197
  {%- for tool in tools %}
198
  {{- '<|tool>' -}}
 
201
  {%- endfor %}
202
  {%- set ns.prev_message_type = 'tool' -%}
203
  {%- endif -%}
 
204
  {{- '<turn|>\n' -}}
205
  {%- endif %}
206
 
 
304
  {%- endfor -%}
305
  {%- endif -%}
306
 
307
+ {%- set captured_content -%}
308
  {%- if message['content'] is string -%}
309
  {%- if role == 'model' -%}
310
  {{- strip_thinking(message['content']) -}}
 
331
  {%- endif -%}
332
  {%- endfor -%}
333
  {%- endif -%}
334
+ {%- endset -%}
335
+
336
+ {{- captured_content -}}
337
+ {%- set has_content = captured_content | trim | length > 0 -%}
338
 
339
  {%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%}
340
  {{- '<|tool_response>' -}}
341
+ {%- elif not (ns_tr_out.flag and not has_content) -%}
342
  {{- '<turn|>\n' -}}
343
  {%- endif -%}
344
  {%- endif -%}
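Reviewer note: the three behavioural changes to `chat_template.jinja` in this commit are easier to reason about outside Jinja. The following is a plain-Python paraphrase of the changed conditions, not the template itself. The function names (`keep_property`, `system_text`, `closes_turn`) are hypothetical and exist only for this illustration. It assumes Jinja's `trim` behaves like `str.strip()` with default arguments and that filters bind tighter than `+` and `>`.

```python
STANDARD_KEYS = ["description", "type", "properties", "required", "nullable"]


def keep_property(key, filter_keys=False):
    """Paraphrase of `not filter_keys or key not in standard_keys`."""
    return (not filter_keys) or key not in STANDARD_KEYS


def system_text(content):
    """Paraphrase of the new string / sequence branches for messages[0]['content']."""
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, (list, tuple)):
        # Each part's text is trimmed and followed by a single space.
        return "".join(part["text"].strip() + " " for part in content)
    return ""  # neither branch matches, so nothing is emitted


def closes_turn(tool_response_emitted, rendered_content):
    """Paraphrase of `not (ns_tr_out.flag and not has_content)` in the elif branch."""
    has_content = len(rendered_content.strip()) > 0
    return not (tool_response_emitted and not has_content)


if __name__ == "__main__":
    parts = [{"text": " You are terse. "}, {"text": "Use tools."}]
    print(repr(system_text(parts)))
    print(keep_property("type"), keep_property("type", filter_keys=True))
    print(closes_turn(True, "  \n"))
```

Two details stand out in this reading. A list-valued system message now renders as the concatenated part texts with a trailing space, where the old template applied `trim` to the list directly. The closing `<turn|>` decision now depends on whether the rendered content is non-empty after trimming, not on whether `message.get('content')` is truthy, so a message whose content renders to whitespace only is treated as empty.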
config.json CHANGED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:21d0d431c44f39e600ef6dda30d4e69afdd27fbc80c8139b73b277b53fe6bb46
3
- size 3297104547
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0d239262a51b1795d1556ba0d0bdea955c126108d053249d2fd4c1e1584100fd
3
+ size 3523881390
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:40dea78dff54032cffb29fef4fb5b6a29b84ce11447663eaaaff983731339af4
3
- size 3023979841
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:da846c36ac065e0c8f558cd27286757490fc345b6613ac2789750c0d760f59a3
3
+ size 3010217360
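Reviewer note: the Git LFS pointer sizes above give the total weight footprint before and after the requant. The sketch below sums the two shards from each side of the diff. It covers the `.safetensors` shards only, so the tokenizer, config, and metadata files are not counted.

```python
# Shard sizes in bytes, copied from the LFS pointers in this commit.
old_shards = [3297104547, 3023979841]
new_shards = [3523881390, 3010217360]

old_total = sum(old_shards)
new_total = sum(new_shards)
growth = (new_total - old_total) / old_total

GIB = 2**30  # binary gigabyte
print(f"old weights: {old_total} bytes ({old_total / GIB:.2f} GiB)")
print(f"new weights: {new_total} bytes ({new_total / GIB:.2f} GiB)")
print(f"growth: {growth:.1%}")
```

The new shards total 6,534,098,750 bytes, which is about 6.53 GB in decimal units and about 6.09 GiB in binary units. The README's "6.1 GB" on-disk figure therefore appears to use binary units, though the README does not say so. The weights grow by roughly 3.4 % relative to the previous revision.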
model.safetensors.index.json CHANGED
The diff for this file is too large to render. See raw diff
 
optiq_metadata.json CHANGED
@@ -1,2387 +1,1533 @@
1
  {
2
  "method": "optiq_mixed_precision",
3
- "target_bpw": 4.5,
4
- "achieved_bpw": 4.4995443518886935,
5
- "n_high_bits": 149,
6
- "n_low_bits": 444,
 
 
7
  "threshold": 0.0,
8
  "per_layer": {
9
- "model.language_model.layers.0.self_attn.q_proj": {
10
- "bits": 8,
11
  "group_size": 64
12
  },
13
- "model.language_model.layers.0.self_attn.k_proj": {
14
  "bits": 8,
15
  "group_size": 64
16
  },
17
- "model.language_model.layers.0.self_attn.v_proj": {
18
  "bits": 8,
19
  "group_size": 64
20
  },
21
- "model.language_model.layers.0.self_attn.o_proj": {
22
  "bits": 8,
23
  "group_size": 64
24
  },
25
- "model.language_model.layers.0.mlp.gate_proj": {
26
  "bits": 8,
27
  "group_size": 64
28
  },
29
- "model.language_model.layers.0.mlp.up_proj": {
30
  "bits": 8,
31
  "group_size": 64
32
  },
33
- "model.language_model.layers.0.mlp.down_proj": {
34
  "bits": 8,
35
  "group_size": 64
36
  },
37
- "model.language_model.layers.0.per_layer_input_gate": {
38
- "bits": 8,
39
  "group_size": 64
40
  },
41
- "model.language_model.layers.0.per_layer_projection": {
42
- "bits": 8,
43
  "group_size": 64
44
  },
45
- "model.language_model.layers.1.self_attn.q_proj": {
46
  "bits": 8,
47
  "group_size": 64
48
  },
49
- "model.language_model.layers.1.self_attn.k_proj": {
50
  "bits": 8,
51
  "group_size": 64
52
  },
53
- "model.language_model.layers.1.self_attn.v_proj": {
54
- "bits": 8,
55
  "group_size": 64
56
  },
57
- "model.language_model.layers.1.self_attn.o_proj": {
 
 
 
 
58
  "bits": 8,
59
  "group_size": 64
60
  },
61
- "model.language_model.layers.1.mlp.gate_proj": {
62
  "bits": 4,
63
  "group_size": 64
64
  },
65
- "model.language_model.layers.1.mlp.up_proj": {
66
  "bits": 4,
67
  "group_size": 64
68
  },
69
- "model.language_model.layers.1.mlp.down_proj": {
70
- "bits": 8,
71
  "group_size": 64
72
  },
73
- "model.language_model.layers.1.per_layer_input_gate": {
74
- "bits": 8,
75
  "group_size": 64
76
  },
77
- "model.language_model.layers.1.per_layer_projection": {
78
- "bits": 8,
79
  "group_size": 64
80
  },
81
- "model.language_model.layers.2.self_attn.q_proj": {
82
  "bits": 8,
83
  "group_size": 64
84
  },
85
- "model.language_model.layers.2.self_attn.k_proj": {
86
- "bits": 8,
87
  "group_size": 64
88
  },
89
- "model.language_model.layers.2.self_attn.v_proj": {
90
- "bits": 8,
91
  "group_size": 64
92
  },
93
- "model.language_model.layers.2.self_attn.o_proj": {
94
  "bits": 8,
95
  "group_size": 64
96
  },
97
- "model.language_model.layers.2.mlp.gate_proj": {
98
- "bits": 8,
99
  "group_size": 64
100
  },
101
- "model.language_model.layers.2.mlp.up_proj": {
102
  "bits": 4,
103
  "group_size": 64
104
  },
105
- "model.language_model.layers.2.mlp.down_proj": {
106
- "bits": 8,
107
  "group_size": 64
108
  },
109
- "model.language_model.layers.2.per_layer_input_gate": {
110
- "bits": 8,
 
 
 
 
111
  "group_size": 64
112
  },
113
- "model.language_model.layers.2.per_layer_projection": {
114
  "bits": 8,
115
  "group_size": 64
116
  },
117
- "model.language_model.layers.3.self_attn.q_proj": {
118
  "bits": 4,
119
  "group_size": 64
120
  },
121
- "model.language_model.layers.3.self_attn.k_proj": {
122
- "bits": 8,
123
  "group_size": 64
124
  },
125
- "model.language_model.layers.3.self_attn.v_proj": {
126
  "bits": 8,
127
  "group_size": 64
128
  },
129
- "model.language_model.layers.3.self_attn.o_proj": {
130
- "bits": 8,
131
  "group_size": 64
132
  },
133
- "model.language_model.layers.3.mlp.gate_proj": {
134
- "bits": 4,
135
  "group_size": 64
136
  },
137
- "model.language_model.layers.3.mlp.up_proj": {
138
  "bits": 4,
139
  "group_size": 64
140
  },
141
- "model.language_model.layers.3.mlp.down_proj": {
142
  "bits": 4,
143
  "group_size": 64
144
  },
145
- "model.language_model.layers.3.per_layer_input_gate": {
146
- "bits": 8,
147
  "group_size": 64
148
  },
149
- "model.language_model.layers.3.per_layer_projection": {
150
  "bits": 8,
151
  "group_size": 64
152
  },
153
- "model.language_model.layers.4.self_attn.q_proj": {
154
  "bits": 4,
155
  "group_size": 64
156
  },
157
- "model.language_model.layers.4.self_attn.k_proj": {
158
- "bits": 8,
159
  "group_size": 64
160
  },
161
- "model.language_model.layers.4.self_attn.v_proj": {
162
  "bits": 8,
163
  "group_size": 64
164
  },
165
- "model.language_model.layers.4.self_attn.o_proj": {
166
- "bits": 8,
167
  "group_size": 64
168
  },
169
- "model.language_model.layers.4.mlp.gate_proj": {
170
- "bits": 8,
171
  "group_size": 64
172
  },
173
- "model.language_model.layers.4.mlp.up_proj": {
174
  "bits": 4,
175
  "group_size": 64
176
  },
177
- "model.language_model.layers.4.mlp.down_proj": {
178
- "bits": 8,
179
  "group_size": 64
180
  },
181
- "model.language_model.layers.4.per_layer_input_gate": {
182
- "bits": 8,
183
  "group_size": 64
184
  },
185
- "model.language_model.layers.4.per_layer_projection": {
186
  "bits": 8,
187
  "group_size": 64
188
  },
189
- "model.language_model.layers.5.self_attn.q_proj": {
190
- "bits": 8,
191
  "group_size": 64
192
  },
193
- "model.language_model.layers.5.self_attn.k_proj": {
194
- "bits": 8,
195
  "group_size": 64
196
  },
197
- "model.language_model.layers.5.self_attn.v_proj": {
198
  "bits": 8,
199
  "group_size": 64
200
  },
201
- "model.language_model.layers.5.self_attn.o_proj": {
202
- "bits": 8,
203
  "group_size": 64
204
  },
205
- "model.language_model.layers.5.mlp.gate_proj": {
206
  "bits": 4,
207
  "group_size": 64
208
  },
209
- "model.language_model.layers.5.mlp.up_proj": {
210
  "bits": 4,
211
  "group_size": 64
212
  },
213
- "model.language_model.layers.5.mlp.down_proj": {
214
  "bits": 4,
215
  "group_size": 64
216
  },
217
- "model.language_model.layers.5.per_layer_input_gate": {
218
- "bits": 8,
219
  "group_size": 64
220
  },
221
- "model.language_model.layers.5.per_layer_projection": {
222
  "bits": 8,
223
  "group_size": 64
224
  },
225
- "model.language_model.layers.6.self_attn.q_proj": {
226
  "bits": 4,
227
  "group_size": 64
228
  },
229
- "model.language_model.layers.6.self_attn.k_proj": {
230
- "bits": 8,
231
  "group_size": 64
232
  },
233
- "model.language_model.layers.6.self_attn.v_proj": {
234
- "bits": 8,
235
  "group_size": 64
236
  },
237
- "model.language_model.layers.6.self_attn.o_proj": {
238
  "bits": 4,
239
  "group_size": 64
240
  },
241
- "model.language_model.layers.6.mlp.gate_proj": {
242
- "bits": 4,
243
  "group_size": 64
244
  },
245
- "model.language_model.layers.6.mlp.up_proj": {
246
  "bits": 4,
247
  "group_size": 64
248
  },
249
- "model.language_model.layers.6.mlp.down_proj": {
250
  "bits": 4,
251
  "group_size": 64
252
  },
253
- "model.language_model.layers.6.per_layer_input_gate": {
254
  "bits": 8,
255
  "group_size": 64
256
  },
257
- "model.language_model.layers.6.per_layer_projection": {
258
  "bits": 8,
259
  "group_size": 64
260
  },
261
- "model.language_model.layers.7.self_attn.q_proj": {
262
  "bits": 4,
263
  "group_size": 64
264
  },
265
- "model.language_model.layers.7.self_attn.k_proj": {
266
- "bits": 8,
267
  "group_size": 64
268
  },
269
- "model.language_model.layers.7.self_attn.v_proj": {
270
  "bits": 8,
271
  "group_size": 64
272
  },
273
- "model.language_model.layers.7.self_attn.o_proj": {
274
  "bits": 4,
275
  "group_size": 64
276
  },
277
- "model.language_model.layers.7.mlp.gate_proj": {
278
  "bits": 4,
279
  "group_size": 64
280
  },
281
- "model.language_model.layers.7.mlp.up_proj": {
282
  "bits": 4,
283
  "group_size": 64
284
  },
285
- "model.language_model.layers.7.mlp.down_proj": {
286
  "bits": 4,
287
  "group_size": 64
288
  },
289
- "model.language_model.layers.7.per_layer_input_gate": {
290
- "bits": 8,
291
  "group_size": 64
292
  },
293
- "model.language_model.layers.7.per_layer_projection": {
294
  "bits": 8,
295
  "group_size": 64
296
  },
297
- "model.language_model.layers.8.self_attn.q_proj": {
298
  "bits": 4,
299
  "group_size": 64
300
  },
301
- "model.language_model.layers.8.self_attn.k_proj": {
302
- "bits": 8,
303
  "group_size": 64
304
  },
305
- "model.language_model.layers.8.self_attn.v_proj": {
306
  "bits": 8,
307
  "group_size": 64
308
  },
309
- "model.language_model.layers.8.self_attn.o_proj": {
310
  "bits": 4,
311
  "group_size": 64
312
  },
313
- "model.language_model.layers.8.mlp.gate_proj": {
314
  "bits": 4,
315
  "group_size": 64
316
  },
317
- "model.language_model.layers.8.mlp.up_proj": {
318
  "bits": 4,
319
  "group_size": 64
320
  },
321
- "model.language_model.layers.8.mlp.down_proj": {
322
  "bits": 4,
323
  "group_size": 64
324
  },
325
- "model.language_model.layers.8.per_layer_input_gate": {
326
- "bits": 8,
327
  "group_size": 64
328
  },
329
- "model.language_model.layers.8.per_layer_projection": {
330
  "bits": 8,
331
  "group_size": 64
332
  },
333
- "model.language_model.layers.9.self_attn.q_proj": {
334
  "bits": 4,
335
  "group_size": 64
336
  },
337
- "model.language_model.layers.9.self_attn.k_proj": {
338
  "bits": 4,
339
  "group_size": 64
340
  },
341
- "model.language_model.layers.9.self_attn.v_proj": {
342
  "bits": 8,
343
  "group_size": 64
344
  },
345
- "model.language_model.layers.9.self_attn.o_proj": {
 
 
 
 
346
  "bits": 8,
347
  "group_size": 64
348
  },
349
- "model.language_model.layers.9.mlp.gate_proj": {
350
  "bits": 4,
351
  "group_size": 64
352
  },
353
- "model.language_model.layers.9.mlp.up_proj": {
354
  "bits": 4,
355
  "group_size": 64
356
  },
357
- "model.language_model.layers.9.mlp.down_proj": {
358
  "bits": 4,
359
  "group_size": 64
360
  },
361
- "model.language_model.layers.9.per_layer_input_gate": {
362
  "bits": 8,
363
  "group_size": 64
364
  },
365
- "model.language_model.layers.9.per_layer_projection": {
366
- "bits": 8,
367
  "group_size": 64
368
  },
369
- "model.language_model.layers.10.self_attn.q_proj": {
370
  "bits": 4,
371
  "group_size": 64
372
  },
373
- "model.language_model.layers.10.self_attn.k_proj": {
374
  "bits": 8,
375
  "group_size": 64
376
  },
377
- "model.language_model.layers.10.self_attn.v_proj": {
378
- "bits": 8,
379
  "group_size": 64
380
  },
381
- "model.language_model.layers.10.self_attn.o_proj": {
382
- "bits": 8,
383
  "group_size": 64
384
  },
385
- "model.language_model.layers.10.mlp.gate_proj": {
386
- "bits": 8,
387
  "group_size": 64
388
  },
389
- "model.language_model.layers.10.mlp.up_proj": {
390
  "bits": 4,
391
  "group_size": 64
392
  },
393
- "model.language_model.layers.10.mlp.down_proj": {
394
- "bits": 8,
395
  "group_size": 64
396
  },
397
- "model.language_model.layers.10.per_layer_input_gate": {
398
  "bits": 8,
399
  "group_size": 64
400
  },
401
- "model.language_model.layers.10.per_layer_projection": {
402
- "bits": 8,
403
  "group_size": 64
404
  },
405
- "model.language_model.layers.11.self_attn.q_proj": {
406
  "bits": 4,
407
  "group_size": 64
408
  },
409
- "model.language_model.layers.11.self_attn.k_proj": {
410
  "bits": 8,
411
  "group_size": 64
412
  },
413
- "model.language_model.layers.11.self_attn.v_proj": {
414
- "bits": 8,
415
  "group_size": 64
416
  },
417
- "model.language_model.layers.11.self_attn.o_proj": {
418
- "bits": 8,
419
  "group_size": 64
420
  },
421
- "model.language_model.layers.11.mlp.gate_proj": {
422
  "bits": 4,
423
  "group_size": 64
424
  },
425
- "model.language_model.layers.11.mlp.up_proj": {
426
  "bits": 4,
427
  "group_size": 64
428
  },
429
- "model.language_model.layers.11.mlp.down_proj": {
430
  "bits": 4,
431
  "group_size": 64
432
  },
433
- "model.language_model.layers.11.per_layer_input_gate": {
434
  "bits": 8,
435
  "group_size": 64
436
  },
437
- "model.language_model.layers.11.per_layer_projection": {
438
- "bits": 8,
439
  "group_size": 64
440
  },
441
- "model.language_model.layers.12.self_attn.q_proj": {
442
  "bits": 4,
443
  "group_size": 64
444
  },
445
- "model.language_model.layers.12.self_attn.k_proj": {
446
  "bits": 4,
447
  "group_size": 64
448
  },
449
- "model.language_model.layers.12.self_attn.v_proj": {
450
- "bits": 8,
451
  "group_size": 64
452
  },
453
- "model.language_model.layers.12.self_attn.o_proj": {
454
  "bits": 8,
455
  "group_size": 64
456
  },
457
- "model.language_model.layers.12.mlp.gate_proj": {
458
- "bits": 4,
459
- "group_size": 64
460
- },
461
- "model.language_model.layers.12.mlp.up_proj": {
462
  "bits": 4,
463
  "group_size": 64
464
  },
465
- "model.language_model.layers.12.mlp.down_proj": {
466
  "bits": 4,
467
  "group_size": 64
468
  },
469
- "model.language_model.layers.12.per_layer_input_gate": {
470
  "bits": 8,
471
  "group_size": 64
472
  },
473
- "model.language_model.layers.12.per_layer_projection": {
474
  "bits": 8,
475
  "group_size": 64
476
  },
477
- "model.language_model.layers.13.self_attn.q_proj": {
478
  "bits": 4,
479
  "group_size": 64
480
  },
481
- "model.language_model.layers.13.self_attn.k_proj": {
482
- "bits": 8,
483
  "group_size": 64
484
  },
485
- "model.language_model.layers.13.self_attn.v_proj": {
486
- "bits": 8,
487
  "group_size": 64
488
  },
489
- "model.language_model.layers.13.self_attn.o_proj": {
490
- "bits": 8,
491
  "group_size": 64
492
  },
493
- "model.language_model.layers.13.mlp.gate_proj": {
494
  "bits": 4,
495
  "group_size": 64
496
  },
497
- "model.language_model.layers.13.mlp.up_proj": {
498
  "bits": 4,
499
  "group_size": 64
500
  },
501
- "model.language_model.layers.13.mlp.down_proj": {
502
  "bits": 4,
503
  "group_size": 64
504
  },
505
- "model.language_model.layers.13.per_layer_input_gate": {
506
  "bits": 8,
507
  "group_size": 64
508
  },
509
- "model.language_model.layers.13.per_layer_projection": {
510
  "bits": 8,
511
  "group_size": 64
512
  },
513
- "model.language_model.layers.14.self_attn.q_proj": {
514
  "bits": 4,
515
  "group_size": 64
516
  },
517
- "model.language_model.layers.14.self_attn.k_proj": {
518
- "bits": 8,
519
  "group_size": 64
520
  },
521
- "model.language_model.layers.14.self_attn.v_proj": {
522
  "bits": 8,
523
  "group_size": 64
524
  },
525
- "model.language_model.layers.14.self_attn.o_proj": {
526
- "bits": 8,
527
  "group_size": 64
528
  },
529
- "model.language_model.layers.14.mlp.gate_proj": {
530
  "bits": 4,
531
  "group_size": 64
532
  },
533
- "model.language_model.layers.14.mlp.up_proj": {
534
  "bits": 4,
535
  "group_size": 64
536
  },
537
- "model.language_model.layers.14.mlp.down_proj": {
538
  "bits": 4,
539
  "group_size": 64
540
  },
541
- "model.language_model.layers.14.per_layer_input_gate": {
542
- "bits": 8,
543
  "group_size": 64
544
  },
545
- "model.language_model.layers.14.per_layer_projection": {
546
  "bits": 8,
547
  "group_size": 64
548
  },
549
- "model.language_model.layers.15.self_attn.q_proj": {
550
  "bits": 4,
551
  "group_size": 64
552
  },
553
- "model.language_model.layers.15.self_attn.k_proj": {
554
  "bits": 8,
555
  "group_size": 64
556
  },
557
- "model.language_model.layers.15.self_attn.v_proj": {
558
- "bits": 8,
559
  "group_size": 64
560
  },
561
- "model.language_model.layers.15.self_attn.o_proj": {
562
- "bits": 8,
563
  "group_size": 64
564
  },
565
- "model.language_model.layers.15.mlp.gate_proj": {
566
- "bits": 4,
567
  "group_size": 64
568
  },
569
- "model.language_model.layers.15.mlp.up_proj": {
570
  "bits": 4,
571
  "group_size": 64
572
  },
573
- "model.language_model.layers.15.mlp.down_proj": {
574
  "bits": 4,
575
  "group_size": 64
576
  },
577
- "model.language_model.layers.15.per_layer_input_gate": {
578
- "bits": 8,
579
  "group_size": 64
580
  },
581
- "model.language_model.layers.15.per_layer_projection": {
582
  "bits": 8,
583
  "group_size": 64
584
  },
585
- "model.language_model.layers.16.self_attn.q_proj": {
586
- "bits": 8,
587
  "group_size": 64
588
  },
589
- "model.language_model.layers.16.self_attn.k_proj": {
590
- "bits": 8,
591
  "group_size": 64
592
  },
593
- "model.language_model.layers.16.self_attn.v_proj": {
594
  "bits": 8,
595
  "group_size": 64
596
  },
597
- "model.language_model.layers.16.self_attn.o_proj": {
598
- "bits": 8,
599
  "group_size": 64
600
  },
601
- "model.language_model.layers.16.mlp.gate_proj": {
602
  "bits": 4,
603
  "group_size": 64
604
  },
605
- "model.language_model.layers.16.mlp.up_proj": {
606
  "bits": 4,
607
  "group_size": 64
608
  },
609
- "model.language_model.layers.16.mlp.down_proj": {
610
  "bits": 4,
611
  "group_size": 64
612
  },
613
- "model.language_model.layers.16.per_layer_input_gate": {
614
- "bits": 8,
615
  "group_size": 64
616
  },
617
- "model.language_model.layers.16.per_layer_projection": {
618
  "bits": 8,
619
  "group_size": 64
620
  },
621
- "model.language_model.layers.17.self_attn.q_proj": {
622
  "bits": 4,
623
  "group_size": 64
624
  },
625
- "model.language_model.layers.17.self_attn.k_proj": {
626
- "bits": 8,
627
  "group_size": 64
628
  },
629
- "model.language_model.layers.17.self_attn.v_proj": {
630
- "bits": 8,
631
  "group_size": 64
632
  },
633
- "model.language_model.layers.17.self_attn.o_proj": {
634
  "bits": 8,
635
  "group_size": 64
636
  },
637
- "model.language_model.layers.17.mlp.gate_proj": {
638
  "bits": 4,
639
  "group_size": 64
640
  },
641
- "model.language_model.layers.17.mlp.up_proj": {
642
  "bits": 4,
643
  "group_size": 64
644
  },
645
- "model.language_model.layers.17.mlp.down_proj": {
646
  "bits": 4,
647
  "group_size": 64
648
  },
649
- "model.language_model.layers.17.per_layer_input_gate": {
650
- "bits": 8,
651
- "group_size": 64
652
- },
653
- "model.language_model.layers.17.per_layer_projection": {
654
- "bits": 8,
655
- "group_size": 64
656
- },
657
- "model.language_model.layers.18.self_attn.q_proj": {
658
  "bits": 4,
659
  "group_size": 64
660
  },
661
- "model.language_model.layers.18.self_attn.k_proj": {
662
- "bits": 8,
663
- "group_size": 64
664
- },
665
- "model.language_model.layers.18.self_attn.v_proj": {
666
  "bits": 8,
667
  "group_size": 64
668
  },
669
- "model.language_model.layers.18.self_attn.o_proj": {
670
  "bits": 4,
671
  "group_size": 64
672
  },
673
- "model.language_model.layers.18.mlp.gate_proj": {
674
  "bits": 4,
675
  "group_size": 64
676
  },
677
- "model.language_model.layers.18.mlp.up_proj": {
678
  "bits": 4,
679
  "group_size": 64
680
  },
681
- "model.language_model.layers.18.mlp.down_proj": {
682
  "bits": 4,
683
  "group_size": 64
684
  },
685
- "model.language_model.layers.18.per_layer_input_gate": {
686
  "bits": 8,
687
  "group_size": 64
688
  },
689
- "model.language_model.layers.18.per_layer_projection": {
690
  "bits": 8,
691
  "group_size": 64
692
  },
693
- "model.language_model.layers.19.self_attn.q_proj": {
694
- "bits": 4,
695
  "group_size": 64
696
  },
697
- "model.language_model.layers.19.self_attn.k_proj": {
698
  "bits": 4,
699
  "group_size": 64
700
  },
701
- "model.language_model.layers.19.self_attn.v_proj": {
702
  "bits": 8,
703
  "group_size": 64
704
  },
705
- "model.language_model.layers.19.self_attn.o_proj": {
706
  "bits": 4,
707
  "group_size": 64
708
  },
709
- "model.language_model.layers.19.mlp.gate_proj": {
710
  "bits": 4,
711
  "group_size": 64
712
  },
713
- "model.language_model.layers.19.mlp.up_proj": {
714
  "bits": 4,
715
  "group_size": 64
716
  },
717
- "model.language_model.layers.19.mlp.down_proj": {
718
  "bits": 4,
719
  "group_size": 64
720
  },
721
- "model.language_model.layers.19.per_layer_input_gate": {
722
- "bits": 8,
723
  "group_size": 64
724
  },
725
- "model.language_model.layers.19.per_layer_projection": {
726
  "bits": 8,
727
  "group_size": 64
728
  },
729
- "model.language_model.layers.20.self_attn.q_proj": {
730
- "bits": 4,
731
  "group_size": 64
732
  },
733
- "model.language_model.layers.20.self_attn.k_proj": {
734
  "bits": 4,
735
  "group_size": 64
736
  },
737
- "model.language_model.layers.20.self_attn.v_proj": {
738
  "bits": 8,
739
  "group_size": 64
740
  },
741
- "model.language_model.layers.20.self_attn.o_proj": {
742
  "bits": 4,
743
  "group_size": 64
744
  },
745
- "model.language_model.layers.20.mlp.gate_proj": {
746
  "bits": 4,
747
  "group_size": 64
748
  },
749
- "model.language_model.layers.20.mlp.up_proj": {
750
  "bits": 4,
751
  "group_size": 64
752
  },
753
- "model.language_model.layers.20.mlp.down_proj": {
754
  "bits": 4,
755
  "group_size": 64
756
  },
757
- "model.language_model.layers.20.per_layer_input_gate": {
758
  "bits": 8,
759
  "group_size": 64
760
  },
761
- "model.language_model.layers.20.per_layer_projection": {
762
  "bits": 8,
763
  "group_size": 64
764
  },
765
- "model.language_model.layers.21.self_attn.q_proj": {
766
  "bits": 4,
767
  "group_size": 64
768
  },
769
- "model.language_model.layers.21.self_attn.k_proj": {
770
- "bits": 8,
771
  "group_size": 64
772
  },
773
- "model.language_model.layers.21.self_attn.v_proj": {
774
  "bits": 8,
775
  "group_size": 64
776
  },
777
- "model.language_model.layers.21.self_attn.o_proj": {
778
- "bits": 8,
779
  "group_size": 64
780
  },
781
- "model.language_model.layers.21.mlp.gate_proj": {
782
  "bits": 4,
783
  "group_size": 64
784
  },
785
- "model.language_model.layers.21.mlp.up_proj": {
786
- "bits": 4,
787
  "group_size": 64
788
  },
789
- "model.language_model.layers.21.mlp.down_proj": {
790
  "bits": 4,
791
  "group_size": 64
792
  },
793
- "model.language_model.layers.21.per_layer_input_gate": {
794
  "bits": 8,
795
  "group_size": 64
796
  },
797
- "model.language_model.layers.21.per_layer_projection": {
798
  "bits": 8,
799
  "group_size": 64
800
  },
801
- "model.language_model.layers.22.self_attn.q_proj": {
802
  "bits": 4,
803
  "group_size": 64
804
  },
805
- "model.language_model.layers.22.self_attn.k_proj": {
806
- "bits": 8,
807
  "group_size": 64
808
  },
809
- "model.language_model.layers.22.self_attn.v_proj": {
810
  "bits": 8,
811
  "group_size": 64
812
  },
813
- "model.language_model.layers.22.self_attn.o_proj": {
814
  "bits": 8,
815
  "group_size": 64
816
  },
817
- "model.language_model.layers.22.mlp.gate_proj": {
818
  "bits": 4,
819
  "group_size": 64
820
  },
821
- "model.language_model.layers.22.mlp.up_proj": {
822
  "bits": 4,
823
  "group_size": 64
824
  },
825
- "model.language_model.layers.22.mlp.down_proj": {
826
  "bits": 4,
827
  "group_size": 64
828
  },
829
- "model.language_model.layers.22.per_layer_input_gate": {
830
  "bits": 8,
831
  "group_size": 64
832
  },
833
- "model.language_model.layers.22.per_layer_projection": {
834
  "bits": 8,
835
  "group_size": 64
836
  },
837
- "model.language_model.layers.23.self_attn.q_proj": {
838
  "bits": 4,
839
  "group_size": 64
840
  },
841
- "model.language_model.layers.23.self_attn.k_proj": {
842
- "bits": 8,
843
  "group_size": 64
844
  },
845
- "model.language_model.layers.23.self_attn.v_proj": {
846
  "bits": 8,
847
  "group_size": 64
848
  },
849
- "model.language_model.layers.23.self_attn.o_proj": {
850
- "bits": 4,
851
  "group_size": 64
852
  },
853
- "model.language_model.layers.23.mlp.gate_proj": {
854
  "bits": 4,
855
  "group_size": 64
856
  },
857
- "model.language_model.layers.23.mlp.up_proj": {
858
  "bits": 4,
859
  "group_size": 64
860
  },
861
- "model.language_model.layers.23.mlp.down_proj": {
862
  "bits": 4,
863
  "group_size": 64
864
  },
865
- "model.language_model.layers.23.per_layer_input_gate": {
866
  "bits": 4,
867
  "group_size": 64
868
  },
869
- "model.language_model.layers.23.per_layer_projection": {
870
  "bits": 8,
871
  "group_size": 64
872
  },
873
- "model.language_model.layers.24.self_attn.q_proj": {
874
- "bits": 4,
875
- "group_size": 64
876
- },
877
- "model.language_model.layers.24.self_attn.o_proj": {
878
- "bits": 4,
879
- "group_size": 64
880
- },
881
- "model.language_model.layers.24.mlp.gate_proj": {
882
  "bits": 4,
883
  "group_size": 64
884
  },
885
- "model.language_model.layers.24.mlp.up_proj": {
886
  "bits": 4,
887
  "group_size": 64
888
  },
889
- "model.language_model.layers.24.mlp.down_proj": {
890
- "bits": 4,
891
  "group_size": 64
892
  },
893
- "model.language_model.layers.24.per_layer_input_gate": {
894
  "bits": 4,
895
  "group_size": 64
896
  },
897
- "model.language_model.layers.24.per_layer_projection": {
898
  "bits": 8,
899
  "group_size": 64
900
  },
901
- "model.language_model.layers.25.self_attn.q_proj": {
902
  "bits": 4,
903
  "group_size": 64
904
  },
905
- "model.language_model.layers.25.self_attn.o_proj": {
906
  "bits": 4,
907
  "group_size": 64
908
  },
909
- "model.language_model.layers.25.mlp.gate_proj": {
910
- "bits": 4,
911
  "group_size": 64
912
  },
913
- "model.language_model.layers.25.mlp.up_proj": {
914
- "bits": 4,
915
  "group_size": 64
916
  },
917
- "model.language_model.layers.25.mlp.down_proj": {
918
- "bits": 4,
919
  "group_size": 64
920
  },
921
- "model.language_model.layers.25.per_layer_input_gate": {
922
  "bits": 4,
923
  "group_size": 64
924
  },
925
- "model.language_model.layers.25.per_layer_projection": {
926
  "bits": 8,
927
  "group_size": 64
928
  },
929
- "model.language_model.layers.26.self_attn.q_proj": {
930
- "bits": 4,
931
  "group_size": 64
932
  },
933
- "model.language_model.layers.26.self_attn.o_proj": {
934
  "bits": 4,
935
  "group_size": 64
936
  },
937
- "model.language_model.layers.26.mlp.gate_proj": {
938
  "bits": 4,
939
  "group_size": 64
940
  },
941
- "model.language_model.layers.26.mlp.up_proj": {
942
  "bits": 4,
943
  "group_size": 64
944
  },
945
- "model.language_model.layers.26.mlp.down_proj": {
946
- "bits": 4,
947
  "group_size": 64
948
  },
949
- "model.language_model.layers.26.per_layer_input_gate": {
950
- "bits": 4,
951
  "group_size": 64
952
  },
953
- "model.language_model.layers.26.per_layer_projection": {
954
  "bits": 8,
955
  "group_size": 64
956
  },
957
- "model.language_model.layers.27.self_attn.q_proj": {
958
  "bits": 4,
959
  "group_size": 64
960
  },
961
- "model.language_model.layers.27.self_attn.o_proj": {
962
- "bits": 4,
963
  "group_size": 64
964
  },
965
- "model.language_model.layers.27.mlp.gate_proj": {
966
- "bits": 4,
967
  "group_size": 64
968
  },
969
- "model.language_model.layers.27.mlp.up_proj": {
970
  "bits": 4,
971
  "group_size": 64
972
  },
973
- "model.language_model.layers.27.mlp.down_proj": {
974
  "bits": 4,
975
  "group_size": 64
976
  },
977
- "model.language_model.layers.27.per_layer_input_gate": {
978
  "bits": 4,
979
  "group_size": 64
980
  },
981
- "model.language_model.layers.27.per_layer_projection": {
982
  "bits": 8,
983
  "group_size": 64
984
  },
985
- "model.language_model.layers.28.self_attn.q_proj": {
986
- "bits": 4,
987
  "group_size": 64
988
  },
989
- "model.language_model.layers.28.self_attn.o_proj": {
990
  "bits": 4,
991
  "group_size": 64
992
  },
993
- "model.language_model.layers.28.mlp.gate_proj": {
994
  "bits": 4,
995
  "group_size": 64
996
  },
997
- "model.language_model.layers.28.mlp.up_proj": {
998
  "bits": 4,
999
  "group_size": 64
1000
  },
1001
- "model.language_model.layers.28.mlp.down_proj": {
1002
- "bits": 4,
1003
  "group_size": 64
1004
  },
1005
- "model.language_model.layers.28.per_layer_input_gate": {
1006
- "bits": 8,
1007
  "group_size": 64
1008
  },
1009
- "model.language_model.layers.28.per_layer_projection": {
1010
  "bits": 8,
1011
  "group_size": 64
1012
  },
1013
- "model.language_model.layers.29.self_attn.q_proj": {
1014
  "bits": 4,
1015
  "group_size": 64
1016
  },
1017
- "model.language_model.layers.29.self_attn.o_proj": {
1018
- "bits": 4,
1019
  "group_size": 64
1020
  },
1021
- "model.language_model.layers.29.mlp.gate_proj": {
1022
  "bits": 4,
1023
  "group_size": 64
1024
  },
1025
- "model.language_model.layers.29.mlp.up_proj": {
1026
  "bits": 4,
1027
  "group_size": 64
1028
  },
1029
- "model.language_model.layers.29.mlp.down_proj": {
1030
  "bits": 4,
1031
  "group_size": 64
1032
  },
1033
- "model.language_model.layers.29.per_layer_input_gate": {
1034
- "bits": 4,
1035
  "group_size": 64
1036
  },
1037
- "model.language_model.layers.29.per_layer_projection": {
1038
  "bits": 8,
1039
  "group_size": 64
1040
  },
1041
- "model.language_model.layers.30.self_attn.q_proj": {
1042
  "bits": 4,
1043
  "group_size": 64
1044
  },
1045
- "model.language_model.layers.30.self_attn.o_proj": {
1046
  "bits": 4,
1047
  "group_size": 64
1048
  },
1049
- "model.language_model.layers.30.mlp.gate_proj": {
1050
  "bits": 4,
1051
  "group_size": 64
1052
  },
1053
- "model.language_model.layers.30.mlp.up_proj": {
1054
- "bits": 4,
1055
  "group_size": 64
1056
  },
1057
- "model.language_model.layers.30.mlp.down_proj": {
1058
- "bits": 4,
1059
  "group_size": 64
1060
  },
1061
- "model.language_model.layers.30.per_layer_input_gate": {
1062
  "bits": 4,
1063
  "group_size": 64
1064
  },
1065
- "model.language_model.layers.30.per_layer_projection": {
1066
  "bits": 8,
1067
  "group_size": 64
1068
  },
1069
- "model.language_model.layers.31.self_attn.q_proj": {
1070
- "bits": 4,
1071
  "group_size": 64
1072
  },
1073
- "model.language_model.layers.31.self_attn.o_proj": {
1074
  "bits": 4,
1075
  "group_size": 64
1076
  },
1077
- "model.language_model.layers.31.mlp.gate_proj": {
1078
  "bits": 4,
1079
  "group_size": 64
1080
  },
1081
- "model.language_model.layers.31.mlp.up_proj": {
1082
  "bits": 4,
1083
  "group_size": 64
1084
  },
1085
- "model.language_model.layers.31.mlp.down_proj": {
1086
  "bits": 4,
1087
  "group_size": 64
1088
  },
1089
- "model.language_model.layers.31.per_layer_input_gate": {
1090
- "bits": 4,
1091
  "group_size": 64
1092
  },
1093
- "model.language_model.layers.31.per_layer_projection": {
1094
  "bits": 8,
1095
  "group_size": 64
1096
  },
1097
- "model.language_model.layers.32.self_attn.q_proj": {
1098
  "bits": 4,
1099
  "group_size": 64
1100
  },
1101
- "model.language_model.layers.32.self_attn.o_proj": {
1102
- "bits": 4,
1103
  "group_size": 64
1104
  },
1105
- "model.language_model.layers.32.mlp.gate_proj": {
1106
  "bits": 4,
1107
  "group_size": 64
1108
  },
1109
- "model.language_model.layers.32.mlp.up_proj": {
1110
  "bits": 4,
1111
  "group_size": 64
1112
  },
1113
- "model.language_model.layers.32.mlp.down_proj": {
1114
  "bits": 4,
1115
  "group_size": 64
1116
  },
1117
- "model.language_model.layers.32.per_layer_input_gate": {
1118
  "bits": 4,
1119
  "group_size": 64
1120
  },
1121
- "model.language_model.layers.32.per_layer_projection": {
1122
  "bits": 8,
1123
  "group_size": 64
1124
  },
1125
- "model.language_model.layers.33.self_attn.q_proj": {
1126
- "bits": 4,
1127
- "group_size": 64
1128
- },
1129
- "model.language_model.layers.33.self_attn.o_proj": {
1130
- "bits": 4,
1131
- "group_size": 64
1132
- },
1133
- "model.language_model.layers.33.mlp.gate_proj": {
1134
- "bits": 4,
1135
  "group_size": 64
1136
  },
1137
- "model.language_model.layers.33.mlp.up_proj": {
1138
- "bits": 4,
1139
  "group_size": 64
1140
  },
1141
- "model.language_model.layers.33.mlp.down_proj": {
1142
  "bits": 4,
1143
  "group_size": 64
1144
  },
1145
- "model.language_model.layers.33.per_layer_input_gate": {
1146
- "bits": 4,
1147
  "group_size": 64
1148
  },
1149
- "model.language_model.layers.33.per_layer_projection": {
1150
  "bits": 8,
1151
  "group_size": 64
1152
  },
1153
- "model.language_model.layers.34.self_attn.q_proj": {
1154
  "bits": 4,
1155
  "group_size": 64
1156
  },
1157
- "model.language_model.layers.34.self_attn.o_proj": {
1158
  "bits": 4,
1159
  "group_size": 64
1160
  },
1161
- "model.language_model.layers.34.mlp.gate_proj": {
1162
  "bits": 4,
1163
  "group_size": 64
1164
  },
1165
- "model.language_model.layers.34.mlp.up_proj": {
1166
- "bits": 4,
1167
  "group_size": 64
1168
  },
1169
- "model.language_model.layers.34.mlp.down_proj": {
1170
- "bits": 4,
1171
  "group_size": 64
1172
  },
1173
- "model.language_model.layers.34.per_layer_input_gate": {
1174
- "bits": 4,
1175
  "group_size": 64
1176
  },
1177
- "model.language_model.layers.34.per_layer_projection": {
1178
  "bits": 8,
1179
  "group_size": 64
1180
  },
1181
- "model.language_model.layers.35.self_attn.q_proj": {
1182
- "bits": 4,
1183
  "group_size": 64
1184
  },
1185
- "model.language_model.layers.35.self_attn.o_proj": {
1186
- "bits": 4,
1187
  "group_size": 64
1188
  },
1189
- "model.language_model.layers.35.mlp.gate_proj": {
1190
  "bits": 4,
1191
  "group_size": 64
1192
  },
1193
- "model.language_model.layers.35.mlp.up_proj": {
1194
  "bits": 4,
1195
  "group_size": 64
1196
  },
1197
- "model.language_model.layers.35.mlp.down_proj": {
1198
  "bits": 4,
1199
  "group_size": 64
1200
  },
1201
- "model.language_model.layers.35.per_layer_input_gate": {
1202
  "bits": 4,
1203
  "group_size": 64
1204
  },
1205
- "model.language_model.layers.35.per_layer_projection": {
1206
  "bits": 8,
1207
  "group_size": 64
1208
  },
1209
- "model.language_model.layers.36.self_attn.q_proj": {
1210
  "bits": 4,
1211
  "group_size": 64
1212
  },
1213
- "model.language_model.layers.36.self_attn.o_proj": {
1214
- "bits": 4,
1215
  "group_size": 64
1216
  },
1217
- "model.language_model.layers.36.mlp.gate_proj": {
1218
- "bits": 4,
1219
  "group_size": 64
1220
  },
1221
- "model.language_model.layers.36.mlp.up_proj": {
1222
- "bits": 4,
1223
  "group_size": 64
1224
  },
1225
- "model.language_model.layers.36.mlp.down_proj": {
1226
- "bits": 4,
1227
  "group_size": 64
1228
  },
1229
- "model.language_model.layers.36.per_layer_input_gate": {
1230
  "bits": 4,
1231
  "group_size": 64
1232
  },
1233
- "model.language_model.layers.36.per_layer_projection": {
1234
  "bits": 8,
1235
  "group_size": 64
1236
  },
1237
- "model.language_model.layers.37.self_attn.q_proj": {
1238
  "bits": 4,
1239
  "group_size": 64
1240
  },
1241
- "model.language_model.layers.37.self_attn.o_proj": {
1242
- "bits": 4,
1243
  "group_size": 64
1244
  },
1245
- "model.language_model.layers.37.mlp.gate_proj": {
1246
  "bits": 4,
1247
  "group_size": 64
1248
  },
1249
- "model.language_model.layers.37.mlp.up_proj": {
1250
  "bits": 4,
1251
  "group_size": 64
1252
  },
1253
- "model.language_model.layers.37.mlp.down_proj": {
1254
- "bits": 4,
1255
  "group_size": 64
1256
  },
1257
- "model.language_model.layers.37.per_layer_input_gate": {
1258
  "bits": 4,
1259
  "group_size": 64
1260
  },
1261
- "model.language_model.layers.37.per_layer_projection": {
1262
  "bits": 8,
1263
  "group_size": 64
1264
  },
1265
- "model.language_model.layers.38.self_attn.q_proj": {
1266
  "bits": 4,
1267
  "group_size": 64
1268
  },
1269
- "model.language_model.layers.38.self_attn.o_proj": {
1270
  "bits": 4,
1271
  "group_size": 64
1272
  },
1273
- "model.language_model.layers.38.mlp.gate_proj": {
1274
- "bits": 4,
1275
  "group_size": 64
1276
  },
1277
- "model.language_model.layers.38.mlp.up_proj": {
1278
- "bits": 4,
1279
  "group_size": 64
1280
  },
1281
- "model.language_model.layers.38.mlp.down_proj": {
1282
  "bits": 4,
1283
  "group_size": 64
1284
  },
1285
- "model.language_model.layers.38.per_layer_input_gate": {
1286
  "bits": 4,
1287
  "group_size": 64
1288
  },
1289
- "model.language_model.layers.38.per_layer_projection": {
1290
  "bits": 8,
1291
  "group_size": 64
1292
  },
1293
- "model.language_model.layers.39.self_attn.q_proj": {
1294
- "bits": 4,
1295
- "group_size": 64
1296
- },
1297
- "model.language_model.layers.39.self_attn.o_proj": {
1298
- "bits": 4,
1299
  "group_size": 64
1300
  },
1301
- "model.language_model.layers.39.mlp.gate_proj": {
1302
  "bits": 4,
1303
  "group_size": 64
1304
  },
1305
- "model.language_model.layers.39.mlp.up_proj": {
1306
- "bits": 4,
1307
  "group_size": 64
1308
  },
1309
- "model.language_model.layers.39.mlp.down_proj": {
1310
  "bits": 4,
1311
  "group_size": 64
1312
  },
1313
- "model.language_model.layers.39.per_layer_input_gate": {
1314
- "bits": 4,
1315
  "group_size": 64
1316
  },
1317
- "model.language_model.layers.39.per_layer_projection": {
1318
  "bits": 8,
1319
  "group_size": 64
1320
  },
1321
- "model.language_model.layers.40.self_attn.q_proj": {
1322
  "bits": 4,
1323
  "group_size": 64
1324
  },
1325
- "model.language_model.layers.40.self_attn.o_proj": {
1326
- "bits": 4,
1327
  "group_size": 64
1328
  },
1329
- "model.language_model.layers.40.mlp.gate_proj": {
1330
- "bits": 4,
1331
  "group_size": 64
1332
  },
1333
- "model.language_model.layers.40.mlp.up_proj": {
1334
  "bits": 4,
1335
  "group_size": 64
1336
  },
1337
- "model.language_model.layers.40.mlp.down_proj": {
1338
  "bits": 4,
1339
  "group_size": 64
1340
  },
1341
- "model.language_model.layers.40.per_layer_input_gate": {
1342
  "bits": 4,
1343
  "group_size": 64
1344
  },
1345
- "model.language_model.layers.40.per_layer_projection": {
1346
  "bits": 8,
1347
  "group_size": 64
1348
  },
1349
- "model.language_model.layers.41.self_attn.q_proj": {
1350
  "bits": 8,
1351
  "group_size": 64
1352
  },
1353
- "model.language_model.layers.41.self_attn.o_proj": {
1354
  "bits": 8,
1355
  "group_size": 64
1356
  },
1357
- "model.language_model.layers.41.mlp.gate_proj": {
1358
- "bits": 8,
1359
- "group_size": 64
1360
- },
1361
- "model.language_model.layers.41.mlp.up_proj": {
1362
- "bits": 8,
1363
- "group_size": 64
1364
- },
1365
- "model.language_model.layers.41.mlp.down_proj": {
1366
- "bits": 8,
1367
- "group_size": 64
1368
- },
1369
- "model.language_model.layers.41.per_layer_input_gate": {
1370
- "bits": 8,
1371
- "group_size": 64
1372
- },
1373
- "model.language_model.layers.41.per_layer_projection": {
1374
- "bits": 8,
1375
- "group_size": 64
1376
- },
1377
- "model.language_model.per_layer_model_projection": {
1378
- "bits": 8,
1379
- "group_size": 64
1380
- },
1381
- "model.vision_tower.patch_embedder.input_proj": {
1382
- "bits": 4,
1383
- "group_size": 64
1384
- },
1385
- "model.vision_tower.encoder.layers.0.self_attn.q_proj.linear": {
1386
- "bits": 4,
1387
- "group_size": 64
1388
- },
1389
- "model.vision_tower.encoder.layers.0.self_attn.k_proj.linear": {
1390
- "bits": 4,
1391
- "group_size": 64
1392
- },
1393
- "model.vision_tower.encoder.layers.0.self_attn.v_proj.linear": {
1394
- "bits": 4,
1395
- "group_size": 64
1396
- },
1397
- "model.vision_tower.encoder.layers.0.self_attn.o_proj.linear": {
1398
- "bits": 4,
1399
- "group_size": 64
1400
- },
1401
- "model.vision_tower.encoder.layers.0.mlp.gate_proj.linear": {
1402
- "bits": 4,
1403
- "group_size": 64
1404
- },
1405
- "model.vision_tower.encoder.layers.0.mlp.up_proj.linear": {
1406
- "bits": 4,
1407
- "group_size": 64
1408
- },
1409
- "model.vision_tower.encoder.layers.0.mlp.down_proj.linear": {
1410
- "bits": 4,
1411
- "group_size": 64
1412
- },
1413
- "model.vision_tower.encoder.layers.1.self_attn.q_proj.linear": {
1414
- "bits": 4,
1415
- "group_size": 64
1416
- },
1417
- "model.vision_tower.encoder.layers.1.self_attn.k_proj.linear": {
1418
- "bits": 4,
1419
- "group_size": 64
1420
- },
1421
- "model.vision_tower.encoder.layers.1.self_attn.v_proj.linear": {
1422
- "bits": 4,
1423
- "group_size": 64
1424
- },
1425
- "model.vision_tower.encoder.layers.1.self_attn.o_proj.linear": {
1426
- "bits": 4,
1427
- "group_size": 64
1428
- },
1429
- "model.vision_tower.encoder.layers.1.mlp.gate_proj.linear": {
1430
- "bits": 4,
1431
- "group_size": 64
1432
- },
1433
- "model.vision_tower.encoder.layers.1.mlp.up_proj.linear": {
1434
- "bits": 4,
1435
- "group_size": 64
1436
- },
1437
- "model.vision_tower.encoder.layers.1.mlp.down_proj.linear": {
1438
- "bits": 4,
1439
- "group_size": 64
1440
- },
1441
- "model.vision_tower.encoder.layers.2.self_attn.q_proj.linear": {
1442
- "bits": 4,
1443
- "group_size": 64
1444
- },
1445
- "model.vision_tower.encoder.layers.2.self_attn.k_proj.linear": {
1446
- "bits": 4,
1447
- "group_size": 64
1448
- },
1449
- "model.vision_tower.encoder.layers.2.self_attn.v_proj.linear": {
1450
- "bits": 4,
1451
- "group_size": 64
1452
- },
1453
- "model.vision_tower.encoder.layers.2.self_attn.o_proj.linear": {
1454
- "bits": 4,
1455
- "group_size": 64
1456
- },
1457
- "model.vision_tower.encoder.layers.2.mlp.gate_proj.linear": {
1458
- "bits": 4,
1459
- "group_size": 64
1460
- },
1461
- "model.vision_tower.encoder.layers.2.mlp.up_proj.linear": {
1462
- "bits": 4,
1463
- "group_size": 64
1464
- },
1465
- "model.vision_tower.encoder.layers.2.mlp.down_proj.linear": {
1466
- "bits": 4,
1467
- "group_size": 64
1468
- },
1469
- "model.vision_tower.encoder.layers.3.self_attn.q_proj.linear": {
1470
- "bits": 4,
1471
- "group_size": 64
1472
- },
1473
- "model.vision_tower.encoder.layers.3.self_attn.k_proj.linear": {
1474
- "bits": 4,
1475
- "group_size": 64
1476
- },
1477
- "model.vision_tower.encoder.layers.3.self_attn.v_proj.linear": {
1478
- "bits": 4,
1479
- "group_size": 64
1480
- },
1481
- "model.vision_tower.encoder.layers.3.self_attn.o_proj.linear": {
1482
- "bits": 4,
1483
- "group_size": 64
1484
- },
1485
- "model.vision_tower.encoder.layers.3.mlp.gate_proj.linear": {
1486
- "bits": 4,
1487
- "group_size": 64
1488
- },
1489
- "model.vision_tower.encoder.layers.3.mlp.up_proj.linear": {
1490
- "bits": 4,
1491
- "group_size": 64
1492
- },
1493
- "model.vision_tower.encoder.layers.3.mlp.down_proj.linear": {
1494
- "bits": 4,
1495
- "group_size": 64
1496
- },
1497
- "model.vision_tower.encoder.layers.4.self_attn.q_proj.linear": {
1498
- "bits": 4,
1499
- "group_size": 64
1500
- },
1501
- "model.vision_tower.encoder.layers.4.self_attn.k_proj.linear": {
1502
- "bits": 4,
1503
- "group_size": 64
1504
- },
1505
- "model.vision_tower.encoder.layers.4.self_attn.v_proj.linear": {
1506
- "bits": 4,
1507
- "group_size": 64
1508
- },
1509
- "model.vision_tower.encoder.layers.4.self_attn.o_proj.linear": {
1510
- "bits": 4,
1511
- "group_size": 64
1512
- },
1513
- "model.vision_tower.encoder.layers.4.mlp.gate_proj.linear": {
1514
- "bits": 4,
1515
- "group_size": 64
1516
- },
1517
- "model.vision_tower.encoder.layers.4.mlp.up_proj.linear": {
1518
- "bits": 4,
1519
- "group_size": 64
1520
- },
1521
- "model.vision_tower.encoder.layers.4.mlp.down_proj.linear": {
1522
- "bits": 4,
1523
- "group_size": 64
1524
- },
1525
- "model.vision_tower.encoder.layers.5.self_attn.q_proj.linear": {
1526
- "bits": 4,
1527
- "group_size": 64
1528
- },
1529
- "model.vision_tower.encoder.layers.5.self_attn.k_proj.linear": {
1530
- "bits": 4,
1531
- "group_size": 64
1532
- },
1533
- "model.vision_tower.encoder.layers.5.self_attn.v_proj.linear": {
1534
- "bits": 4,
1535
- "group_size": 64
1536
- },
1537
- "model.vision_tower.encoder.layers.5.self_attn.o_proj.linear": {
1538
- "bits": 4,
1539
- "group_size": 64
1540
- },
1541
- "model.vision_tower.encoder.layers.5.mlp.gate_proj.linear": {
1542
- "bits": 4,
1543
- "group_size": 64
1544
- },
1545
- "model.vision_tower.encoder.layers.5.mlp.up_proj.linear": {
1546
- "bits": 4,
1547
- "group_size": 64
1548
- },
1549
- "model.vision_tower.encoder.layers.5.mlp.down_proj.linear": {
1550
- "bits": 4,
1551
- "group_size": 64
1552
- },
1553
- "model.vision_tower.encoder.layers.6.self_attn.q_proj.linear": {
1554
- "bits": 4,
1555
- "group_size": 64
1556
- },
1557
- "model.vision_tower.encoder.layers.6.self_attn.k_proj.linear": {
1558
- "bits": 4,
1559
- "group_size": 64
1560
- },
1561
- "model.vision_tower.encoder.layers.6.self_attn.v_proj.linear": {
1562
- "bits": 4,
1563
- "group_size": 64
1564
- },
1565
- "model.vision_tower.encoder.layers.6.self_attn.o_proj.linear": {
1566
- "bits": 4,
1567
- "group_size": 64
1568
- },
1569
- "model.vision_tower.encoder.layers.6.mlp.gate_proj.linear": {
1570
- "bits": 4,
1571
- "group_size": 64
1572
- },
1573
- "model.vision_tower.encoder.layers.6.mlp.up_proj.linear": {
1574
- "bits": 4,
1575
- "group_size": 64
1576
- },
1577
- "model.vision_tower.encoder.layers.6.mlp.down_proj.linear": {
1578
- "bits": 4,
1579
- "group_size": 64
1580
- },
1581
- "model.vision_tower.encoder.layers.7.self_attn.q_proj.linear": {
1582
- "bits": 4,
1583
- "group_size": 64
1584
- },
1585
- "model.vision_tower.encoder.layers.7.self_attn.k_proj.linear": {
1586
- "bits": 4,
1587
- "group_size": 64
1588
- },
1589
- "model.vision_tower.encoder.layers.7.self_attn.v_proj.linear": {
1590
- "bits": 4,
1591
- "group_size": 64
1592
- },
1593
- "model.vision_tower.encoder.layers.7.self_attn.o_proj.linear": {
1594
- "bits": 4,
1595
- "group_size": 64
1596
- },
1597
- "model.vision_tower.encoder.layers.7.mlp.gate_proj.linear": {
1598
- "bits": 4,
1599
- "group_size": 64
1600
- },
1601
- "model.vision_tower.encoder.layers.7.mlp.up_proj.linear": {
1602
- "bits": 4,
1603
- "group_size": 64
1604
- },
1605
- "model.vision_tower.encoder.layers.7.mlp.down_proj.linear": {
1606
- "bits": 4,
1607
- "group_size": 64
1608
- },
1609
- "model.vision_tower.encoder.layers.8.self_attn.q_proj.linear": {
1610
- "bits": 4,
1611
- "group_size": 64
1612
- },
1613
- "model.vision_tower.encoder.layers.8.self_attn.k_proj.linear": {
1614
- "bits": 4,
1615
- "group_size": 64
1616
- },
1617
- "model.vision_tower.encoder.layers.8.self_attn.v_proj.linear": {
1618
- "bits": 4,
1619
- "group_size": 64
1620
- },
1621
- "model.vision_tower.encoder.layers.8.self_attn.o_proj.linear": {
1622
- "bits": 4,
1623
- "group_size": 64
1624
- },
1625
- "model.vision_tower.encoder.layers.8.mlp.gate_proj.linear": {
1626
- "bits": 4,
1627
- "group_size": 64
1628
- },
1629
- "model.vision_tower.encoder.layers.8.mlp.up_proj.linear": {
1630
- "bits": 4,
1631
- "group_size": 64
1632
- },
1633
- "model.vision_tower.encoder.layers.8.mlp.down_proj.linear": {
1634
- "bits": 4,
1635
- "group_size": 64
1636
- },
1637
- "model.vision_tower.encoder.layers.9.self_attn.q_proj.linear": {
1638
- "bits": 4,
1639
- "group_size": 64
1640
- },
1641
- "model.vision_tower.encoder.layers.9.self_attn.k_proj.linear": {
1642
- "bits": 4,
1643
- "group_size": 64
1644
- },
1645
- "model.vision_tower.encoder.layers.9.self_attn.v_proj.linear": {
1646
- "bits": 4,
1647
- "group_size": 64
1648
- },
1649
- "model.vision_tower.encoder.layers.9.self_attn.o_proj.linear": {
1650
- "bits": 4,
1651
- "group_size": 64
1652
- },
1653
- "model.vision_tower.encoder.layers.9.mlp.gate_proj.linear": {
1654
- "bits": 4,
1655
- "group_size": 64
1656
- },
1657
- "model.vision_tower.encoder.layers.9.mlp.up_proj.linear": {
1658
- "bits": 4,
1659
- "group_size": 64
1660
- },
1661
- "model.vision_tower.encoder.layers.9.mlp.down_proj.linear": {
1662
- "bits": 4,
1663
- "group_size": 64
1664
- },
1665
- "model.vision_tower.encoder.layers.10.self_attn.q_proj.linear": {
1666
- "bits": 4,
1667
- "group_size": 64
1668
- },
1669
- "model.vision_tower.encoder.layers.10.self_attn.k_proj.linear": {
1670
- "bits": 4,
1671
- "group_size": 64
1672
- },
1673
- "model.vision_tower.encoder.layers.10.self_attn.v_proj.linear": {
1674
- "bits": 4,
1675
- "group_size": 64
1676
- },
1677
- "model.vision_tower.encoder.layers.10.self_attn.o_proj.linear": {
1678
- "bits": 4,
1679
- "group_size": 64
1680
- },
1681
- "model.vision_tower.encoder.layers.10.mlp.gate_proj.linear": {
1682
- "bits": 4,
1683
- "group_size": 64
1684
- },
1685
- "model.vision_tower.encoder.layers.10.mlp.up_proj.linear": {
1686
- "bits": 4,
1687
- "group_size": 64
1688
- },
1689
- "model.vision_tower.encoder.layers.10.mlp.down_proj.linear": {
1690
- "bits": 4,
1691
- "group_size": 64
1692
- },
1693
- "model.vision_tower.encoder.layers.11.self_attn.q_proj.linear": {
1694
- "bits": 4,
1695
- "group_size": 64
1696
- },
1697
- "model.vision_tower.encoder.layers.11.self_attn.k_proj.linear": {
1698
- "bits": 4,
1699
- "group_size": 64
1700
- },
1701
- "model.vision_tower.encoder.layers.11.self_attn.v_proj.linear": {
1702
- "bits": 4,
1703
- "group_size": 64
1704
- },
1705
- "model.vision_tower.encoder.layers.11.self_attn.o_proj.linear": {
1706
- "bits": 4,
1707
- "group_size": 64
1708
- },
1709
- "model.vision_tower.encoder.layers.11.mlp.gate_proj.linear": {
1710
- "bits": 4,
1711
- "group_size": 64
1712
- },
1713
- "model.vision_tower.encoder.layers.11.mlp.up_proj.linear": {
1714
- "bits": 4,
1715
- "group_size": 64
1716
- },
1717
- "model.vision_tower.encoder.layers.11.mlp.down_proj.linear": {
1718
- "bits": 4,
1719
- "group_size": 64
1720
- },
1721
- "model.vision_tower.encoder.layers.12.self_attn.q_proj.linear": {
1722
- "bits": 4,
1723
- "group_size": 64
1724
- },
1725
- "model.vision_tower.encoder.layers.12.self_attn.k_proj.linear": {
1726
- "bits": 4,
1727
- "group_size": 64
1728
- },
1729
- "model.vision_tower.encoder.layers.12.self_attn.v_proj.linear": {
1730
- "bits": 4,
1731
- "group_size": 64
1732
- },
1733
- "model.vision_tower.encoder.layers.12.self_attn.o_proj.linear": {
1734
- "bits": 4,
1735
- "group_size": 64
1736
- },
1737
- "model.vision_tower.encoder.layers.12.mlp.gate_proj.linear": {
1738
- "bits": 4,
1739
- "group_size": 64
1740
- },
1741
- "model.vision_tower.encoder.layers.12.mlp.up_proj.linear": {
1742
- "bits": 4,
1743
- "group_size": 64
1744
- },
1745
- "model.vision_tower.encoder.layers.12.mlp.down_proj.linear": {
1746
- "bits": 4,
1747
- "group_size": 64
1748
- },
1749
- "model.vision_tower.encoder.layers.13.self_attn.q_proj.linear": {
1750
- "bits": 4,
1751
- "group_size": 64
1752
- },
1753
- "model.vision_tower.encoder.layers.13.self_attn.k_proj.linear": {
1754
- "bits": 4,
1755
- "group_size": 64
1756
- },
1757
- "model.vision_tower.encoder.layers.13.self_attn.v_proj.linear": {
1758
- "bits": 4,
1759
- "group_size": 64
1760
- },
1761
- "model.vision_tower.encoder.layers.13.self_attn.o_proj.linear": {
1762
- "bits": 4,
1763
- "group_size": 64
1764
- },
1765
- "model.vision_tower.encoder.layers.13.mlp.gate_proj.linear": {
1766
- "bits": 4,
1767
- "group_size": 64
1768
- },
1769
- "model.vision_tower.encoder.layers.13.mlp.up_proj.linear": {
1770
- "bits": 4,
1771
- "group_size": 64
1772
- },
1773
- "model.vision_tower.encoder.layers.13.mlp.down_proj.linear": {
1774
- "bits": 4,
1775
- "group_size": 64
1776
- },
1777
- "model.vision_tower.encoder.layers.14.self_attn.q_proj.linear": {
1778
- "bits": 4,
1779
- "group_size": 64
1780
- },
1781
- "model.vision_tower.encoder.layers.14.self_attn.k_proj.linear": {
1782
- "bits": 4,
1783
- "group_size": 64
1784
- },
1785
- "model.vision_tower.encoder.layers.14.self_attn.v_proj.linear": {
1786
- "bits": 4,
1787
- "group_size": 64
1788
- },
1789
- "model.vision_tower.encoder.layers.14.self_attn.o_proj.linear": {
1790
- "bits": 4,
1791
- "group_size": 64
1792
- },
1793
- "model.vision_tower.encoder.layers.14.mlp.gate_proj.linear": {
1794
- "bits": 4,
1795
- "group_size": 64
1796
- },
1797
- "model.vision_tower.encoder.layers.14.mlp.up_proj.linear": {
1798
- "bits": 4,
1799
- "group_size": 64
1800
- },
1801
- "model.vision_tower.encoder.layers.14.mlp.down_proj.linear": {
1802
- "bits": 4,
1803
- "group_size": 64
1804
- },
1805
- "model.vision_tower.encoder.layers.15.self_attn.q_proj.linear": {
1806
- "bits": 4,
1807
- "group_size": 64
1808
- },
1809
- "model.vision_tower.encoder.layers.15.self_attn.k_proj.linear": {
1810
- "bits": 4,
1811
- "group_size": 64
1812
- },
1813
- "model.vision_tower.encoder.layers.15.self_attn.v_proj.linear": {
1814
- "bits": 4,
1815
- "group_size": 64
1816
- },
1817
- "model.vision_tower.encoder.layers.15.self_attn.o_proj.linear": {
1818
- "bits": 4,
1819
- "group_size": 64
1820
- },
1821
- "model.vision_tower.encoder.layers.15.mlp.gate_proj.linear": {
1822
- "bits": 4,
1823
- "group_size": 64
1824
- },
1825
- "model.vision_tower.encoder.layers.15.mlp.up_proj.linear": {
1826
- "bits": 4,
1827
- "group_size": 64
1828
- },
1829
- "model.vision_tower.encoder.layers.15.mlp.down_proj.linear": {
1830
- "bits": 4,
1831
- "group_size": 64
1832
- },
1833
- "model.embed_vision.embedding_projection": {
1834
- "bits": 4,
1835
- "group_size": 64
1836
- },
1837
- "model.audio_tower.subsample_conv_projection.input_proj_linear": {
1838
- "bits": 4,
1839
- "group_size": 64
1840
- },
1841
- "model.audio_tower.layers.0.feed_forward1.ffw_layer_1.linear": {
1842
- "bits": 4,
1843
- "group_size": 64
1844
- },
1845
- "model.audio_tower.layers.0.feed_forward1.ffw_layer_2.linear": {
1846
- "bits": 4,
1847
- "group_size": 64
1848
- },
1849
- "model.audio_tower.layers.0.feed_forward2.ffw_layer_1.linear": {
1850
- "bits": 4,
1851
- "group_size": 64
1852
- },
1853
- "model.audio_tower.layers.0.feed_forward2.ffw_layer_2.linear": {
1854
- "bits": 4,
1855
- "group_size": 64
1856
- },
1857
- "model.audio_tower.layers.0.self_attn.q_proj.linear": {
1858
- "bits": 4,
1859
- "group_size": 64
1860
- },
1861
- "model.audio_tower.layers.0.self_attn.k_proj.linear": {
1862
- "bits": 4,
1863
- "group_size": 64
1864
- },
1865
- "model.audio_tower.layers.0.self_attn.v_proj.linear": {
1866
- "bits": 4,
1867
- "group_size": 64
1868
- },
1869
- "model.audio_tower.layers.0.self_attn.post.linear": {
1870
- "bits": 4,
1871
- "group_size": 64
1872
- },
1873
- "model.audio_tower.layers.0.self_attn.relative_k_proj": {
1874
- "bits": 4,
1875
- "group_size": 64
1876
- },
1877
- "model.audio_tower.layers.0.lconv1d.linear_start.linear": {
1878
- "bits": 4,
1879
- "group_size": 64
1880
- },
1881
- "model.audio_tower.layers.0.lconv1d.linear_end.linear": {
1882
- "bits": 4,
1883
- "group_size": 64
1884
- },
1885
- "model.audio_tower.layers.1.feed_forward1.ffw_layer_1.linear": {
1886
- "bits": 4,
1887
- "group_size": 64
1888
- },
1889
- "model.audio_tower.layers.1.feed_forward1.ffw_layer_2.linear": {
1890
- "bits": 4,
1891
- "group_size": 64
1892
- },
1893
- "model.audio_tower.layers.1.feed_forward2.ffw_layer_1.linear": {
1894
- "bits": 4,
1895
- "group_size": 64
1896
- },
1897
- "model.audio_tower.layers.1.feed_forward2.ffw_layer_2.linear": {
1898
- "bits": 4,
1899
- "group_size": 64
1900
- },
1901
- "model.audio_tower.layers.1.self_attn.q_proj.linear": {
1902
- "bits": 4,
1903
- "group_size": 64
1904
- },
1905
- "model.audio_tower.layers.1.self_attn.k_proj.linear": {
1906
- "bits": 4,
1907
- "group_size": 64
1908
- },
1909
- "model.audio_tower.layers.1.self_attn.v_proj.linear": {
1910
- "bits": 4,
1911
- "group_size": 64
1912
- },
1913
- "model.audio_tower.layers.1.self_attn.post.linear": {
1914
- "bits": 4,
1915
- "group_size": 64
1916
- },
1917
- "model.audio_tower.layers.1.self_attn.relative_k_proj": {
1918
- "bits": 4,
1919
- "group_size": 64
1920
- },
1921
- "model.audio_tower.layers.1.lconv1d.linear_start.linear": {
1922
- "bits": 4,
1923
- "group_size": 64
1924
- },
1925
- "model.audio_tower.layers.1.lconv1d.linear_end.linear": {
1926
- "bits": 4,
1927
- "group_size": 64
1928
- },
1929
- "model.audio_tower.layers.2.feed_forward1.ffw_layer_1.linear": {
1930
- "bits": 4,
1931
- "group_size": 64
1932
- },
1933
- "model.audio_tower.layers.2.feed_forward1.ffw_layer_2.linear": {
1934
  "bits": 4,
1935
  "group_size": 64
1936
  },
1937
- "model.audio_tower.layers.2.feed_forward2.ffw_layer_1.linear": {
1938
- "bits": 4,
1939
- "group_size": 64
1940
- },
1941
- "model.audio_tower.layers.2.feed_forward2.ffw_layer_2.linear": {
1942
- "bits": 4,
1943
- "group_size": 64
1944
- },
1945
- "model.audio_tower.layers.2.self_attn.q_proj.linear": {
1946
- "bits": 4,
1947
- "group_size": 64
1948
- },
1949
- "model.audio_tower.layers.2.self_attn.k_proj.linear": {
1950
- "bits": 4,
1951
- "group_size": 64
1952
- },
1953
- "model.audio_tower.layers.2.self_attn.v_proj.linear": {
1954
- "bits": 4,
1955
- "group_size": 64
1956
- },
1957
- "model.audio_tower.layers.2.self_attn.post.linear": {
1958
- "bits": 4,
1959
- "group_size": 64
1960
- },
1961
- "model.audio_tower.layers.2.self_attn.relative_k_proj": {
1962
- "bits": 4,
1963
- "group_size": 64
1964
- },
1965
- "model.audio_tower.layers.2.lconv1d.linear_start.linear": {
1966
- "bits": 4,
1967
- "group_size": 64
1968
- },
1969
- "model.audio_tower.layers.2.lconv1d.linear_end.linear": {
1970
- "bits": 4,
1971
- "group_size": 64
1972
- },
1973
- "model.audio_tower.layers.3.feed_forward1.ffw_layer_1.linear": {
1974
- "bits": 4,
1975
- "group_size": 64
1976
- },
1977
- "model.audio_tower.layers.3.feed_forward1.ffw_layer_2.linear": {
1978
- "bits": 4,
1979
- "group_size": 64
1980
- },
1981
- "model.audio_tower.layers.3.feed_forward2.ffw_layer_1.linear": {
1982
- "bits": 4,
1983
- "group_size": 64
1984
- },
1985
- "model.audio_tower.layers.3.feed_forward2.ffw_layer_2.linear": {
1986
- "bits": 4,
1987
- "group_size": 64
1988
- },
1989
- "model.audio_tower.layers.3.self_attn.q_proj.linear": {
1990
- "bits": 4,
1991
- "group_size": 64
1992
- },
1993
- "model.audio_tower.layers.3.self_attn.k_proj.linear": {
1994
- "bits": 4,
1995
- "group_size": 64
1996
- },
1997
- "model.audio_tower.layers.3.self_attn.v_proj.linear": {
1998
- "bits": 4,
1999
- "group_size": 64
2000
- },
2001
- "model.audio_tower.layers.3.self_attn.post.linear": {
2002
- "bits": 4,
2003
- "group_size": 64
2004
- },
2005
- "model.audio_tower.layers.3.self_attn.relative_k_proj": {
2006
- "bits": 4,
2007
- "group_size": 64
2008
- },
2009
- "model.audio_tower.layers.3.lconv1d.linear_start.linear": {
2010
- "bits": 4,
2011
- "group_size": 64
2012
- },
2013
- "model.audio_tower.layers.3.lconv1d.linear_end.linear": {
2014
- "bits": 4,
2015
- "group_size": 64
2016
- },
2017
- "model.audio_tower.layers.4.feed_forward1.ffw_layer_1.linear": {
2018
- "bits": 4,
2019
- "group_size": 64
2020
- },
2021
- "model.audio_tower.layers.4.feed_forward1.ffw_layer_2.linear": {
2022
- "bits": 4,
2023
- "group_size": 64
2024
- },
2025
- "model.audio_tower.layers.4.feed_forward2.ffw_layer_1.linear": {
2026
- "bits": 4,
2027
- "group_size": 64
2028
- },
2029
- "model.audio_tower.layers.4.feed_forward2.ffw_layer_2.linear": {
2030
- "bits": 4,
2031
- "group_size": 64
2032
- },
2033
- "model.audio_tower.layers.4.self_attn.q_proj.linear": {
2034
- "bits": 4,
2035
- "group_size": 64
2036
- },
2037
- "model.audio_tower.layers.4.self_attn.k_proj.linear": {
2038
- "bits": 4,
2039
- "group_size": 64
2040
- },
2041
- "model.audio_tower.layers.4.self_attn.v_proj.linear": {
2042
- "bits": 4,
2043
- "group_size": 64
2044
- },
2045
- "model.audio_tower.layers.4.self_attn.post.linear": {
2046
- "bits": 4,
2047
- "group_size": 64
2048
- },
2049
- "model.audio_tower.layers.4.self_attn.relative_k_proj": {
2050
- "bits": 4,
2051
- "group_size": 64
2052
- },
2053
- "model.audio_tower.layers.4.lconv1d.linear_start.linear": {
2054
- "bits": 4,
2055
- "group_size": 64
2056
- },
2057
- "model.audio_tower.layers.4.lconv1d.linear_end.linear": {
2058
- "bits": 4,
2059
- "group_size": 64
2060
- },
2061
- "model.audio_tower.layers.5.feed_forward1.ffw_layer_1.linear": {
2062
- "bits": 4,
2063
- "group_size": 64
2064
- },
2065
- "model.audio_tower.layers.5.feed_forward1.ffw_layer_2.linear": {
2066
- "bits": 4,
2067
- "group_size": 64
2068
- },
2069
- "model.audio_tower.layers.5.feed_forward2.ffw_layer_1.linear": {
2070
- "bits": 4,
2071
- "group_size": 64
2072
- },
2073
- "model.audio_tower.layers.5.feed_forward2.ffw_layer_2.linear": {
2074
- "bits": 4,
2075
- "group_size": 64
2076
- },
2077
- "model.audio_tower.layers.5.self_attn.q_proj.linear": {
2078
- "bits": 4,
2079
- "group_size": 64
2080
- },
2081
- "model.audio_tower.layers.5.self_attn.k_proj.linear": {
2082
- "bits": 4,
2083
- "group_size": 64
2084
- },
2085
- "model.audio_tower.layers.5.self_attn.v_proj.linear": {
2086
- "bits": 4,
2087
- "group_size": 64
2088
- },
2089
- "model.audio_tower.layers.5.self_attn.post.linear": {
2090
- "bits": 4,
2091
- "group_size": 64
2092
- },
2093
- "model.audio_tower.layers.5.self_attn.relative_k_proj": {
2094
- "bits": 4,
2095
- "group_size": 64
2096
- },
2097
- "model.audio_tower.layers.5.lconv1d.linear_start.linear": {
2098
- "bits": 4,
2099
- "group_size": 64
2100
- },
2101
- "model.audio_tower.layers.5.lconv1d.linear_end.linear": {
2102
- "bits": 4,
2103
- "group_size": 64
2104
- },
2105
- "model.audio_tower.layers.6.feed_forward1.ffw_layer_1.linear": {
2106
- "bits": 4,
2107
- "group_size": 64
2108
- },
2109
- "model.audio_tower.layers.6.feed_forward1.ffw_layer_2.linear": {
2110
- "bits": 4,
2111
- "group_size": 64
2112
- },
2113
- "model.audio_tower.layers.6.feed_forward2.ffw_layer_1.linear": {
2114
- "bits": 4,
2115
- "group_size": 64
2116
- },
2117
- "model.audio_tower.layers.6.feed_forward2.ffw_layer_2.linear": {
2118
- "bits": 4,
2119
- "group_size": 64
2120
- },
2121
- "model.audio_tower.layers.6.self_attn.q_proj.linear": {
2122
- "bits": 4,
2123
- "group_size": 64
2124
- },
2125
- "model.audio_tower.layers.6.self_attn.k_proj.linear": {
2126
- "bits": 4,
2127
- "group_size": 64
2128
- },
2129
- "model.audio_tower.layers.6.self_attn.v_proj.linear": {
2130
- "bits": 4,
2131
- "group_size": 64
2132
- },
2133
- "model.audio_tower.layers.6.self_attn.post.linear": {
2134
- "bits": 4,
2135
- "group_size": 64
2136
- },
2137
- "model.audio_tower.layers.6.self_attn.relative_k_proj": {
2138
- "bits": 4,
2139
- "group_size": 64
2140
- },
2141
- "model.audio_tower.layers.6.lconv1d.linear_start.linear": {
2142
- "bits": 4,
2143
- "group_size": 64
2144
- },
2145
- "model.audio_tower.layers.6.lconv1d.linear_end.linear": {
2146
- "bits": 4,
2147
- "group_size": 64
2148
- },
2149
- "model.audio_tower.layers.7.feed_forward1.ffw_layer_1.linear": {
2150
- "bits": 4,
2151
- "group_size": 64
2152
- },
2153
- "model.audio_tower.layers.7.feed_forward1.ffw_layer_2.linear": {
2154
- "bits": 4,
2155
- "group_size": 64
2156
- },
2157
- "model.audio_tower.layers.7.feed_forward2.ffw_layer_1.linear": {
2158
- "bits": 4,
2159
- "group_size": 64
2160
- },
2161
- "model.audio_tower.layers.7.feed_forward2.ffw_layer_2.linear": {
2162
- "bits": 4,
2163
- "group_size": 64
2164
- },
2165
- "model.audio_tower.layers.7.self_attn.q_proj.linear": {
2166
- "bits": 4,
2167
- "group_size": 64
2168
- },
2169
- "model.audio_tower.layers.7.self_attn.k_proj.linear": {
2170
- "bits": 4,
2171
- "group_size": 64
2172
- },
2173
- "model.audio_tower.layers.7.self_attn.v_proj.linear": {
2174
- "bits": 4,
2175
- "group_size": 64
2176
- },
2177
- "model.audio_tower.layers.7.self_attn.post.linear": {
2178
- "bits": 4,
2179
- "group_size": 64
2180
- },
2181
- "model.audio_tower.layers.7.self_attn.relative_k_proj": {
2182
- "bits": 4,
2183
- "group_size": 64
2184
- },
2185
- "model.audio_tower.layers.7.lconv1d.linear_start.linear": {
2186
- "bits": 4,
2187
- "group_size": 64
2188
- },
2189
- "model.audio_tower.layers.7.lconv1d.linear_end.linear": {
2190
- "bits": 4,
2191
- "group_size": 64
2192
- },
2193
- "model.audio_tower.layers.8.feed_forward1.ffw_layer_1.linear": {
2194
- "bits": 4,
2195
- "group_size": 64
2196
- },
2197
- "model.audio_tower.layers.8.feed_forward1.ffw_layer_2.linear": {
2198
- "bits": 4,
2199
- "group_size": 64
2200
- },
2201
- "model.audio_tower.layers.8.feed_forward2.ffw_layer_1.linear": {
2202
- "bits": 4,
2203
  "group_size": 64
2204
  },
2205
- "model.audio_tower.layers.8.feed_forward2.ffw_layer_2.linear": {
2206
- "bits": 4,
2207
  "group_size": 64
2208
  },
2209
- "model.audio_tower.layers.8.self_attn.q_proj.linear": {
2210
  "bits": 4,
2211
  "group_size": 64
2212
  },
2213
- "model.audio_tower.layers.8.self_attn.k_proj.linear": {
2214
- "bits": 4,
2215
  "group_size": 64
2216
  },
2217
- "model.audio_tower.layers.8.self_attn.v_proj.linear": {
2218
  "bits": 4,
2219
  "group_size": 64
2220
  },
2221
- "model.audio_tower.layers.8.self_attn.post.linear": {
2222
  "bits": 4,
2223
  "group_size": 64
2224
  },
2225
- "model.audio_tower.layers.8.self_attn.relative_k_proj": {
2226
- "bits": 4,
2227
  "group_size": 64
2228
  },
2229
- "model.audio_tower.layers.8.lconv1d.linear_start.linear": {
2230
- "bits": 4,
2231
  "group_size": 64
2232
  },
2233
- "model.audio_tower.layers.8.lconv1d.linear_end.linear": {
2234
- "bits": 4,
2235
  "group_size": 64
2236
  },
2237
- "model.audio_tower.layers.9.feed_forward1.ffw_layer_1.linear": {
2238
- "bits": 4,
2239
  "group_size": 64
2240
  },
2241
- "model.audio_tower.layers.9.feed_forward1.ffw_layer_2.linear": {
2242
- "bits": 4,
2243
  "group_size": 64
2244
  },
2245
- "model.audio_tower.layers.9.feed_forward2.ffw_layer_1.linear": {
2246
  "bits": 4,
2247
  "group_size": 64
2248
  },
2249
- "model.audio_tower.layers.9.feed_forward2.ffw_layer_2.linear": {
2250
- "bits": 4,
2251
  "group_size": 64
2252
  },
2253
- "model.audio_tower.layers.9.self_attn.q_proj.linear": {
2254
  "bits": 4,
2255
  "group_size": 64
2256
  },
2257
- "model.audio_tower.layers.9.self_attn.k_proj.linear": {
2258
- "bits": 4,
2259
  "group_size": 64
2260
  },
2261
- "model.audio_tower.layers.9.self_attn.v_proj.linear": {
2262
- "bits": 4,
2263
  "group_size": 64
2264
  },
2265
- "model.audio_tower.layers.9.self_attn.post.linear": {
2266
  "bits": 4,
2267
  "group_size": 64
2268
  },
2269
- "model.audio_tower.layers.9.self_attn.relative_k_proj": {
2270
  "bits": 4,
2271
  "group_size": 64
2272
  },
2273
- "model.audio_tower.layers.9.lconv1d.linear_start.linear": {
2274
  "bits": 4,
2275
  "group_size": 64
2276
  },
2277
- "model.audio_tower.layers.9.lconv1d.linear_end.linear": {
2278
- "bits": 4,
2279
  "group_size": 64
2280
  },
2281
- "model.audio_tower.layers.10.feed_forward1.ffw_layer_1.linear": {
2282
  "bits": 4,
2283
  "group_size": 64
2284
  },
2285
- "model.audio_tower.layers.10.feed_forward1.ffw_layer_2.linear": {
2286
  "bits": 4,
2287
  "group_size": 64
2288
  },
2289
- "model.audio_tower.layers.10.feed_forward2.ffw_layer_1.linear": {
2290
- "bits": 4,
2291
  "group_size": 64
2292
  },
2293
- "model.audio_tower.layers.10.feed_forward2.ffw_layer_2.linear": {
2294
  "bits": 4,
2295
  "group_size": 64
2296
  },
2297
- "model.audio_tower.layers.10.self_attn.q_proj.linear": {
2298
- "bits": 4,
2299
  "group_size": 64
2300
  },
2301
- "model.audio_tower.layers.10.self_attn.k_proj.linear": {
2302
  "bits": 4,
2303
  "group_size": 64
2304
  },
2305
- "model.audio_tower.layers.10.self_attn.v_proj.linear": {
2306
- "bits": 4,
2307
  "group_size": 64
2308
  },
2309
- "model.audio_tower.layers.10.self_attn.post.linear": {
2310
  "bits": 4,
2311
  "group_size": 64
2312
  },
2313
- "model.audio_tower.layers.10.self_attn.relative_k_proj": {
2314
- "bits": 4,
2315
  "group_size": 64
2316
  },
2317
- "model.audio_tower.layers.10.lconv1d.linear_start.linear": {
2318
  "bits": 4,
2319
  "group_size": 64
2320
  },
2321
- "model.audio_tower.layers.10.lconv1d.linear_end.linear": {
2322
- "bits": 4,
2323
  "group_size": 64
2324
  },
2325
- "model.audio_tower.layers.11.feed_forward1.ffw_layer_1.linear": {
2326
  "bits": 4,
2327
  "group_size": 64
2328
  },
2329
- "model.audio_tower.layers.11.feed_forward1.ffw_layer_2.linear": {
2330
- "bits": 4,
2331
  "group_size": 64
2332
  },
2333
- "model.audio_tower.layers.11.feed_forward2.ffw_layer_1.linear": {
2334
- "bits": 4,
2335
  "group_size": 64
2336
  },
2337
- "model.audio_tower.layers.11.feed_forward2.ffw_layer_2.linear": {
2338
- "bits": 4,
2339
  "group_size": 64
2340
  },
2341
- "model.audio_tower.layers.11.self_attn.q_proj.linear": {
2342
  "bits": 4,
2343
  "group_size": 64
2344
  },
2345
- "model.audio_tower.layers.11.self_attn.k_proj.linear": {
2346
- "bits": 4,
2347
  "group_size": 64
2348
  },
2349
- "model.audio_tower.layers.11.self_attn.v_proj.linear": {
2350
- "bits": 4,
2351
  "group_size": 64
2352
  },
2353
- "model.audio_tower.layers.11.self_attn.post.linear": {
2354
  "bits": 4,
2355
  "group_size": 64
2356
  },
2357
- "model.audio_tower.layers.11.self_attn.relative_k_proj": {
2358
- "bits": 4,
2359
  "group_size": 64
2360
  },
2361
- "model.audio_tower.layers.11.lconv1d.linear_start.linear": {
2362
- "bits": 4,
2363
  "group_size": 64
2364
  },
2365
- "model.audio_tower.layers.11.lconv1d.linear_end.linear": {
2366
- "bits": 4,
2367
  "group_size": 64
2368
  },
2369
- "model.audio_tower.output_proj": {
2370
- "bits": 4,
2371
  "group_size": 64
2372
  },
2373
- "model.embed_audio.embedding_projection": {
2374
- "bits": 4,
2375
  "group_size": 64
2376
  },
2377
- "lm_head": {
2378
- "bits": 4,
2379
  "group_size": 64
2380
  }
2381
  },
2382
  "post_processing": [
2383
  {
2384
- "op": "strip_vision",
2385
  "architectures": {
2386
  "from": [
2387
  "Gemma4ForConditionalGeneration"
@@ -2390,6 +1536,7 @@
2390
  "Gemma4ForCausalLM"
2391
  ]
2392
  },
2393
  "flattened_text_config": false,
2394
  "dropped_keys": [
2395
  "audio_config",
@@ -2402,13 +1549,8 @@
2402
  "image_token_id",
2403
  "video_token_id",
2404
  "vision_soft_tokens_per_image"
2405
- ]
2406
- },
2407
- {
2408
- "op": "strip_vision",
2409
- "architectures": null,
2410
- "flattened_text_config": true,
2411
- "dropped_keys": []
2412
  }
2413
  ]
2414
  }
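
A minimal sketch for sanity-checking a sidecar shaped like the one in this diff: it tallies the `bits` values under `per_layer` so they can be compared with the `n_high_bits` and `n_low_bits` header fields. It assumes those two fields count the 8-bit and 4-bit entries respectively, which the diff itself does not state. The `summarize_bits` helper and the three-entry sample are hypothetical and are not part of mlx-optiq.

```python
import json
from collections import Counter


def summarize_bits(metadata: dict) -> Counter:
    """Count how many `per_layer` entries use each bit-width."""
    return Counter(entry["bits"] for entry in metadata["per_layer"].values())


# Hand-written sample that mirrors the sidecar's shape; the layer names and
# counts are illustrative only and are not taken from the real file.
sample = json.loads("""
{
  "n_high_bits": 2,
  "n_low_bits": 1,
  "per_layer": {
    "layers.0.self_attn.q_proj": {"bits": 8, "group_size": 64},
    "layers.0.self_attn.k_proj": {"bits": 4, "group_size": 64},
    "layers.0.self_attn.o_proj": {"bits": 8, "group_size": 64}
  }
}
""")

counts = summarize_bits(sample)

# Assumption: n_high_bits == number of 8-bit entries, n_low_bits == 4-bit entries.
print("8-bit entries:", counts[8], "declared:", sample["n_high_bits"])
print("4-bit entries:", counts[4], "declared:", sample["n_low_bits"])
```

To run the same tally against a real sidecar, replace the inline sample with `json.load(open(path))` for a local copy of the metadata file.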
 
1
  {
2
  "method": "optiq_mixed_precision",
3
+ "base_model": "google/gemma-4-e4b-it",
4
+ "reference": "bf16",
5
+ "target_bpw": 5.0,
6
+ "achieved_bpw": 5.110966482264888,
7
+ "n_high_bits": 155,
8
+ "n_low_bits": 224,
9
  "threshold": 0.0,
10
  "per_layer": {
11
+ "language_model.model.per_layer_model_projection": {
12
+ "bits": 4,
13
  "group_size": 64
14
  },
15
+ "language_model.model.layers.41.per_layer_projection": {
16
  "bits": 8,
17
  "group_size": 64
18
  },
19
+ "language_model.model.layers.41.per_layer_input_gate": {
20
  "bits": 8,
21
  "group_size": 64
22
  },
23
+ "language_model.model.layers.41.mlp.up_proj": {
24
  "bits": 8,
25
  "group_size": 64
26
  },
27
+ "language_model.model.layers.41.mlp.down_proj": {
28
  "bits": 8,
29
  "group_size": 64
30
  },
31
+ "language_model.model.layers.41.mlp.gate_proj": {
32
  "bits": 8,
33
  "group_size": 64
34
  },
35
+ "language_model.model.layers.41.self_attn.o_proj": {
36
  "bits": 8,
37
  "group_size": 64
38
  },
39
+ "language_model.model.layers.41.self_attn.v_proj": {
40
+ "bits": 4,
41
  "group_size": 64
42
  },
43
+ "language_model.model.layers.41.self_attn.k_proj": {
44
+ "bits": 4,
45
  "group_size": 64
46
  },
47
+ "language_model.model.layers.41.self_attn.q_proj": {
48
  "bits": 8,
49
  "group_size": 64
50
  },
51
+ "language_model.model.layers.40.per_layer_projection": {
52
  "bits": 8,
53
  "group_size": 64
54
  },
55
+ "language_model.model.layers.40.per_layer_input_gate": {
56
+ "bits": 4,
57
  "group_size": 64
58
  },
59
+ "language_model.model.layers.40.mlp.up_proj": {
60
+ "bits": 4,
61
+ "group_size": 64
62
+ },
63
+ "language_model.model.layers.40.mlp.down_proj": {
64
  "bits": 8,
65
  "group_size": 64
66
  },
67
+ "language_model.model.layers.40.mlp.gate_proj": {
68
  "bits": 4,
69
  "group_size": 64
70
  },
71
+ "language_model.model.layers.40.self_attn.o_proj": {
72
  "bits": 4,
73
  "group_size": 64
74
  },
75
+ "language_model.model.layers.40.self_attn.v_proj": {
76
+ "bits": 4,
77
  "group_size": 64
78
  },
79
+ "language_model.model.layers.40.self_attn.k_proj": {
80
+ "bits": 4,
81
  "group_size": 64
82
  },
83
+ "language_model.model.layers.40.self_attn.q_proj": {
84
+ "bits": 4,
85
  "group_size": 64
86
  },
87
+ "language_model.model.layers.39.per_layer_projection": {
88
  "bits": 8,
89
  "group_size": 64
90
  },
91
+ "language_model.model.layers.39.per_layer_input_gate": {
92
+ "bits": 4,
93
  "group_size": 64
94
  },
95
+ "language_model.model.layers.39.mlp.up_proj": {
96
+ "bits": 4,
97
  "group_size": 64
98
  },
99
+ "language_model.model.layers.39.mlp.down_proj": {
100
  "bits": 8,
101
  "group_size": 64
102
  },
103
+ "language_model.model.layers.39.mlp.gate_proj": {
104
+ "bits": 4,
105
  "group_size": 64
106
  },
107
+ "language_model.model.layers.39.self_attn.o_proj": {
108
  "bits": 4,
109
  "group_size": 64
110
  },
111
+ "language_model.model.layers.39.self_attn.v_proj": {
112
+ "bits": 4,
113
  "group_size": 64
114
  },
115
+ "language_model.model.layers.39.self_attn.k_proj": {
116
+ "bits": 4,
117
+ "group_size": 64
118
+ },
119
+ "language_model.model.layers.39.self_attn.q_proj": {
120
+ "bits": 4,
121
  "group_size": 64
122
  },
123
+ "language_model.model.layers.38.per_layer_projection": {
124
  "bits": 8,
125
  "group_size": 64
126
  },
127
+ "language_model.model.layers.38.per_layer_input_gate": {
128
  "bits": 4,
129
  "group_size": 64
130
  },
131
+ "language_model.model.layers.38.mlp.up_proj": {
132
+ "bits": 4,
133
  "group_size": 64
134
  },
135
+ "language_model.model.layers.38.mlp.down_proj": {
136
  "bits": 8,
137
  "group_size": 64
138
  },
139
+ "language_model.model.layers.38.mlp.gate_proj": {
140
+ "bits": 4,
141
  "group_size": 64
142
  },
143
+ "language_model.model.layers.38.self_attn.o_proj": {
144
+ "bits": 8,
145
  "group_size": 64
146
  },
147
+ "language_model.model.layers.38.self_attn.v_proj": {
148
  "bits": 4,
149
  "group_size": 64
150
  },
151
+ "language_model.model.layers.38.self_attn.k_proj": {
152
  "bits": 4,
153
  "group_size": 64
154
  },
155
+ "language_model.model.layers.38.self_attn.q_proj": {
156
+ "bits": 4,
157
  "group_size": 64
158
  },
159
+ "language_model.model.layers.37.per_layer_projection": {
160
  "bits": 8,
161
  "group_size": 64
162
  },
163
+ "language_model.model.layers.37.per_layer_input_gate": {
164
  "bits": 4,
165
  "group_size": 64
166
  },
167
+ "language_model.model.layers.37.mlp.up_proj": {
168
+ "bits": 4,
169
  "group_size": 64
170
  },
171
+ "language_model.model.layers.37.mlp.down_proj": {
172
  "bits": 8,
173
  "group_size": 64
174
  },
175
+ "language_model.model.layers.37.mlp.gate_proj": {
176
+ "bits": 4,
177
  "group_size": 64
178
  },
179
+ "language_model.model.layers.37.self_attn.o_proj": {
180
+ "bits": 4,
181
  "group_size": 64
182
  },
183
+ "language_model.model.layers.37.self_attn.v_proj": {
184
  "bits": 4,
185
  "group_size": 64
186
  },
187
+ "language_model.model.layers.37.self_attn.k_proj": {
188
+ "bits": 4,
189
  "group_size": 64
190
  },
191
+ "language_model.model.layers.37.self_attn.q_proj": {
192
+ "bits": 4,
193
  "group_size": 64
194
  },
195
+ "language_model.model.layers.36.per_layer_projection": {
196
  "bits": 8,
197
  "group_size": 64
198
  },
199
+ "language_model.model.layers.36.per_layer_input_gate": {
200
+ "bits": 4,
201
  "group_size": 64
202
  },
203
+ "language_model.model.layers.36.mlp.up_proj": {
204
+ "bits": 4,
205
  "group_size": 64
206
  },
207
+ "language_model.model.layers.36.mlp.down_proj": {
208
  "bits": 8,
209
  "group_size": 64
210
  },
211
+ "language_model.model.layers.36.mlp.gate_proj": {
212
+ "bits": 4,
213
  "group_size": 64
214
  },
215
+ "language_model.model.layers.36.self_attn.o_proj": {
216
  "bits": 4,
217
  "group_size": 64
218
  },
219
+ "language_model.model.layers.36.self_attn.v_proj": {
220
  "bits": 4,
221
  "group_size": 64
222
  },
223
+ "language_model.model.layers.36.self_attn.k_proj": {
224
  "bits": 4,
225
  "group_size": 64
226
  },
227
+ "language_model.model.layers.36.self_attn.q_proj": {
228
+ "bits": 4,
229
  "group_size": 64
230
  },
231
+ "language_model.model.layers.35.per_layer_projection": {
232
  "bits": 8,
233
  "group_size": 64
234
  },
235
+ "language_model.model.layers.35.per_layer_input_gate": {
236
  "bits": 4,
237
  "group_size": 64
238
  },
239
+ "language_model.model.layers.35.mlp.up_proj": {
240
+ "bits": 4,
241
  "group_size": 64
242
  },
243
+ "language_model.model.layers.35.mlp.down_proj": {
244
+ "bits": 4,
245
  "group_size": 64
246
  },
247
+ "language_model.model.layers.35.mlp.gate_proj": {
248
  "bits": 4,
249
  "group_size": 64
250
  },
251
+ "language_model.model.layers.35.self_attn.o_proj": {
252
+ "bits": 8,
253
  "group_size": 64
254
  },
255
+ "language_model.model.layers.35.self_attn.v_proj": {
256
  "bits": 4,
257
  "group_size": 64
258
  },
259
+ "language_model.model.layers.35.self_attn.k_proj": {
260
  "bits": 4,
261
  "group_size": 64
262
  },
263
+ "language_model.model.layers.35.self_attn.q_proj": {
264
  "bits": 8,
265
  "group_size": 64
266
  },
267
+ "language_model.model.layers.34.per_layer_projection": {
268
  "bits": 8,
269
  "group_size": 64
270
  },
271
+ "language_model.model.layers.34.per_layer_input_gate": {
272
  "bits": 4,
273
  "group_size": 64
274
  },
275
+ "language_model.model.layers.34.mlp.up_proj": {
276
+ "bits": 4,
277
  "group_size": 64
278
  },
279
+ "language_model.model.layers.34.mlp.down_proj": {
280
  "bits": 8,
281
  "group_size": 64
282
  },
283
+ "language_model.model.layers.34.mlp.gate_proj": {
284
  "bits": 4,
285
  "group_size": 64
286
  },
287
+ "language_model.model.layers.34.self_attn.o_proj": {
288
  "bits": 4,
289
  "group_size": 64
290
  },
291
+ "language_model.model.layers.34.self_attn.v_proj": {
292
  "bits": 4,
293
  "group_size": 64
294
  },
295
+ "language_model.model.layers.34.self_attn.k_proj": {
296
  "bits": 4,
297
  "group_size": 64
298
  },
299
+ "language_model.model.layers.34.self_attn.q_proj": {
300
+ "bits": 4,
301
  "group_size": 64
302
  },
303
+ "language_model.model.layers.33.per_layer_projection": {
304
  "bits": 8,
305
  "group_size": 64
306
  },
307
+ "language_model.model.layers.33.per_layer_input_gate": {
308
  "bits": 4,
309
  "group_size": 64
310
  },
311
+ "language_model.model.layers.33.mlp.up_proj": {
312
+ "bits": 4,
313
  "group_size": 64
314
  },
315
+ "language_model.model.layers.33.mlp.down_proj": {
316
  "bits": 8,
317
  "group_size": 64
318
  },
319
+ "language_model.model.layers.33.mlp.gate_proj": {
320
  "bits": 4,
321
  "group_size": 64
322
  },
323
+ "language_model.model.layers.33.self_attn.o_proj": {
324
  "bits": 4,
325
  "group_size": 64
326
  },
327
+ "language_model.model.layers.33.self_attn.v_proj": {
328
  "bits": 4,
329
  "group_size": 64
330
  },
331
+ "language_model.model.layers.33.self_attn.k_proj": {
332
  "bits": 4,
333
  "group_size": 64
334
  },
335
+ "language_model.model.layers.33.self_attn.q_proj": {
336
+ "bits": 4,
337
  "group_size": 64
338
  },
339
+ "language_model.model.layers.32.per_layer_projection": {
340
  "bits": 8,
341
  "group_size": 64
342
  },
343
+ "language_model.model.layers.32.per_layer_input_gate": {
344
  "bits": 4,
345
  "group_size": 64
346
  },
347
+ "language_model.model.layers.32.mlp.up_proj": {
348
  "bits": 4,
349
  "group_size": 64
350
  },
351
+ "language_model.model.layers.32.mlp.down_proj": {
352
  "bits": 8,
353
  "group_size": 64
354
  },
355
+ "language_model.model.layers.32.mlp.gate_proj": {
356
+ "bits": 4,
357
+ "group_size": 64
358
+ },
359
+ "language_model.model.layers.32.self_attn.o_proj": {
360
  "bits": 8,
361
  "group_size": 64
362
  },
363
+ "language_model.model.layers.32.self_attn.v_proj": {
364
  "bits": 4,
365
  "group_size": 64
366
  },
367
+ "language_model.model.layers.32.self_attn.k_proj": {
368
  "bits": 4,
369
  "group_size": 64
370
  },
371
+ "language_model.model.layers.32.self_attn.q_proj": {
372
  "bits": 4,
373
  "group_size": 64
374
  },
375
+ "language_model.model.layers.31.per_layer_projection": {
376
  "bits": 8,
377
  "group_size": 64
378
  },
379
+ "language_model.model.layers.31.per_layer_input_gate": {
380
+ "bits": 4,
381
  "group_size": 64
382
  },
383
+ "language_model.model.layers.31.mlp.up_proj": {
384
  "bits": 4,
385
  "group_size": 64
386
  },
387
+ "language_model.model.layers.31.mlp.down_proj": {
388
  "bits": 8,
389
  "group_size": 64
390
  },
391
+ "language_model.model.layers.31.mlp.gate_proj": {
392
+ "bits": 4,
393
  "group_size": 64
394
  },
395
+ "language_model.model.layers.31.self_attn.o_proj": {
396
+ "bits": 4,
397
  "group_size": 64
398
  },
399
+ "language_model.model.layers.31.self_attn.v_proj": {
400
+ "bits": 4,
401
  "group_size": 64
402
  },
403
+ "language_model.model.layers.31.self_attn.k_proj": {
404
  "bits": 4,
405
  "group_size": 64
406
  },
407
+ "language_model.model.layers.31.self_attn.q_proj": {
408
+ "bits": 4,
409
  "group_size": 64
410
  },
411
+ "language_model.model.layers.30.per_layer_projection": {
412
  "bits": 8,
413
  "group_size": 64
414
  },
415
+ "language_model.model.layers.30.per_layer_input_gate": {
416
+ "bits": 4,
417
  "group_size": 64
418
  },
419
+ "language_model.model.layers.30.mlp.up_proj": {
420
  "bits": 4,
421
  "group_size": 64
422
  },
423
+ "language_model.model.layers.30.mlp.down_proj": {
424
  "bits": 8,
425
  "group_size": 64
426
  },
427
+ "language_model.model.layers.30.mlp.gate_proj": {
428
+ "bits": 4,
429
  "group_size": 64
430
  },
431
+ "language_model.model.layers.30.self_attn.o_proj": {
432
+ "bits": 4,
433
  "group_size": 64
434
  },
435
+ "language_model.model.layers.30.self_attn.v_proj": {
436
  "bits": 4,
437
  "group_size": 64
438
  },
439
+ "language_model.model.layers.30.self_attn.k_proj": {
440
  "bits": 4,
441
  "group_size": 64
442
  },
443
+ "language_model.model.layers.30.self_attn.q_proj": {
444
  "bits": 4,
445
  "group_size": 64
446
  },
447
+ "language_model.model.layers.29.per_layer_projection": {
448
  "bits": 8,
449
  "group_size": 64
450
  },
451
+ "language_model.model.layers.29.per_layer_input_gate": {
452
+ "bits": 4,
453
  "group_size": 64
454
  },
455
+ "language_model.model.layers.29.mlp.up_proj": {
456
  "bits": 4,
457
  "group_size": 64
458
  },
459
+ "language_model.model.layers.29.mlp.down_proj": {
460
  "bits": 4,
461
  "group_size": 64
462
  },
463
+ "language_model.model.layers.29.mlp.gate_proj": {
464
+ "bits": 4,
465
  "group_size": 64
466
  },
467
+ "language_model.model.layers.29.self_attn.o_proj": {
468
  "bits": 8,
469
  "group_size": 64
470
  },
471
+ "language_model.model.layers.29.self_attn.v_proj": {
472
  "bits": 4,
473
  "group_size": 64
474
  },
475
+ "language_model.model.layers.29.self_attn.k_proj": {
476
  "bits": 4,
477
  "group_size": 64
478
  },
479
+ "language_model.model.layers.29.self_attn.q_proj": {
480
  "bits": 8,
481
  "group_size": 64
482
  },
483
+ "language_model.model.layers.28.per_layer_projection": {
484
  "bits": 8,
485
  "group_size": 64
486
  },
487
+ "language_model.model.layers.28.per_layer_input_gate": {
488
  "bits": 4,
489
  "group_size": 64
490
  },
491
+ "language_model.model.layers.28.mlp.up_proj": {
492
+ "bits": 4,
493
  "group_size": 64
494
  },
495
+ "language_model.model.layers.28.mlp.down_proj": {
496
+ "bits": 4,
497
  "group_size": 64
498
  },
499
+ "language_model.model.layers.28.mlp.gate_proj": {
500
+ "bits": 4,
501
  "group_size": 64
502
  },
503
+ "language_model.model.layers.28.self_attn.o_proj": {
504
  "bits": 4,
505
  "group_size": 64
506
  },
507
+ "language_model.model.layers.28.self_attn.v_proj": {
508
  "bits": 4,
509
  "group_size": 64
510
  },
511
+ "language_model.model.layers.28.self_attn.k_proj": {
512
  "bits": 4,
513
  "group_size": 64
514
  },
515
+ "language_model.model.layers.28.self_attn.q_proj": {
516
  "bits": 8,
517
  "group_size": 64
518
  },
519
+ "language_model.model.layers.27.per_layer_projection": {
520
  "bits": 8,
521
  "group_size": 64
522
  },
523
+ "language_model.model.layers.27.per_layer_input_gate": {
524
  "bits": 4,
525
  "group_size": 64
526
  },
527
+ "language_model.model.layers.27.mlp.up_proj": {
528
+ "bits": 4,
529
  "group_size": 64
530
  },
531
+ "language_model.model.layers.27.mlp.down_proj": {
532
  "bits": 8,
533
  "group_size": 64
534
  },
535
+ "language_model.model.layers.27.mlp.gate_proj": {
536
+ "bits": 4,
537
  "group_size": 64
538
  },
539
+ "language_model.model.layers.27.self_attn.o_proj": {
540
  "bits": 4,
541
  "group_size": 64
542
  },
543
+ "language_model.model.layers.27.self_attn.v_proj": {
544
  "bits": 4,
545
  "group_size": 64
546
  },
547
+ "language_model.model.layers.27.self_attn.k_proj": {
548
  "bits": 4,
549
  "group_size": 64
550
  },
551
+ "language_model.model.layers.27.self_attn.q_proj": {
552
+ "bits": 4,
553
  "group_size": 64
554
  },
555
+ "language_model.model.layers.26.per_layer_projection": {
556
  "bits": 8,
557
  "group_size": 64
558
  },
559
+ "language_model.model.layers.26.per_layer_input_gate": {
560
  "bits": 4,
561
  "group_size": 64
562
  },
563
+ "language_model.model.layers.26.mlp.up_proj": {
564
  "bits": 8,
565
  "group_size": 64
566
  },
567
+ "language_model.model.layers.26.mlp.down_proj": {
568
+ "bits": 4,
569
  "group_size": 64
570
  },
571
+ "language_model.model.layers.26.mlp.gate_proj": {
572
+ "bits": 4,
573
  "group_size": 64
574
  },
575
+ "language_model.model.layers.26.self_attn.o_proj": {
576
+ "bits": 8,
577
  "group_size": 64
578
  },
579
+ "language_model.model.layers.26.self_attn.v_proj": {
580
  "bits": 4,
581
  "group_size": 64
582
  },
583
+ "language_model.model.layers.26.self_attn.k_proj": {
584
  "bits": 4,
585
  "group_size": 64
586
  },
587
+ "language_model.model.layers.26.self_attn.q_proj": {
588
+ "bits": 4,
589
  "group_size": 64
590
  },
591
+ "language_model.model.layers.25.per_layer_projection": {
592
  "bits": 8,
593
  "group_size": 64
594
  },
595
+ "language_model.model.layers.25.per_layer_input_gate": {
596
+ "bits": 4,
597
  "group_size": 64
598
  },
599
+ "language_model.model.layers.25.mlp.up_proj": {
600
+ "bits": 4,
601
  "group_size": 64
602
  },
603
+ "language_model.model.layers.25.mlp.down_proj": {
604
  "bits": 8,
605
  "group_size": 64
606
  },
607
+ "language_model.model.layers.25.mlp.gate_proj": {
608
+ "bits": 4,
609
  "group_size": 64
610
  },
611
+ "language_model.model.layers.25.self_attn.o_proj": {
612
  "bits": 4,
613
  "group_size": 64
614
  },
615
+ "language_model.model.layers.25.self_attn.v_proj": {
616
  "bits": 4,
617
  "group_size": 64
618
  },
619
+ "language_model.model.layers.25.self_attn.k_proj": {
620
  "bits": 4,
621
  "group_size": 64
622
  },
623
+ "language_model.model.layers.25.self_attn.q_proj": {
624
+ "bits": 4,
625
  "group_size": 64
626
  },
627
+ "language_model.model.layers.24.per_layer_projection": {
628
  "bits": 8,
629
  "group_size": 64
630
  },
631
+ "language_model.model.layers.24.per_layer_input_gate": {
632
  "bits": 4,
633
  "group_size": 64
634
  },
635
+ "language_model.model.layers.24.mlp.up_proj": {
636
+ "bits": 4,
637
  "group_size": 64
638
  },
639
+ "language_model.model.layers.24.mlp.down_proj": {
640
+ "bits": 4,
641
  "group_size": 64
642
  },
643
+ "language_model.model.layers.24.mlp.gate_proj": {
644
  "bits": 8,
645
  "group_size": 64
646
  },
647
+ "language_model.model.layers.24.self_attn.o_proj": {
648
  "bits": 4,
649
  "group_size": 64
650
  },
651
+ "language_model.model.layers.24.self_attn.v_proj": {
652
  "bits": 4,
653
  "group_size": 64
654
  },
655
+ "language_model.model.layers.24.self_attn.k_proj": {
656
  "bits": 4,
657
  "group_size": 64
658
  },
659
+ "language_model.model.layers.24.self_attn.q_proj": {
660
  "bits": 4,
661
  "group_size": 64
662
  },
663
+ "language_model.model.layers.23.per_layer_projection": {
664
  "bits": 8,
665
  "group_size": 64
666
  },
667
+ "language_model.model.layers.23.per_layer_input_gate": {
668
  "bits": 4,
669
  "group_size": 64
670
  },
671
+ "language_model.model.layers.23.mlp.up_proj": {
672
  "bits": 4,
673
  "group_size": 64
674
  },
675
+ "language_model.model.layers.23.mlp.down_proj": {
676
  "bits": 4,
677
  "group_size": 64
678
  },
679
+ "language_model.model.layers.23.mlp.gate_proj": {
680
  "bits": 4,
681
  "group_size": 64
682
  },
683
+ "language_model.model.layers.23.self_attn.o_proj": {
684
  "bits": 8,
685
  "group_size": 64
686
  },
687
+ "language_model.model.layers.23.self_attn.v_proj": {
688
  "bits": 8,
689
  "group_size": 64
690
  },
691
+ "language_model.model.layers.23.self_attn.k_proj": {
692
+ "bits": 8,
693
  "group_size": 64
694
  },
695
+ "language_model.model.layers.23.self_attn.q_proj": {
696
  "bits": 4,
697
  "group_size": 64
698
  },
699
+ "language_model.model.layers.22.per_layer_projection": {
700
  "bits": 8,
701
  "group_size": 64
702
  },
703
+ "language_model.model.layers.22.per_layer_input_gate": {
704
  "bits": 4,
705
  "group_size": 64
706
  },
707
+ "language_model.model.layers.22.mlp.up_proj": {
708
  "bits": 4,
709
  "group_size": 64
710
  },
711
+ "language_model.model.layers.22.mlp.down_proj": {
712
  "bits": 4,
713
  "group_size": 64
714
  },
715
+ "language_model.model.layers.22.mlp.gate_proj": {
716
  "bits": 4,
717
  "group_size": 64
718
  },
719
+ "language_model.model.layers.22.self_attn.o_proj": {
720
+ "bits": 4,
721
  "group_size": 64
722
  },
723
+ "language_model.model.layers.22.self_attn.v_proj": {
724
  "bits": 8,
725
  "group_size": 64
726
  },
727
+ "language_model.model.layers.22.self_attn.k_proj": {
728
+ "bits": 8,
729
  "group_size": 64
730
  },
731
+ "language_model.model.layers.22.self_attn.q_proj": {
732
  "bits": 4,
733
  "group_size": 64
734
  },
735
+ "language_model.model.layers.21.per_layer_projection": {
736
  "bits": 8,
737
  "group_size": 64
738
  },
739
+ "language_model.model.layers.21.per_layer_input_gate": {
740
  "bits": 4,
741
  "group_size": 64
742
  },
743
+ "language_model.model.layers.21.mlp.up_proj": {
744
  "bits": 4,
745
  "group_size": 64
746
  },
747
+ "language_model.model.layers.21.mlp.down_proj": {
748
  "bits": 4,
749
  "group_size": 64
750
  },
751
+ "language_model.model.layers.21.mlp.gate_proj": {
752
  "bits": 4,
753
  "group_size": 64
754
  },
755
+ "language_model.model.layers.21.self_attn.o_proj": {
756
  "bits": 8,
757
  "group_size": 64
758
  },
759
+ "language_model.model.layers.21.self_attn.v_proj": {
760
  "bits": 8,
761
  "group_size": 64
762
  },
763
+ "language_model.model.layers.21.self_attn.k_proj": {
764
  "bits": 4,
765
  "group_size": 64
766
  },
767
+ "language_model.model.layers.21.self_attn.q_proj": {
768
+ "bits": 4,
769
  "group_size": 64
770
  },
771
+ "language_model.model.layers.20.per_layer_projection": {
772
  "bits": 8,
773
  "group_size": 64
774
  },
775
+ "language_model.model.layers.20.per_layer_input_gate": {
776
+ "bits": 4,
777
  "group_size": 64
778
  },
779
+ "language_model.model.layers.20.mlp.up_proj": {
780
  "bits": 4,
781
  "group_size": 64
782
  },
783
+ "language_model.model.layers.20.mlp.down_proj": {
784
+ "bits": 8,
785
  "group_size": 64
786
  },
787
+ "language_model.model.layers.20.mlp.gate_proj": {
788
  "bits": 4,
789
  "group_size": 64
790
  },
791
+ "language_model.model.layers.20.self_attn.o_proj": {
792
  "bits": 8,
793
  "group_size": 64
794
  },
795
+ "language_model.model.layers.20.self_attn.v_proj": {
796
  "bits": 8,
797
  "group_size": 64
798
  },
799
+ "language_model.model.layers.20.self_attn.k_proj": {
800
  "bits": 4,
801
  "group_size": 64
802
  },
803
+ "language_model.model.layers.20.self_attn.q_proj": {
804
+ "bits": 4,
805
  "group_size": 64
806
  },
807
+ "language_model.model.layers.19.per_layer_projection": {
808
  "bits": 8,
809
  "group_size": 64
810
  },
811
+ "language_model.model.layers.19.per_layer_input_gate": {
812
  "bits": 8,
813
  "group_size": 64
814
  },
815
+ "language_model.model.layers.19.mlp.up_proj": {
816
  "bits": 4,
817
  "group_size": 64
818
  },
819
+ "language_model.model.layers.19.mlp.down_proj": {
820
  "bits": 4,
821
  "group_size": 64
822
  },
823
+ "language_model.model.layers.19.mlp.gate_proj": {
824
  "bits": 4,
825
  "group_size": 64
826
  },
827
+ "language_model.model.layers.19.self_attn.o_proj": {
828
  "bits": 8,
829
  "group_size": 64
830
  },
831
+ "language_model.model.layers.19.self_attn.v_proj": {
832
  "bits": 8,
833
  "group_size": 64
834
  },
835
+ "language_model.model.layers.19.self_attn.k_proj": {
836
  "bits": 4,
837
  "group_size": 64
838
  },
839
+ "language_model.model.layers.19.self_attn.q_proj": {
840
+ "bits": 4,
841
  "group_size": 64
842
  },
843
+ "language_model.model.layers.18.per_layer_projection": {
844
  "bits": 8,
845
  "group_size": 64
846
  },
847
+ "language_model.model.layers.18.per_layer_input_gate": {
848
+ "bits": 8,
849
  "group_size": 64
850
  },
851
+ "language_model.model.layers.18.mlp.up_proj": {
852
  "bits": 4,
853
  "group_size": 64
854
  },
855
+ "language_model.model.layers.18.mlp.down_proj": {
856
  "bits": 4,
857
  "group_size": 64
858
  },
859
+ "language_model.model.layers.18.mlp.gate_proj": {
860
  "bits": 4,
861
  "group_size": 64
862
  },
863
+ "language_model.model.layers.18.self_attn.o_proj": {
864
  "bits": 4,
865
  "group_size": 64
866
  },
867
+ "language_model.model.layers.18.self_attn.v_proj": {
868
  "bits": 8,
869
  "group_size": 64
870
  },
871
+ "language_model.model.layers.18.self_attn.k_proj": {
872
  "bits": 4,
873
  "group_size": 64
874
  },
875
+ "language_model.model.layers.18.self_attn.q_proj": {
876
  "bits": 4,
877
  "group_size": 64
878
  },
879
+ "language_model.model.layers.17.per_layer_projection": {
880
+ "bits": 8,
881
  "group_size": 64
882
  },
883
+ "language_model.model.layers.17.per_layer_input_gate": {
884
  "bits": 4,
885
  "group_size": 64
886
  },
887
+ "language_model.model.layers.17.mlp.up_proj": {
888
  "bits": 8,
889
  "group_size": 64
890
  },
891
+ "language_model.model.layers.17.mlp.down_proj": {
892
  "bits": 4,
893
  "group_size": 64
894
  },
895
+ "language_model.model.layers.17.mlp.gate_proj": {
896
  "bits": 4,
897
  "group_size": 64
898
  },
899
+ "language_model.model.layers.17.self_attn.o_proj": {
900
+ "bits": 8,
901
  "group_size": 64
902
  },
903
+ "language_model.model.layers.17.self_attn.v_proj": {
904
+ "bits": 8,
905
  "group_size": 64
906
  },
907
+ "language_model.model.layers.17.self_attn.k_proj": {
908
+ "bits": 8,
909
  "group_size": 64
910
  },
911
+ "language_model.model.layers.17.self_attn.q_proj": {
912
  "bits": 4,
913
  "group_size": 64
914
  },
915
+ "language_model.model.layers.16.per_layer_projection": {
916
  "bits": 8,
917
  "group_size": 64
918
  },
919
+ "language_model.model.layers.16.per_layer_input_gate": {
920
+ "bits": 8,
921
  "group_size": 64
922
  },
923
+ "language_model.model.layers.16.mlp.up_proj": {
924
  "bits": 4,
925
  "group_size": 64
926
  },
927
+ "language_model.model.layers.16.mlp.down_proj": {
928
  "bits": 4,
929
  "group_size": 64
930
  },
931
+ "language_model.model.layers.16.mlp.gate_proj": {
932
  "bits": 4,
933
  "group_size": 64
934
  },
935
+ "language_model.model.layers.16.self_attn.o_proj": {
936
+ "bits": 8,
937
  "group_size": 64
938
  },
939
+ "language_model.model.layers.16.self_attn.v_proj": {
940
+ "bits": 8,
941
  "group_size": 64
942
  },
943
+ "language_model.model.layers.16.self_attn.k_proj": {
944
  "bits": 8,
945
  "group_size": 64
946
  },
947
+ "language_model.model.layers.16.self_attn.q_proj": {
948
  "bits": 4,
949
  "group_size": 64
950
  },
951
+ "language_model.model.layers.15.per_layer_projection": {
952
+ "bits": 8,
953
  "group_size": 64
954
  },
955
+ "language_model.model.layers.15.per_layer_input_gate": {
956
+ "bits": 8,
957
  "group_size": 64
958
  },
959
+ "language_model.model.layers.15.mlp.up_proj": {
960
  "bits": 4,
961
  "group_size": 64
962
  },
963
+ "language_model.model.layers.15.mlp.down_proj": {
964
  "bits": 4,
965
  "group_size": 64
966
  },
967
+ "language_model.model.layers.15.mlp.gate_proj": {
968
  "bits": 4,
969
  "group_size": 64
970
  },
971
+ "language_model.model.layers.15.self_attn.o_proj": {
972
  "bits": 8,
973
  "group_size": 64
974
  },
975
+ "language_model.model.layers.15.self_attn.v_proj": {
976
+ "bits": 8,
977
  "group_size": 64
978
  },
979
+ "language_model.model.layers.15.self_attn.k_proj": {
980
  "bits": 4,
981
  "group_size": 64
982
  },
983
+ "language_model.model.layers.15.self_attn.q_proj": {
984
  "bits": 4,
985
  "group_size": 64
986
  },
987
+ "language_model.model.layers.14.per_layer_projection": {
988
  "bits": 4,
989
  "group_size": 64
990
  },
991
+ "language_model.model.layers.14.per_layer_input_gate": {
992
+ "bits": 8,
993
  "group_size": 64
994
  },
995
+ "language_model.model.layers.14.mlp.up_proj": {
996
+ "bits": 4,
997
  "group_size": 64
998
  },
999
+ "language_model.model.layers.14.mlp.down_proj": {
1000
  "bits": 8,
1001
  "group_size": 64
1002
  },
1003
+ "language_model.model.layers.14.mlp.gate_proj": {
1004
  "bits": 4,
1005
  "group_size": 64
1006
  },
1007
+ "language_model.model.layers.14.self_attn.o_proj": {
1008
+ "bits": 8,
1009
  "group_size": 64
1010
  },
1011
+ "language_model.model.layers.14.self_attn.v_proj": {
1012
  "bits": 4,
1013
  "group_size": 64
1014
  },
1015
+ "language_model.model.layers.14.self_attn.k_proj": {
1016
  "bits": 4,
1017
  "group_size": 64
1018
  },
1019
+ "language_model.model.layers.14.self_attn.q_proj": {
1020
  "bits": 4,
1021
  "group_size": 64
1022
  },
1023
+ "language_model.model.layers.13.per_layer_projection": {
1024
+ "bits": 8,
1025
  "group_size": 64
1026
  },
1027
+ "language_model.model.layers.13.per_layer_input_gate": {
1028
  "bits": 8,
1029
  "group_size": 64
1030
  },
1031
+ "language_model.model.layers.13.mlp.up_proj": {
1032
  "bits": 4,
1033
  "group_size": 64
1034
  },
1035
+ "language_model.model.layers.13.mlp.down_proj": {
1036
  "bits": 4,
1037
  "group_size": 64
1038
  },
1039
+ "language_model.model.layers.13.mlp.gate_proj": {
1040
  "bits": 4,
1041
  "group_size": 64
1042
  },
1043
+ "language_model.model.layers.13.self_attn.o_proj": {
1044
+ "bits": 8,
1045
  "group_size": 64
1046
  },
1047
+ "language_model.model.layers.13.self_attn.v_proj": {
1048
+ "bits": 8,
1049
+ "group_size": 64
1050
+ },
1051
+ "language_model.model.layers.13.self_attn.k_proj": {
1052
+ "bits": 8,
1053
  "group_size": 64
1054
  },
1055
+ "language_model.model.layers.13.self_attn.q_proj": {
1056
  "bits": 4,
1057
  "group_size": 64
1058
  },
1059
+ "language_model.model.layers.12.per_layer_projection": {
1060
  "bits": 8,
1061
  "group_size": 64
1062
  },
1063
+ "language_model.model.layers.12.per_layer_input_gate": {
1064
+ "bits": 8,
1065
  "group_size": 64
1066
  },
1067
+ "language_model.model.layers.12.mlp.up_proj": {
1068
  "bits": 4,
1069
  "group_size": 64
1070
  },
1071
+ "language_model.model.layers.12.mlp.down_proj": {
1072
  "bits": 4,
1073
  "group_size": 64
1074
  },
1075
+ "language_model.model.layers.12.mlp.gate_proj": {
1076
  "bits": 4,
1077
  "group_size": 64
1078
  },
1079
+ "language_model.model.layers.12.self_attn.o_proj": {
1080
  "bits": 4,
1081
  "group_size": 64
1082
  },
1083
+ "language_model.model.layers.12.self_attn.v_proj": {
1084
+ "bits": 8,
1085
  "group_size": 64
1086
  },
1087
+ "language_model.model.layers.12.self_attn.k_proj": {
1088
  "bits": 8,
1089
  "group_size": 64
1090
  },
1091
+ "language_model.model.layers.12.self_attn.q_proj": {
1092
  "bits": 4,
1093
  "group_size": 64
1094
  },
1095
+ "language_model.model.layers.11.per_layer_projection": {
1096
+ "bits": 8,
1097
  "group_size": 64
1098
  },
1099
+ "language_model.model.layers.11.per_layer_input_gate": {
1100
  "bits": 4,
1101
  "group_size": 64
1102
  },
1103
+ "language_model.model.layers.11.mlp.up_proj": {
1104
  "bits": 4,
1105
  "group_size": 64
1106
  },
1107
+ "language_model.model.layers.11.mlp.down_proj": {
1108
  "bits": 4,
1109
  "group_size": 64
1110
  },
1111
+ "language_model.model.layers.11.mlp.gate_proj": {
1112
  "bits": 4,
1113
  "group_size": 64
1114
  },
1115
+ "language_model.model.layers.11.self_attn.o_proj": {
1116
  "bits": 8,
1117
  "group_size": 64
1118
  },
1119
+ "language_model.model.layers.11.self_attn.v_proj": {
1120
+ "bits": 8,
1121
  "group_size": 64
1122
  },
1123
+ "language_model.model.layers.11.self_attn.k_proj": {
1124
+ "bits": 8,
1125
  "group_size": 64
1126
  },
1127
+ "language_model.model.layers.11.self_attn.q_proj": {
1128
  "bits": 4,
1129
  "group_size": 64
1130
  },
1131
+ "language_model.model.layers.10.per_layer_projection": {
1132
+ "bits": 8,
1133
  "group_size": 64
1134
  },
1135
+ "language_model.model.layers.10.per_layer_input_gate": {
1136
  "bits": 8,
1137
  "group_size": 64
1138
  },
1139
+ "language_model.model.layers.10.mlp.up_proj": {
1140
  "bits": 4,
1141
  "group_size": 64
1142
  },
1143
+ "language_model.model.layers.10.mlp.down_proj": {
1144
  "bits": 4,
1145
  "group_size": 64
1146
  },
1147
+ "language_model.model.layers.10.mlp.gate_proj": {
1148
  "bits": 4,
1149
  "group_size": 64
1150
  },
1151
+ "language_model.model.layers.10.self_attn.o_proj": {
1152
+ "bits": 8,
1153
  "group_size": 64
1154
  },
1155
+ "language_model.model.layers.10.self_attn.v_proj": {
1156
+ "bits": 8,
1157
  "group_size": 64
1158
  },
1159
+ "language_model.model.layers.10.self_attn.k_proj": {
1160
+ "bits": 8,
1161
  "group_size": 64
1162
  },
1163
+ "language_model.model.layers.10.self_attn.q_proj": {
1164
  "bits": 8,
1165
  "group_size": 64
1166
  },
1167
+ "language_model.model.layers.9.per_layer_projection": {
1168
+ "bits": 8,
1169
  "group_size": 64
1170
  },
1171
+ "language_model.model.layers.9.per_layer_input_gate": {
1172
+ "bits": 8,
1173
  "group_size": 64
1174
  },
1175
+ "language_model.model.layers.9.mlp.up_proj": {
1176
  "bits": 4,
1177
  "group_size": 64
1178
  },
1179
+ "language_model.model.layers.9.mlp.down_proj": {
1180
  "bits": 4,
1181
  "group_size": 64
1182
  },
1183
+ "language_model.model.layers.9.mlp.gate_proj": {
1184
  "bits": 4,
1185
  "group_size": 64
1186
  },
1187
+ "language_model.model.layers.9.self_attn.o_proj": {
1188
  "bits": 4,
1189
  "group_size": 64
1190
  },
1191
+ "language_model.model.layers.9.self_attn.v_proj": {
1192
  "bits": 8,
1193
  "group_size": 64
1194
  },
1195
+ "language_model.model.layers.9.self_attn.k_proj": {
1196
  "bits": 4,
1197
  "group_size": 64
1198
  },
1199
+ "language_model.model.layers.9.self_attn.q_proj": {
1200
+ "bits": 8,
1201
  "group_size": 64
1202
  },
1203
+ "language_model.model.layers.8.per_layer_projection": {
1204
+ "bits": 8,
1205
  "group_size": 64
1206
  },
1207
+ "language_model.model.layers.8.per_layer_input_gate": {
1208
+ "bits": 8,
1209
  "group_size": 64
1210
  },
1211
+ "language_model.model.layers.8.mlp.up_proj": {
1212
+ "bits": 8,
1213
  "group_size": 64
1214
  },
1215
+ "language_model.model.layers.8.mlp.down_proj": {
1216
  "bits": 4,
1217
  "group_size": 64
1218
  },
1219
+ "language_model.model.layers.8.mlp.gate_proj": {
1220
  "bits": 8,
1221
  "group_size": 64
1222
  },
1223
+ "language_model.model.layers.8.self_attn.o_proj": {
1224
  "bits": 4,
1225
  "group_size": 64
1226
  },
1227
+ "language_model.model.layers.8.self_attn.v_proj": {
1228
+ "bits": 8,
1229
  "group_size": 64
1230
  },
1231
+ "language_model.model.layers.8.self_attn.k_proj": {
1232
  "bits": 4,
1233
  "group_size": 64
1234
  },
1235
+ "language_model.model.layers.8.self_attn.q_proj": {
1236
  "bits": 4,
1237
  "group_size": 64
1238
  },
1239
+ "language_model.model.layers.7.per_layer_projection": {
1240
+ "bits": 8,
1241
  "group_size": 64
1242
  },
1243
+ "language_model.model.layers.7.per_layer_input_gate": {
1244
+ "bits": 8,
1245
+ "group_size": 64
1246
+ },
1247
+ "language_model.model.layers.7.mlp.up_proj": {
1248
  "bits": 4,
1249
  "group_size": 64
1250
  },
1251
+ "language_model.model.layers.7.mlp.down_proj": {
1252
  "bits": 8,
1253
  "group_size": 64
1254
  },
1255
+ "language_model.model.layers.7.mlp.gate_proj": {
1256
  "bits": 4,
1257
  "group_size": 64
1258
  },
1259
+ "language_model.model.layers.7.self_attn.o_proj": {
1260
  "bits": 4,
1261
  "group_size": 64
1262
  },
1263
+ "language_model.model.layers.7.self_attn.v_proj": {
1264
+ "bits": 8,
1265
  "group_size": 64
1266
  },
1267
+ "language_model.model.layers.7.self_attn.k_proj": {
1268
+ "bits": 8,
1269
  "group_size": 64
1270
  },
1271
+ "language_model.model.layers.7.self_attn.q_proj": {
1272
  "bits": 4,
1273
  "group_size": 64
1274
  },
1275
+ "language_model.model.layers.6.per_layer_projection": {
1276
  "bits": 4,
1277
  "group_size": 64
1278
  },
1279
+ "language_model.model.layers.6.per_layer_input_gate": {
1280
  "bits": 8,
1281
  "group_size": 64
1282
  },
1283
+ "language_model.model.layers.6.mlp.up_proj": {
1284
+ "bits": 8,
1285
  "group_size": 64
1286
  },
1287
+ "language_model.model.layers.6.mlp.down_proj": {
1288
  "bits": 4,
1289
  "group_size": 64
1290
  },
1291
+ "language_model.model.layers.6.mlp.gate_proj": {
1292
+ "bits": 8,
1293
  "group_size": 64
1294
  },
1295
+ "language_model.model.layers.6.self_attn.o_proj": {
1296
  "bits": 4,
1297
  "group_size": 64
1298
  },
1299
+ "language_model.model.layers.6.self_attn.v_proj": {
1300
+ "bits": 8,
1301
  "group_size": 64
1302
  },
1303
+ "language_model.model.layers.6.self_attn.k_proj": {
1304
  "bits": 8,
1305
  "group_size": 64
1306
  },
1307
+ "language_model.model.layers.6.self_attn.q_proj": {
1308
  "bits": 4,
1309
  "group_size": 64
1310
  },
1311
+ "language_model.model.layers.5.per_layer_projection": {
1312
+ "bits": 8,
1313
  "group_size": 64
1314
  },
1315
+ "language_model.model.layers.5.per_layer_input_gate": {
1316
+ "bits": 8,
1317
  "group_size": 64
1318
  },
1319
+ "language_model.model.layers.5.mlp.up_proj": {
1320
  "bits": 4,
1321
  "group_size": 64
1322
  },
1323
+ "language_model.model.layers.5.mlp.down_proj": {
1324
  "bits": 4,
1325
  "group_size": 64
1326
  },
1327
+ "language_model.model.layers.5.mlp.gate_proj": {
1328
  "bits": 4,
1329
  "group_size": 64
1330
  },
1331
+ "language_model.model.layers.5.self_attn.o_proj": {
1332
  "bits": 8,
1333
  "group_size": 64
1334
  },
1335
+ "language_model.model.layers.5.self_attn.v_proj": {
1336
  "bits": 8,
1337
  "group_size": 64
1338
  },
1339
+ "language_model.model.layers.5.self_attn.k_proj": {
1340
  "bits": 8,
1341
  "group_size": 64
1342
  },
1343
+ "language_model.model.layers.5.self_attn.q_proj": {
1344
  "bits": 4,
1345
  "group_size": 64
1346
  },
1347
+ "language_model.model.layers.4.per_layer_projection": {
1348
+ "bits": 8,
1349
  "group_size": 64
1350
  },
1351
+ "language_model.model.layers.4.per_layer_input_gate": {
1352
+ "bits": 8,
1353
  "group_size": 64
1354
  },
1355
+ "language_model.model.layers.4.mlp.up_proj": {
1356
  "bits": 4,
1357
  "group_size": 64
1358
  },
1359
+ "language_model.model.layers.4.mlp.down_proj": {
1360
+ "bits": 8,
1361
  "group_size": 64
1362
  },
1363
+ "language_model.model.layers.4.mlp.gate_proj": {
1364
  "bits": 4,
1365
  "group_size": 64
1366
  },
1367
+ "language_model.model.layers.4.self_attn.o_proj": {
1368
  "bits": 4,
1369
  "group_size": 64
1370
  },
1371
+ "language_model.model.layers.4.self_attn.v_proj": {
1372
+ "bits": 8,
1373
  "group_size": 64
1374
  },
1375
+ "language_model.model.layers.4.self_attn.k_proj": {
1376
+ "bits": 8,
1377
  "group_size": 64
1378
  },
1379
+ "language_model.model.layers.4.self_attn.q_proj": {
1380
+ "bits": 8,
1381
  "group_size": 64
1382
  },
1383
+ "language_model.model.layers.3.per_layer_projection": {
1384
+ "bits": 8,
1385
  "group_size": 64
1386
  },
1387
+ "language_model.model.layers.3.per_layer_input_gate": {
1388
+ "bits": 8,
1389
  "group_size": 64
1390
  },
1391
+ "language_model.model.layers.3.mlp.up_proj": {
1392
  "bits": 4,
1393
  "group_size": 64
1394
  },
1395
+ "language_model.model.layers.3.mlp.down_proj": {
1396
+ "bits": 8,
1397
  "group_size": 64
1398
  },
1399
+ "language_model.model.layers.3.mlp.gate_proj": {
1400
  "bits": 4,
1401
  "group_size": 64
1402
  },
1403
+ "language_model.model.layers.3.self_attn.o_proj": {
1404
+ "bits": 8,
1405
  "group_size": 64
1406
  },
1407
+ "language_model.model.layers.3.self_attn.v_proj": {
1408
+ "bits": 8,
1409
  "group_size": 64
1410
  },
1411
+ "language_model.model.layers.3.self_attn.k_proj": {
1412
  "bits": 4,
1413
  "group_size": 64
1414
  },
1415
+ "language_model.model.layers.3.self_attn.q_proj": {
1416
  "bits": 4,
1417
  "group_size": 64
1418
  },
1419
+ "language_model.model.layers.2.per_layer_projection": {
1420
  "bits": 4,
1421
  "group_size": 64
1422
  },
1423
+ "language_model.model.layers.2.per_layer_input_gate": {
1424
+ "bits": 8,
1425
  "group_size": 64
1426
  },
1427
+ "language_model.model.layers.2.mlp.up_proj": {
1428
  "bits": 4,
1429
  "group_size": 64
1430
  },
1431
+ "language_model.model.layers.2.mlp.down_proj": {
1432
  "bits": 4,
1433
  "group_size": 64
1434
  },
1435
+ "language_model.model.layers.2.mlp.gate_proj": {
1436
+ "bits": 8,
1437
  "group_size": 64
1438
  },
1439
+ "language_model.model.layers.2.self_attn.o_proj": {
1440
  "bits": 4,
1441
  "group_size": 64
1442
  },
1443
+ "language_model.model.layers.2.self_attn.v_proj": {
1444
+ "bits": 8,
1445
  "group_size": 64
1446
  },
1447
+ "language_model.model.layers.2.self_attn.k_proj": {
1448
  "bits": 4,
1449
  "group_size": 64
1450
  },
1451
+ "language_model.model.layers.2.self_attn.q_proj": {
1452
+ "bits": 8,
1453
  "group_size": 64
1454
  },
1455
+ "language_model.model.layers.1.per_layer_projection": {
1456
  "bits": 4,
1457
  "group_size": 64
1458
  },
1459
+ "language_model.model.layers.1.per_layer_input_gate": {
1460
+ "bits": 8,
1461
  "group_size": 64
1462
  },
1463
+ "language_model.model.layers.1.mlp.up_proj": {
1464
  "bits": 4,
1465
  "group_size": 64
1466
  },
1467
+ "language_model.model.layers.1.mlp.down_proj": {
1468
+ "bits": 8,
1469
  "group_size": 64
1470
  },
1471
+ "language_model.model.layers.1.mlp.gate_proj": {
1472
  "bits": 4,
1473
  "group_size": 64
1474
  },
1475
+ "language_model.model.layers.1.self_attn.o_proj": {
1476
+ "bits": 8,
1477
  "group_size": 64
1478
  },
1479
+ "language_model.model.layers.1.self_attn.v_proj": {
1480
+ "bits": 8,
1481
  "group_size": 64
1482
  },
1483
+ "language_model.model.layers.1.self_attn.k_proj": {
1484
+ "bits": 8,
1485
  "group_size": 64
1486
  },
1487
+ "language_model.model.layers.1.self_attn.q_proj": {
1488
  "bits": 4,
1489
  "group_size": 64
1490
  },
1491
+ "language_model.model.layers.0.per_layer_projection": {
1492
+ "bits": 8,
1493
  "group_size": 64
1494
  },
1495
+ "language_model.model.layers.0.per_layer_input_gate": {
1496
+ "bits": 8,
1497
  "group_size": 64
1498
  },
1499
+ "language_model.model.layers.0.mlp.up_proj": {
1500
  "bits": 4,
1501
  "group_size": 64
1502
  },
1503
+ "language_model.model.layers.0.mlp.down_proj": {
1504
+ "bits": 8,
1505
  "group_size": 64
1506
  },
1507
+ "language_model.model.layers.0.mlp.gate_proj": {
1508
+ "bits": 8,
1509
  "group_size": 64
1510
  },
1511
+ "language_model.model.layers.0.self_attn.o_proj": {
1512
+ "bits": 8,
1513
  "group_size": 64
1514
  },
1515
+ "language_model.model.layers.0.self_attn.v_proj": {
1516
+ "bits": 8,
1517
  "group_size": 64
1518
  },
1519
+ "language_model.model.layers.0.self_attn.k_proj": {
1520
+ "bits": 8,
1521
  "group_size": 64
1522
  },
1523
+ "language_model.model.layers.0.self_attn.q_proj": {
1524
+ "bits": 8,
1525
  "group_size": 64
1526
  }
1527
  },
1528
  "post_processing": [
1529
  {
1530
+ "op": "strip_multimodal_metadata",
1531
  "architectures": {
1532
  "from": [
1533
  "Gemma4ForConditionalGeneration"
 
1536
  "Gemma4ForCausalLM"
1537
  ]
1538
  },
1539
+ "model_type": null,
1540
  "flattened_text_config": false,
1541
  "dropped_keys": [
1542
  "audio_config",
 
1549
  "image_token_id",
1550
  "video_token_id",
1551
  "vision_soft_tokens_per_image"
1552
+ ],
1553
+ "dropped_mrope_keys": []
1554
  }
1555
  ]
1556
  }
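
The per-layer map above can be checked quickly by tallying how many entries land at each bit-width. The following is a minimal sketch, not part of `mlx-optiq`: `summarize_bits` is a hypothetical helper, and it assumes only the entry shape visible in the diff (a layer name mapped to an object with `bits` and `group_size`). The mean it reports is unweighted across entries, so it is not the model's bits-per-weight, which also depends on each layer's parameter count and on group-wise scale overhead.

```python
from collections import Counter


def summarize_bits(layers):
    """Count entries per bit-width and return the unweighted mean bit-width.

    `layers` maps a layer name to a dict with a "bits" key, matching the
    shape of the entries shown in the diff. Hypothetical helper.
    """
    counts = Counter(cfg["bits"] for cfg in layers.values())
    total = sum(counts.values())
    # Unweighted mean: every entry counts once, regardless of layer size.
    mean_bits = sum(b * n for b, n in counts.items()) / total if total else 0.0
    return dict(counts), mean_bits


# Four entries copied from the diff above (layers 30 and 29).
sample = {
    "language_model.model.layers.30.mlp.down_proj": {"bits": 8, "group_size": 64},
    "language_model.model.layers.30.mlp.gate_proj": {"bits": 4, "group_size": 64},
    "language_model.model.layers.29.self_attn.o_proj": {"bits": 8, "group_size": 64},
    "language_model.model.layers.29.self_attn.v_proj": {"bits": 4, "group_size": 64},
}

counts, mean_bits = summarize_bits(sample)
print(counts, mean_bits)
```

To run this over the full file, load the sidecar with `json.load` and pass in whichever object holds the per-layer entries; the name of that top-level key is not shown in this part of the diff.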