Add README.md
README.md
CHANGED
@@ -1,12 +1,9 @@
 ---
 license: llama2
-datasets:
-
-
-
-metrics:
-- perplexity
-- accuracy
+datasets: ['allenai/c4']
+language: ['en']
+metrics: ['perplexity', 'accuracy']
+tags: ['acip', 'pytorch']
 base_model:
 - meta-llama/Llama-2-13b-hf
 pipeline_tag: text-generation
@@ -45,10 +42,10 @@ from transformers import AutoModel
 
 model = AutoModel.from_pretrained("MerantixMomentum/acip_llama2_13b", trust_remote_code=True)
 ```
-This will download and create a fully parameterized ACIP model that can be pruned to any compression
+This will download and create a fully parameterized ACIP model that can be pruned to any compression rate you wish.
 For example,
 ```python
-model.prune_model_by_score(
+model.prune_model_by_score(size_ratio=0.4)
 ```
 will prune `model` to 40% of its original size measured in number of parameters, i.e., a 60% compression rate.
 A unique feature of ACIP is that this operation is revertible in the sense that you can rerun `model.prune_model_by_score` as often as you like to evaluate your model at different sizes. Finally, you can "commit" to a certain ratio and run
@@ -65,7 +62,7 @@ to save even more memory (we have only tested 4bit quantization with `bitsandbytes`)
 
 **🚀 That's it! You can now use your compressed model for inference or fine-tuning as any other Causal Language Model from 🤗 transformers.**
 
-**Note**: The parameter `
+**Note**: The parameter `size_ratio` ranges from 1.0 to 0.0, indicating the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters and 1.0 means no compression at all. Alternatively, you can also set `compression_rate` in `prune_model_by_score`, which is equivalent to `size_ratio = 1.0 - compression_rate`.
 
 # Dependencies
 
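The `size_ratio`/`compression_rate` relationship added in the note is plain arithmetic and can be sanity-checked without loading the model. A minimal sketch — the helper function below is hypothetical for illustration and is not part of the ACIP API; only `prune_model_by_score`, `size_ratio`, and `compression_rate` come from the README:

```python
def size_ratio_from_compression_rate(compression_rate: float) -> float:
    """Convert a compression rate to the equivalent size_ratio.

    As stated in the README note: size_ratio = 1.0 - compression_rate.
    """
    if not 0.0 <= compression_rate <= 1.0:
        raise ValueError("compression_rate must be in [0.0, 1.0]")
    return 1.0 - compression_rate


# A 60% compression rate keeps 40% of the parameters (size_ratio=0.4),
# matching the prune_model_by_score(size_ratio=0.4) example above.
assert abs(size_ratio_from_compression_rate(0.6) - 0.4) < 1e-9

# Approximate parameters remaining for a 13B-parameter base model:
remaining = round(13_000_000_000 * size_ratio_from_compression_rate(0.6))
print(remaining)  # 5200000000
```

Either parameter can be passed to `prune_model_by_score`; they describe the same operation from opposite directions.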