Update README.md
README.md CHANGED
@@ -67,6 +67,21 @@ print(tokenizer.decode(outputs[0]))

For local inference, you can use `llama.cpp`, `ONNX`, `MLX`, and `MLC`. You can find quantized checkpoints in [this collection](https://huggingface.co/collections/HuggingFaceTB/smollm3-686d33c1fdffe8e635317e23).
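
One way to run the quantized GGUF checkpoints locally is through the `llama-cpp-python` bindings for `llama.cpp`. The sketch below assumes you have already downloaded a quantized file from the collection above; the filename is a placeholder.

```python
# Minimal local-inference sketch using the llama-cpp-python bindings for llama.cpp.
# The GGUF path is a placeholder; point model_path at a quantized file downloaded
# from the SmolLM3 collection linked above.
from llama_cpp import Llama

llm = Llama(model_path="./SmolLM3-3B-Q4_K_M.gguf", n_ctx=8192)

out = llm("Give me a brief explanation of gravity.", max_tokens=128)
print(out["choices"][0]["text"])
```
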
### Long context processing

The current `config.json` is set for a context length of up to 65,536 tokens. To handle longer inputs (128k or 256k), we utilize YaRN; you can change `max_position_embeddings` and `rope_scaling` to:
```
{
  ...,
  "rope_scaling": {
    "factor": 2.0,  # 2 * 65536 = 131072
    "original_max_position_embeddings": 65536,
    "type": "yarn"
  }
}
```
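
If you prefer to apply the same override programmatically rather than editing `config.json`, a minimal `transformers` sketch is shown below; the checkpoint id is an assumption, so swap in the one you are loading.

```python
# Sketch: extend the usable context to ~131k tokens by overriding the RoPE scaling
# on the loaded config before instantiating the model.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed checkpoint id, adjust as needed

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 131072  # 2 x 65536
config.rope_scaling = {
    "factor": 2.0,
    "original_max_position_embeddings": 65536,
    "type": "yarn",
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```
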
## Evaluation

In this section, we report the evaluation results of the SmolLM3 model. All evaluations are zero-shot unless stated otherwise, and we use [lighteval](https://github.com/huggingface/lighteval) to run them.