cmpatino committed
Commit 1b310b4 · Parent(s): 6e4efe2
Add examples for using the model
README.md
CHANGED
````diff
@@ -43,7 +43,8 @@ The model is a decoder-only transformer using GQA and NoRope, it was pretrained
 
 For more details refer to our blog post: TODO
 
-
+## How to use
+
 The modeling code for SmolLM3 is available in transformers `v4.53.0`, so make sure to upgrade your transformers version. You can also load the model with the latest `vllm` which uses transformers as a backend.
 ```bash
 pip install -U transformers
````
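The `vllm` route mentioned above can look like the following. This is a minimal sketch, assuming your installed vLLM version loads SmolLM3 through its transformers backend as described; the sampling values are illustrative, not official recommendations.

```python
# Minimal vLLM sketch (assumes vllm can load SmolLM3 via its
# transformers backend; sampling values are illustrative only).
from vllm import LLM, SamplingParams

llm = LLM(model="HuggingFaceTB/SmolLM3-3B")
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=256)

# LLM.chat applies the model's chat template before generating.
messages = [{"role": "user", "content": "Give me a brief explanation of gravity in simple terms."}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```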
````diff
@@ -52,16 +53,128 @@ pip install -U transformers
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
+model_name = "HuggingFaceTB/SmolLM3-3B"
+device = "cuda"  # for GPU usage or "cpu" for CPU usage
+
+# load the tokenizer and the model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+).to(device)
+
+# prepare the model input
+prompt = "Give me a brief explanation of gravity in simple terms."
+messages_think = [
+    {"role": "user", "content": prompt}
+]
+
+text = tokenizer.apply_chat_template(
+    messages_think,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+# Generate the output
+generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
+
+# Get and decode the output
+output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
+print(tokenizer.decode(output_ids, skip_special_tokens=True))
+```
+
+### Enabling and Disabling Extended Thinking Mode
+
+We enable extended thinking by default, so the example above generates the output with a reasoning trace. To choose between the two modes, you can provide the `/think` and `/no_think` flags through the system prompt, as shown in the snippet below for disabling extended thinking. The code for generating the response with extended thinking would be the same, except that the system prompt should have `/think` instead of `/no_think`.
+
+```python
+prompt = "Give me a brief explanation of gravity in simple terms."
+messages = [
+    {"role": "system", "content": "/no_think"},
+    {"role": "user", "content": prompt}
+]
+
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+```
+
+We also provide the option of specifying whether to use extended thinking through the `enable_thinking` kwarg, as in the example below. You do not need to set the `/no_think` or `/think` flags through the system prompt if you use the kwarg, but keep in mind that a flag in the system prompt overrides the kwarg setting.
+
+```python
+prompt = "Give me a brief explanation of gravity in simple terms."
+messages = [
+    {"role": "user", "content": prompt}
+]
+
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+    enable_thinking=False
+)
+```
+
+### Agentic Usage
+
+SmolLM3 supports tool calling! Just pass your list of tools under the argument `xml_tools` (for standard tool calling) or `python_tools` (for calling tools like Python functions in a `<code>` snippet).
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
 checkpoint = "HuggingFaceTB/SmolLM3-3B"
-
+
 tokenizer = AutoTokenizer.from_pretrained(checkpoint)
-
-
-
+model = AutoModelForCausalLM.from_pretrained(checkpoint)
+
+tools = [
+    {
+        "name": "get_weather",
+        "description": "Get the weather in a city",
+        "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "The city to get the weather for"}}}}
+]
+
+messages = [
+    {
+        "role": "user",
+        "content": "Hello! How is the weather today in Copenhagen?"
+    }
+]
+
+inputs = tokenizer.apply_chat_template(
+    messages,
+    enable_thinking=False,  # True works as well, your choice!
+    xml_tools=tools,
+    add_generation_prompt=True,
+    tokenize=True,
+    return_tensors="pt"
+)
+
 outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
+### Using Custom System Instructions
+
+You can specify custom instructions through the system prompt while controlling whether to use extended thinking. For example, the snippet below shows how to make the model speak like a pirate while enabling extended thinking.
+
+```python
+prompt = "Give me a brief explanation of gravity in simple terms."
+messages = [
+    {"role": "system", "content": "Speak like a pirate. /think"},
+    {"role": "user", "content": prompt}
+]
+
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+```
+
 For local inference, you can use `llama.cpp`, `ONNX`, `MLX` and `MLC`. You can find quantized checkpoints in this collection [TODO].
 
 ## Evaluation
````
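A quick way to see what the thinking controls above actually do is to render the same conversation both ways and compare the prompts. This sketch uses only the `enable_thinking` kwarg documented in the diff; the exact markers that appear in the rendered text are defined by the chat template, so inspect them rather than hard-coding them.

```python
# Render one conversation with thinking enabled and disabled and
# compare the prompts by eye; generation behavior follows the prompt.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
messages = [{"role": "user", "content": "Give me a brief explanation of gravity in simple terms."}]

for thinking in (True, False):
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,
    )
    print(f"--- enable_thinking={thinking} ---")
    print(text)
```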
````diff
@@ -188,44 +301,6 @@ Here is an infographic with all the training details [TODO].
 - The datasets used for pretraining can be found in this [collection](https://huggingface.co/collections/HuggingFaceTB/smollm3-pretraining-datasets-685a7353fdc01aecde51b1d9) and those used in mid-training and post-training can be found here [TODO]
 - The training and evaluation configs and code can be found in the [huggingface/smollm](https://github.com/huggingface/smollm) repository.
 
-## Agentic Usage
-
-SmolLM3 supports tool calling! Just pass your list of tools under the argument `xml_tools` (for standard tool calling) or `python_tools` (for calling tools like Python functions in a `<code>` snippet).
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-checkpoint = "HuggingFaceTB/SmolLM3-3B"
-
-tokenizer = AutoTokenizer.from_pretrained(checkpoint)
-model = AutoModelForCausalLM.from_pretrained(checkpoint)
-
-tools = [
-    {
-        "name": "get_weather",
-        "description": "Get the weather in a city",
-        "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "The city to get the weather for"}}}}
-]
-
-messages = [
-    {
-        "role": "user",
-        "content": "Hello! How is the weather today in Copenhagen?"
-    }
-]
-
-inputs = tokenizer.apply_chat_template(
-    messages,
-    enable_thinking=False,  # True works as well, your choice!
-    xml_tools=tools,
-    add_generation_prompt=True,
-    tokenize=True,
-    return_tensors="pt"
-)
-
-outputs = model.generate(inputs)
-print(tokenizer.decode(outputs[0]))
-```
 
 ## Limitations
 
````
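The Agentic Usage snippet above stops at printing the model's first reply. A possible continuation is to parse the tool call, run the tool yourself, and append the result as a new turn. This is only a sketch: it assumes the model wraps its call in a `<tool_call>` JSON block and that the chat template accepts a `tool` role; verify both against real model output before relying on them.

```python
# Hypothetical continuation of the tool-calling example above.
# Assumptions (verify against real output): the reply contains a
# <tool_call>...</tool_call> block holding JSON with "name" and
# "arguments" keys, and the template accepts a "tool" role.
import json
import re

reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=False)

match = re.search(r"<tool_call>(.*?)</tool_call>", reply, re.DOTALL)
if match:
    call = json.loads(match.group(1))
    if call.get("name") == "get_weather":
        # Stand-in for a real weather lookup.
        result = {"city": call["arguments"]["city"], "forecast": "sunny, 21°C"}
        messages.append({"role": "tool", "content": json.dumps(result)})
        # Re-apply the chat template and generate again to get the
        # model's final answer incorporating the tool result.
```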
|
| 103 |
+
```
|
| 104 |
+
|
| 105 |
+
We also provide the option of specifying the whether to use extended thinking through the `enable_thinking` kwarg as in the example below. You do not need to set the `/no_think` or `/think` flags through the system prompt if using the kwarg, but keep in mind that the flag in the system prompt overwrites the setting in the kwarg.
|
| 106 |
+
|
| 107 |
+
```python
|
| 108 |
+
prompt = "Give me a brief explanation of gravity in simple terms."
|
| 109 |
+
messages = [
|
| 110 |
+
{"role": "user", "content": prompt}
|
| 111 |
+
]
|
| 112 |
+
|
| 113 |
+
text = tokenizer.apply_chat_template(
|
| 114 |
+
messages,
|
| 115 |
+
tokenize=False,
|
| 116 |
+
add_generation_prompt=True,
|
| 117 |
+
enable_thinking=False
|
| 118 |
+
)
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
### Agentic Usage
|
| 122 |
+
|
| 123 |
+
SmolLM3 supports tool calling! Just pass your list of tools under the argument `xml_tools` (for standard tool-calling), or `python_tools` (for calling tools like python functions in a <code> snippet).
|
| 124 |
+
|
| 125 |
+
```python
|
| 126 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 127 |
+
|
| 128 |
checkpoint = "HuggingFaceTB/SmolLM3-3B"
|
| 129 |
+
|
| 130 |
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
| 131 |
+
model = AutoModelForCausalLM.from_pretrained(checkpoint)
|
| 132 |
+
|
| 133 |
+
tools = [
|
| 134 |
+
{
|
| 135 |
+
"name": "get_weather",
|
| 136 |
+
"description": "Get the weather in a city",
|
| 137 |
+
"parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "The city to get the weather for"}}}}
|
| 138 |
+
]
|
| 139 |
+
|
| 140 |
+
messages = [
|
| 141 |
+
{
|
| 142 |
+
"role": "user",
|
| 143 |
+
"content": "Hello! How is the weather today in Copenhagen?"
|
| 144 |
+
}
|
| 145 |
+
]
|
| 146 |
+
|
| 147 |
+
inputs = tokenizer.apply_chat_template(
|
| 148 |
+
messages,
|
| 149 |
+
enable_thinking=False, # True works as well, your choice!
|
| 150 |
+
xml_tools=tools,
|
| 151 |
+
add_generation_prompt=True,
|
| 152 |
+
tokenize=True,
|
| 153 |
+
return_tensors="pt"
|
| 154 |
+
)
|
| 155 |
+
|
| 156 |
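An equivalent MLX sketch for Apple silicon uses the `mlx-lm` package; the repo id below is a placeholder, on the assumption that an MLX conversion will live in the same quantized-checkpoints collection.

```python
# Hypothetical MLX sketch; "mlx-community/SmolLM3-3B" is a placeholder
# repo id, substitute the MLX conversion you actually use.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/SmolLM3-3B")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Give me a brief explanation of gravity in simple terms."}],
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```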