Update README.md
2. Run server
```bash
vllm serve openthaigpt/openthaigpt1.5-72b-instruct --tensor-parallel-size 4
```
* Note: change `--tensor-parallel-size 4` to the number of available GPU cards.

If you wish to enable the tool calling feature, add `--enable-auto-tool-choice --tool-call-parser hermes` to the command, e.g.:
```bash
vllm serve openthaigpt/openthaigpt1.5-72b-instruct --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser hermes
```
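
Once the server is started with these flags, tool definitions can be passed in the OpenAI-compatible chat completion request that vLLM serves. Below is a minimal client-side sketch, assuming the default port 8000 and a hypothetical `get_weather` tool (neither is specified in this README):

```python
import json
import urllib.request

# Hedged sketch, not from the README: with --enable-auto-tool-choice and
# --tool-call-parser hermes, the server accepts OpenAI-style "tools" in a
# chat completion request. The get_weather tool below is a made-up example.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def build_tool_request(prompt: str,
                       base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build a chat completion request that offers the model a tool."""
    payload = {
        "model": "openthaigpt/openthaigpt1.5-72b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOLS,
    }
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Sending the request requires the server from step 2 to be running:
#   with urllib.request.urlopen(build_tool_request("What's the weather in Bangkok?")) as r:
#       reply = json.load(r)["choices"][0]["message"]
#       # tool calls, if any, are parsed into reply["tool_calls"]
```

When the model decides to call a tool, the hermes parser converts the generated tool-call text into the structured `tool_calls` field of the response instead of plain message content.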

3. Run inference (CURL example)
```bash
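# Hedged sketch, not from the original README (the example was truncated):
# vLLM exposes an OpenAI-compatible API, by default on port 8000.
# The prompt below is illustrative; the server from step 2 must be running.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openthaigpt/openthaigpt1.5-72b-instruct",
        "messages": [{"role": "user", "content": "สวัสดีครับ"}]
      }'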