- Given a conversation, we extract all tuples `(context_messages, function_calls)` and use them to generate predictions. We ignore the `content` field and only evaluate the `function_calls` generated by an LLM.
- We use a vLLM deployment with `tool_choice="auto"`.
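The extraction step above can be sketched as follows. The message schema (assistant turns carrying a `tool_calls` list, OpenAI-style) is an assumption for illustration, not necessarily the repo's exact format:

```python
# Sketch: extract (context_messages, function_calls) tuples from a
# conversation. Every assistant turn that contains function calls yields
# one evaluation tuple; the `content` field is ignored.

def extract_tuples(conversation):
    tuples = []
    for i, msg in enumerate(conversation):
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            context = conversation[:i]   # all messages before this turn
            calls = msg["tool_calls"]    # the calls we evaluate
            tuples.append((context, calls))
    return tuples

conversation = [
    {"role": "user", "content": "Email Bob that the demo is at 3pm."},
    {"role": "assistant", "content": None, "tool_calls": [
        {"name": "send_email",
         "arguments": {"to": "bob@example.com",
                       "email_content": "Demo at 3pm."}}
    ]},
]
pairs = extract_tuples(conversation)  # one tuple: ([user msg], [send_email call])
```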

## Metrics

Given a list of predicted and reference function calls, we report two metrics:

EM is a strict metric, and penalizes string arguments in function calls that may be "okay", e.g. `"email_content": "This is an example."` vs. `"email_content": "This is an Example."`, which differ by only one letter.
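A minimal sketch of what such an exact-match check might look like, assuming a simple `{name, arguments}` call representation (an illustrative schema, not necessarily the repo's actual one):

```python
import json

# Exact match (EM): the function name and all arguments must match
# exactly; canonical JSON makes the argument comparison order-insensitive.
def exact_match(pred, ref):
    return (pred["name"] == ref["name"]
            and json.dumps(pred["arguments"], sort_keys=True)
                == json.dumps(ref["arguments"], sort_keys=True))

ref  = {"name": "send_email",
        "arguments": {"email_content": "This is an example."}}
pred = {"name": "send_email",
        "arguments": {"email_content": "This is an Example."}}

exact_match(pred, ref)  # False: the string argument differs by one letter
exact_match(ref, ref)   # True
```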

## Deployment with vLLM

`vllm serve ojus1/Qwen3-1.7B-Instruct --enable-lora --lora-modules prem-research/Funcdex-1.7B=prem-research/Funcdex-1.7B --enable-auto-tool-choice --tool-call-parser hermes`
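Once the server is up, the model can be queried through vLLM's OpenAI-compatible endpoint. A sketch, where the tool schema and prompt are illustrative assumptions:

```python
# Illustrative tool schema; not part of the repo.
tools = [{
    "type": "function",
    "function": {
        "name": "send_email",
        "description": "Send an email.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "email_content": {"type": "string"},
            },
            "required": ["to", "email_content"],
        },
    },
}]

def request_tool_calls(prompt: str):
    # Lazy import: vLLM serves the OpenAI chat-completions protocol.
    from openai import OpenAI
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    response = client.chat.completions.create(
        model="prem-research/Funcdex-1.7B",  # the LoRA module registered above
        messages=[{"role": "user", "content": prompt}],
        tools=tools,
        tool_choice="auto",
    )
    return response.choices[0].message.tool_calls

# With the server running:
# tool_calls = request_tool_calls("Email Bob that the demo is at 3pm.")
```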

# Quickstart