ojus1 committed on
Commit
12a753e
·
verified ·
1 Parent(s): c48be4b

Update README.md

Files changed (1)
  1. README.md +571 -195
README.md CHANGED
@@ -1,199 +1,575 @@
1
  ---
2
  library_name: transformers
3
- tags: []
 
 
 
 
 
 
 
 
 
4
  ---
5
 
6
- # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
-
12
- ## Model Details
13
-
14
- ### Model Description
15
-
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
-
76
- ## Training Details
77
-
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
-
103
- ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
1
  ---
2
  library_name: transformers
3
+ license: mit
4
+ task_categories:
5
+ - text-generation
6
+ language:
7
+ - en
8
+ tags:
9
+ - agent
10
+ - Agentic Learning
11
+ - tool use
12
+ - BFCL
13
  ---
14
 
15
+ [![Funcdex-Collection](https://img.shields.io/badge/Hugging%20Face-Model-yellow?logo=huggingface)](https://huggingface.co/collections/prem-research/funcdex) [![Dataset](https://img.shields.io/badge/Hugging%20Face-Dataset-yellow?logo=huggingface)](https://huggingface.co/datasets/prem-research/Funcdex-MT-Function-Calling) [![GitHub](https://img.shields.io/badge/GitHub-Code-181717?logo=github)](https://github.com/prem-research/Funcdex-Synthesizer) [![PremAI](https://img.shields.io/badge/Project-AWorld-green)](https://www.premai.io/)
16
+
17
+ # Funcdex-1.7B
18
+
19
+ <div align="center">
20
+ <img src="assets/funcdex_hero.png" alt="Funcdex Hero" width="40%">
21
+ </div>
22
+
23
+ Funcdex-1.7B is a research preview model by Prem Labs. It was trained on a mix of [Funcdex-MT-Function-Calling](https://huggingface.co/datasets/prem-research/Funcdex-MT-Function-Calling), instruction-following, and single-turn function-calling datasets. It is a LoRA fine-tune of Qwen3-1.7B (with thinking disabled).
24
+
25
+ This model excels at multi-turn function calling with tools from toolkits such as `gmail`, `jira`, `calendar`, and `docs`.
26
+
27
+ The code used to generate the dataset can be found [here](https://github.com/prem-research/Funcdex-Synthesizer).
28
+
29
+
30
+ # Quickstart
31
+
32
+ ```python
33
+ from transformers import AutoModelForCausalLM, AutoTokenizer
34
+ from peft import PeftModel
35
+ import torch
36
+ import json
37
+
38
+ # Load model and tokenizer
39
+ base_model_name = "ojus1/Qwen3-1.7B-Instruct"
40
+ model_name = "prem-research/Funcdex-1.7B"
41
+
42
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
43
+
44
+ base_model = AutoModelForCausalLM.from_pretrained(
45
+ base_model_name,
46
+ torch_dtype="auto",
47
+ device_map="auto"
48
+ )
49
+
50
+ model = PeftModel.from_pretrained(
51
+ base_model,
52
+ model_name,
53
+ torch_dtype="auto",
54
+ device_map="auto"
55
+ )
56
+
57
+ # Define tools (supports all toolkits)
58
+ tools = [
59
+ {
60
+ "type": "function",
61
+ "function": {
62
+ "name": "CREATE_SHARED_DRIVE",
63
+ "description": "Create a new shared drive in Google Drive",
64
+ "parameters": {
65
+ "type": "object",
66
+ "properties": {
67
+ "name": {"type": "string", "description": "Name of the shared drive"},
68
+ "requestId": {"type": "string", "description": "Unique request ID"}
69
+ },
70
+ "required": ["name", "requestId"]
71
+ }
72
+ }
73
+ },
74
+ {
75
+ "type": "function",
76
+ "function": {
77
+ "name": "CREATE_A_FOLDER",
78
+ "description": "Create a folder in Google Drive",
79
+ "parameters": {
80
+ "type": "object",
81
+ "properties": {
82
+ "folder_name": {"type": "string", "description": "Name of the folder"},
83
+ "parent_id": {"type": "string", "description": "Parent drive or folder ID"}
84
+ },
85
+ "required": ["folder_name", "parent_id"]
86
+ }
87
+ }
88
+ }
89
+ ]
90
+
91
+ # Define conversation
92
+ messages = [
93
+ {"role": "system", "content": "You are a helpful assistant that can help with tasks by using tools."},
94
+ {"role": "user", "content": "Create a shared drive named 'Partner-Alpha-Integration' with request ID 'req-12345'."}
95
+ ]
96
+
97
+ # Apply chat template with tools
98
+ formatted_input = tokenizer.apply_chat_template(
99
+ messages,
100
+ tools=tools,
101
+ tokenize=False,
102
+ add_generation_prompt=True
103
+ )
104
+
105
+ # Tokenize and generate
106
+ input_tokens = tokenizer(formatted_input, return_tensors="pt").to(model.device)
107
+ output = model.generate(**input_tokens, max_new_tokens=256, do_sample=False)
108
+ response = tokenizer.decode(output[0][input_tokens['input_ids'].shape[1]:], skip_special_tokens=True)
109
+
110
+ print("Response:", response)
111
+ # Expected output includes: <tool_call>{"name": "CREATE_SHARED_DRIVE", "arguments": {"name": "Partner-Alpha-Integration", "requestId": "req-12345"}}</tool_call>
112
+ ```
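The decoded response is plain text, so the tool call still has to be extracted before it can be executed. A minimal sketch of that step follows, assuming the `<tool_call>{...}</tool_call>` wrapping shown in the expected output above; `parse_tool_calls` is a hypothetical helper name, not part of the released code.

```python
import json
import re

# One <tool_call>...</tool_call> block; DOTALL lets the JSON payload span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(response):
    """Return every well-formed tool call found in the decoded text."""
    calls = []
    for payload in TOOL_CALL_RE.findall(response):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of raising
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


# Sample text copied from the expected output in the Quickstart.
sample = (
    '<tool_call>{"name": "CREATE_SHARED_DRIVE", "arguments": '
    '{"name": "Partner-Alpha-Integration", "requestId": "req-12345"}}</tool_call>'
)
for call in parse_tool_calls(sample):
    print(call["name"], call["arguments"])
```

Malformed blocks are skipped here rather than raised; a stricter pipeline might prefer to surface them as errors.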
113
+
114
+ For best results, provide a detailed system prompt to steer the tool-use behaviour.
115
+
116
+ # Evaluation
117
+
118
+
119
+ <div align="center">
120
+ <img src="assets/line_plot.png" alt="Line Plot" width="40%">
121
+ </div>
122
+
123
+
124
+ ## Inference
125
+
126
+ - Given a conversation, we extract all tuples `(context_messages, function_calls)` and use them to generate predictions. We ignore the `content` field and evaluate only the `function_calls` generated by the LLM.
127
+ - We use vLLM deployment with `tool_choice="auto"`.
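The tuple extraction described above could look roughly like the sketch below. It assumes OpenAI-style message dictionaries in which assistant turns carry their calls under a `tool_calls` key; that key, the `extract_eval_tuples` name, and the sample conversation are illustrative assumptions and not taken from the evaluation code.

```python
def extract_eval_tuples(conversation):
    """Return (context_messages, function_calls) for each assistant turn with calls.

    The context is every message that precedes the assistant turn, so the
    model is asked to reproduce the reference calls from the same history.
    """
    pairs = []
    for index, message in enumerate(conversation):
        if message.get("role") != "assistant":
            continue
        calls = message.get("tool_calls") or []  # assumed key name
        if calls:
            pairs.append((conversation[:index], calls))
    return pairs


# Hypothetical conversation using the tools from the Quickstart.
conversation = [
    {"role": "user", "content": "Create a shared drive named 'Alpha'."},
    {"role": "assistant", "content": "", "tool_calls": [
        {"name": "CREATE_SHARED_DRIVE",
         "arguments": {"name": "Alpha", "requestId": "req-1"}}]},
    {"role": "tool", "content": '{"id": "drive-1"}'},
    {"role": "assistant", "content": "Done."},
    {"role": "user", "content": "Add a folder called 'Specs'."},
    {"role": "assistant", "content": "", "tool_calls": [
        {"name": "CREATE_A_FOLDER",
         "arguments": {"folder_name": "Specs", "parent_id": "drive-1"}}]},
]

pairs = extract_eval_tuples(conversation)
for context, calls in pairs:
    print(len(context), [call["name"] for call in calls])
```

Assistant turns without calls (such as "Done.") produce no tuple, which matches the statement that only `function_calls` are evaluated.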
128
+
129
+ ## Metrics
130
+
131
+ Given a list of predicted and reference function calls, we report two metrics:
132
+ - **Function Call String Match (SR)**: We greedily match predicted calls to reference calls and score each best-matched pair with `difflib.SequenceMatcher.ratio`. The reported number is the average string ratio.
133
+ - **Exact Match (EM)**: Same as above, but pairs are compared by exact string match instead. The reported number is the EM F1 score.
134
+
135
+ EM is a strict metric: it penalizes string arguments in function calls that may be acceptable. For example, `"email_content": "This is an example."` and `"email_content": "This is an Example."` differ by only one letter, yet EM counts them as a mismatch.
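To make the difference between the two metrics concrete, here is a rough sketch of both. Only the use of `difflib.SequenceMatcher.ratio` comes from the description above. The JSON serialization, the greedy pairing order, the handling of empty lists, and the `SEND_EMAIL` tool name are assumptions made for illustration, so scores from this sketch may not reproduce the reported numbers.

```python
import json
from difflib import SequenceMatcher


def canonical(call):
    """Serialize a call with sorted keys so key order cannot affect matching."""
    return json.dumps(call, sort_keys=True)


def string_ratio(predicted, reference):
    """Average best-match ratio, pairing each reference greedily (assumed order)."""
    if not reference:
        return 1.0 if not predicted else 0.0
    remaining = [canonical(p) for p in predicted]
    ratios = []
    for ref in (canonical(r) for r in reference):
        if not remaining:
            ratios.append(0.0)  # unmatched reference call
            continue
        scores = [SequenceMatcher(None, ref, cand).ratio() for cand in remaining]
        best = max(range(len(scores)), key=scores.__getitem__)
        ratios.append(scores[best])
        remaining.pop(best)  # each prediction is used at most once
    return sum(ratios) / len(ratios)


def exact_match_f1(predicted, reference):
    """F1 over calls whose serialized strings are identical."""
    if not predicted or not reference:
        return 1.0 if not predicted and not reference else 0.0
    remaining = [canonical(r) for r in reference]
    true_positives = 0
    for pred in (canonical(p) for p in predicted):
        if pred in remaining:
            remaining.remove(pred)
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# The one-letter difference from the text above; SEND_EMAIL is hypothetical.
reference = [{"name": "SEND_EMAIL",
              "arguments": {"email_content": "This is an example."}}]
predicted = [{"name": "SEND_EMAIL",
              "arguments": {"email_content": "This is an Example."}}]

print("EM F1:", exact_match_f1(predicted, reference))
print("SR:", round(string_ratio(predicted, reference), 3))
```

On this pair EM drops to zero while SR stays close to one, which is why both numbers are reported side by side.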
136
+
137
+ ## Results
138
+
139
+ ### BFCL v3
140
+ - We filtered BFCL v3 for examples relevant to the toolkits/bundles and report performance on that subset.
141
+ - The filtered set contains only 83 examples, which further emphasizes the need for workflow/toolkit-specialized workflows.
142
+
143
+ <table border="1" class="dataframe">
144
+ <thead>
145
+ <tr style="text-align: center;">
146
+ <th>LLM</th>
147
+ <th>Accuracy</th>
148
+ </tr>
149
+ </thead>
150
+ <tbody>
151
+ <tr style="text-align: center;">
152
+ <td>GPT-5 Mini<br>(medium)</td>
153
+ <td>0.71</td>
154
+ </tr>
155
+ <tr style="text-align: center;">
156
+ <td>Qwen3-1.7B</td>
157
+ <td>0.82</td>
158
+ </tr>
159
+ <tr style="text-align: center;">
160
+ <td><strong><a href="https://huggingface.co/prem-research/Funcdex-1.7B">Funcdex-1.7B</a></strong></td>
161
+ <td><strong>0.86</strong></td>
162
+ </tr>
163
+ </tbody>
164
+ </table>
165
+
166
+
167
+ ### Funcdex-MT: Overall Performance
168
+
169
+ <table border="1" class="dataframe">
170
+ <thead>
171
+ <tr style="text-align: center;">
172
+ <th>LLM</th>
173
+ <th>Exact Match</th>
174
+ <th>String Ratio</th>
175
+ <th>Total Cost ($)</th>
176
+ </tr>
177
+ </thead>
178
+ <tbody>
179
+ <tr style="text-align: center;">
180
+ <td>GPT-OSS-120B<br>(medium)</td>
181
+ <td>0.35</td>
182
+ <td>0.51</td>
183
+ <td>9.32</td>
184
+ </tr>
185
+ <tr style="text-align: center;">
186
+ <td>GPT-5 Mini<br>(medium)</td>
187
+ <td>0.35</td>
188
+ <td>0.58</td>
189
+ <td>99.71</td>
190
+ </tr>
191
+ <tr style="text-align: center;">
192
+ <td>GPT-5<br>(minimal)</td>
193
+ <td>0.18</td>
194
+ <td>0.59</td>
195
+ <td>205.45</td>
196
+ </tr>
197
+ <tr style="text-align: center;">
198
+ <td>Qwen3-0.6B</td>
199
+ <td>0.27</td>
200
+ <td>0.59</td>
201
+ <td>2.83</td>
202
+ </tr>
203
+ <tr style="text-align: center;">
204
+ <td>Qwen3-1.7B</td>
205
+ <td>0.27</td>
206
+ <td>0.69</td>
207
+ <td>5.73</td>
208
+ </tr>
209
+ <tr style="text-align: center;">
210
+ <td><strong><a href="https://huggingface.co/collections/prem-research/funcdex">Funcdex-0.6B</a></strong></td>
211
+ <td><strong>0.39</strong></td>
212
+ <td><strong>0.70</strong></td>
213
+ <td><strong>0.19</strong></td>
214
+ </tr>
215
+ <tr style="text-align: center;">
216
+ <td><strong><a href="https://huggingface.co/prem-research/Funcdex-1.7B">Funcdex-1.7B</a></strong></td>
217
+ <td><strong>0.43</strong></td>
218
+ <td><strong>0.81</strong></td>
219
+ <td>5.64</td>
220
+ </tr>
221
+ </tbody>
222
+ </table>
223
+
224
+ ### Funcdex-MT: Toolkit-Level Performance
225
+
226
+ <table border="1" class="dataframe">
227
+ <thead>
228
+ <tr style="text-align: center;">
229
+ <th rowspan="2">Toolkit</th>
230
+ <th colspan="2">GPT-OSS-120B<br>(medium)</th>
231
+ <th colspan="2">GPT-5<br>(minimal)</th>
232
+ <th colspan="2">GPT-5 Mini<br>(medium)</th>
233
+ <th colspan="2">Qwen3-0.6B</th>
234
+ <th colspan="3">Funcdex-0.6B</th>
235
+ <th colspan="2">Qwen3-1.7B</th>
236
+ <th colspan="3">Funcdex-1.7B</th>
237
+ </tr>
238
+ <tr style="text-align: center;">
239
+ <th>EM</th>
240
+ <th>SR</th>
241
+ <th>EM</th>
242
+ <th>SR</th>
243
+ <th>EM</th>
244
+ <th>SR</th>
245
+ <th>EM</th>
246
+ <th>SR</th>
247
+ <th>EM</th>
248
+ <th>SR</th>
249
+ <th>LoRA Checkpoint</th>
250
+ <th>EM</th>
251
+ <th>SR</th>
252
+ <th>EM</th>
253
+ <th>SR</th>
254
+ <th>LoRA Checkpoint</th>
255
+ </tr>
256
+ </thead>
257
+ <tbody>
258
+ <tr style="text-align: center;">
259
+ <td><img src="assets/icons/asana.png" width="20" height="20" style="vertical-align: middle;"/> Asana</td>
260
+ <td>0.38</td>
261
+ <td>0.47</td>
262
+ <td>0.12</td>
263
+ <td>0.68</td>
264
+ <td>0.49</td>
265
+ <td>0.71</td>
266
+ <td>0.33</td>
267
+ <td>0.63</td>
268
+ <td>0.46</td>
269
+ <td>0.69</td>
270
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-asana">🤗</a></td>
271
+ <td>0.30</td>
272
+ <td>0.79</td>
273
+ <td>0.52</td>
274
+ <td>0.82</td>
275
+ <td rowspan="10"><a href="https://huggingface.co/prem-research/Funcdex-1.7B">🤗</a></td>
276
+ </tr>
277
+ <tr style="text-align: center;">
278
+ <td><img src="assets/icons/calendly.png" width="20" height="20" style="vertical-align: middle;"/> Calendly</td>
279
+ <td>0.47</td>
280
+ <td>0.56</td>
281
+ <td>0.41</td>
282
+ <td>0.63</td>
283
+ <td>0.41</td>
284
+ <td>0.56</td>
285
+ <td>0.44</td>
286
+ <td>0.66</td>
287
+ <td>0.54</td>
288
+ <td>0.78</td>
289
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-calendly">🤗</a></td>
290
+ <td>0.47</td>
291
+ <td>0.74</td>
292
+ <td>0.54</td>
293
+ <td>0.86</td>
294
+ </tr>
295
+ <tr style="text-align: center;">
296
+ <td><img src="assets/icons/gmail.png" width="20" height="20" style="vertical-align: middle;"/> Gmail</td>
297
+ <td>0.48</td>
298
+ <td>0.70</td>
299
+ <td>0.24</td>
300
+ <td>0.69</td>
301
+ <td>0.50</td>
302
+ <td>0.73</td>
303
+ <td>0.27</td>
304
+ <td>0.61</td>
305
+ <td>0.47</td>
306
+ <td>0.72</td>
307
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-gmail">🤗</a></td>
308
+ <td>0.31</td>
309
+ <td>0.73</td>
310
+ <td>0.53</td>
311
+ <td>0.83</td>
312
+ </tr>
313
+ <tr style="text-align: center;">
314
+ <td><img src="assets/icons/google-calendar.png" width="20" height="20" style="vertical-align: middle;"/> Calendar</td>
315
+ <td>0.27</td>
316
+ <td>0.52</td>
317
+ <td>0.20</td>
318
+ <td>0.50</td>
319
+ <td>0.21</td>
320
+ <td>0.51</td>
321
+ <td>0.21</td>
322
+ <td>0.53</td>
323
+ <td>0.39</td>
324
+ <td>0.74</td>
325
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googlecalendar">🤗</a></td>
326
+ <td>0.23</td>
327
+ <td>0.64</td>
328
+ <td>0.47</td>
329
+ <td>0.83</td>
330
+ </tr>
331
+ <tr style="text-align: center;">
332
+ <td><img src="assets/icons/docs.png" width="20" height="20" style="vertical-align: middle;"/> Docs</td>
333
+ <td>0.19</td>
334
+ <td>0.38</td>
335
+ <td>0.07</td>
336
+ <td>0.49</td>
337
+ <td>0.18</td>
338
+ <td>0.46</td>
339
+ <td>0.07</td>
340
+ <td>0.58</td>
341
+ <td>0.13</td>
342
+ <td>0.64</td>
343
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googledocs">🤗</a></td>
344
+ <td>0.11</td>
345
+ <td>0.62</td>
346
+ <td>0.18</td>
347
+ <td>0.79</td>
348
+ </tr>
349
+ <tr style="text-align: center;">
350
+ <td><img src="assets/icons/google-drive.png" width="20" height="20" style="vertical-align: middle;"/> Drive</td>
351
+ <td>0.34</td>
352
+ <td>0.52</td>
353
+ <td>0.19</td>
354
+ <td>0.61</td>
355
+ <td>0.38</td>
356
+ <td>0.58</td>
357
+ <td>0.26</td>
358
+ <td>0.65</td>
359
+ <td>0.40</td>
360
+ <td>0.75</td>
361
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googledrive">🤗</a></td>
362
+ <td>0.26</td>
363
+ <td>0.73</td>
364
+ <td>0.48</td>
365
+ <td>0.82</td>
366
+ </tr>
367
+ <tr style="text-align: center;">
368
+ <td><img src="assets/icons/jira.png" width="20" height="20" style="vertical-align: middle;"/> Jira</td>
369
+ <td>0.47</td>
370
+ <td>0.53</td>
371
+ <td>0.17</td>
372
+ <td>0.65</td>
373
+ <td>0.47</td>
374
+ <td>0.66</td>
375
+ <td>0.51</td>
376
+ <td>0.69</td>
377
+ <td>0.58</td>
378
+ <td>0.76</td>
379
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-jira">🤗</a></td>
380
+ <td>0.47</td>
381
+ <td>0.76</td>
382
+ <td>0.59</td>
383
+ <td>0.83</td>
384
+ </tr>
385
+ <tr style="text-align: center;">
386
+ <td><img src="assets/icons/stripe.png" width="20" height="20" style="vertical-align: middle;"/> Stripe</td>
387
+ <td>0.15</td>
388
+ <td>0.37</td>
389
+ <td>0.10</td>
390
+ <td>0.46</td>
391
+ <td>0.12</td>
392
+ <td>0.39</td>
393
+ <td>0.08</td>
394
+ <td>0.50</td>
395
+ <td>0.17</td>
396
+ <td>0.71</td>
397
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-stripe">🤗</a></td>
398
+ <td>0.09</td>
399
+ <td>0.56</td>
400
+ <td>0.16</td>
401
+ <td>0.80</td>
402
+ </tr>
403
+ <tr style="text-align: center;">
404
+ <td><img src="assets/icons/to-do-list.png" width="20" height="20" style="vertical-align: middle;"/> Todoist</td>
405
+ <td>0.65</td>
406
+ <td>0.74</td>
407
+ <td>0.19</td>
408
+ <td>0.72</td>
409
+ <td>0.64</td>
410
+ <td>0.79</td>
411
+ <td>0.57</td>
412
+ <td>0.87</td>
413
+ <td>0.65</td>
414
+ <td>0.88</td>
415
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-todoist">🤗</a></td>
416
+ <td>0.55</td>
417
+ <td>0.91</td>
418
+ <td>0.72</td>
419
+ <td>0.94</td>
420
+ </tr>
421
+ <tr style="text-align: center;">
422
+ <td><img src="assets/icons/whatsapp.png" width="20" height="20" style="vertical-align: middle;"/> WhatsApp</td>
423
+ <td>0.23</td>
424
+ <td>0.39</td>
425
+ <td>0.13</td>
426
+ <td>0.47</td>
427
+ <td>0.24</td>
428
+ <td>0.43</td>
429
+ <td>0.20</td>
430
+ <td>0.43</td>
431
+ <td>0.28</td>
432
+ <td>0.64</td>
433
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-whatsapp">🤗</a></td>
434
+ <td>0.26</td>
435
+ <td>0.55</td>
436
+ <td>0.31</td>
437
+ <td>0.71</td>
438
+ </tr>
439
+ </tbody>
440
+ </table>
441
+
442
+ - Funcdex-0.6B denotes a set of specialized models. Each reported number is the average performance of the corresponding specialized model on its own subset.
443
+
444
+ ### Funcdex-MT: Bundle/Multi-toolkit Performance
445
+
446
+ <table border="1" class="dataframe">
447
+ <thead>
448
+ <tr style="text-align: center;">
449
+ <th rowspan="2">Bundle</th>
450
+ <th colspan="2">GPT-OSS-120B<br>(medium)</th>
451
+ <th colspan="2">GPT-5<br>(minimal)</th>
452
+ <th colspan="2">GPT-5 Mini<br>(medium)</th>
453
+ <th colspan="2">Qwen3-0.6B</th>
454
+ <th colspan="3">Funcdex-0.6B</th>
455
+ <th colspan="2">Qwen3-1.7B</th>
456
+ <th colspan="3">Funcdex-1.7B</th>
457
+ </tr>
458
+ <tr style="text-align: center;">
459
+ <th>EM</th>
460
+ <th>SR</th>
461
+ <th>EM</th>
462
+ <th>SR</th>
463
+ <th>EM</th>
464
+ <th>SR</th>
465
+ <th>EM</th>
466
+ <th>SR</th>
467
+ <th>EM</th>
468
+ <th>SR</th>
469
+ <th>LoRA Checkpoint</th>
470
+ <th>EM</th>
471
+ <th>SR</th>
472
+ <th>EM</th>
473
+ <th>SR</th>
474
+ <th>LoRA Checkpoint</th>
475
+ </tr>
476
+ </thead>
477
+ <tbody>
478
+ <tr style="text-align: center;">
479
+ <td><img src="assets/icons/gmail.png" width="20" height="20" style="vertical-align: middle;"/>Gmail<img src="assets/icons/google-calendar.png" width="20" height="20" style="vertical-align: middle;"/>Calendar</td>
480
+ <td>0.28</td>
481
+ <td>0.53</td>
482
+ <td>0.15</td>
483
+ <td>0.54</td>
484
+ <td>0.22</td>
485
+ <td>0.56</td>
486
+ <td>0.19</td>
487
+ <td>0.51</td>
488
+ <td>0.26</td>
489
+ <td>0.54</td>
490
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-gmail_googlecalendar">🤗</a></td>
491
+ <td>0.17</td>
492
+ <td>0.61</td>
493
+ <td>0.32</td>
494
+ <td>0.71</td>
495
+ <td rowspan="5"><a href="https://huggingface.co/prem-research/Funcdex-1.7B">🤗</a></td>
496
+ </tr>
497
+ <tr style="text-align: center;">
498
+ <td><img src="assets/icons/google-drive.png" width="20" height="20" style="vertical-align: middle;"/>Drive <img src="assets/icons/calendly.png" width="20" height="20" style="vertical-align: middle;"/> Calendly <img src="assets/icons/google-calendar.png" width="20" height="20" style="vertical-align: middle;"/> Calendar</td>
499
+ <td>0.32</td>
500
+ <td>0.45</td>
501
+ <td>0.17</td>
502
+ <td>0.52</td>
503
+ <td>0.35</td>
504
+ <td>0.47</td>
505
+ <td>0.19</td>
506
+ <td>0.49</td>
507
+ <td>0.35</td>
508
+ <td>0.60</td>
509
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googledrive_calendly_googlecalendar">🤗</a></td>
510
+ <td>0.15</td>
511
+ <td>0.66</td>
512
+ <td>0.40</td>
513
+ <td>0.78</td>
514
+ </tr>
515
+ <tr style="text-align: center;">
516
+ <td><img src="assets/icons/google-drive.png" width="20" height="20" style="vertical-align: middle;"/>Drive <img src="assets/icons/docs.png" width="20" height="20" style="vertical-align: middle;"/> Docs</td>
517
+ <td>0.28</td>
518
+ <td>0.37</td>
519
+ <td>0.12</td>
520
+ <td>0.50</td>
521
+ <td>0.33</td>
522
+ <td>0.47</td>
523
+ <td>0.18</td>
524
+ <td>0.54</td>
525
+ <td>0.34</td>
526
+ <td>0.70</td>
527
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googledrive_googledocs">🤗</a></td>
528
+ <td>0.19</td>
529
+ <td>0.68</td>
530
+ <td>0.43</td>
531
+ <td>0.76</td>
532
+ </tr>
533
+ <tr style="text-align: center;">
534
+ <td><img src="assets/icons/jira.png" width="20" height="20" style="vertical-align: middle;"/>Jira <img src="assets/icons/gmail.png" width="20" height="20" style="vertical-align: middle;"/> Gmail</td>
535
+ <td>0.42</td>
536
+ <td>0.60</td>
537
+ <td>0.18</td>
538
+ <td>0.66</td>
539
+ <td>0.36</td>
540
+ <td>0.66</td>
541
+ <td>0.29</td>
542
+ <td>0.61</td>
543
+ <td>0.39</td>
544
+ <td>0.71</td>
545
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-jira_gmail">🤗</a></td>
546
+ <td>0.28</td>
547
+ <td>0.72</td>
548
+ <td>0.44</td>
549
+ <td>0.82</td>
550
+ </tr>
551
+ <tr style="text-align: center;">
552
+ <td><img src="assets/icons/whatsapp.png" width="20" height="20" style="vertical-align: middle;"/>WhatsApp <img src="assets/icons/to-do-list.png" width="20" height="20" style="vertical-align: middle;"/> Todoist</td>
553
+ <td>0.32</td>
554
+ <td>0.58</td>
555
+ <td>0.19</td>
556
+ <td>0.66</td>
557
+ <td>0.35</td>
558
+ <td>0.69</td>
559
+ <td>0.26</td>
560
+ <td>0.50</td>
561
+ <td>0.41</td>
562
+ <td>0.70</td>
563
+ <td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-whatsapp_todoist">🤗</a></td>
564
+ <td>0.27</td>
565
+ <td>0.68</td>
566
+ <td>0.39</td>
567
+ <td>0.77</td>
568
+ </tr>
569
+ </tbody>
570
+ </table>
571
+
572
+
573
+ # License
574
+
575
+ The models, code, and dataset are licensed under the MIT License.