Update README.md
README.md
CHANGED
@@ -251,30 +251,30 @@ Source code for SFT and RFT training is provided — see [GitHub](https://github
 
 ### Grounding Benchmark
 
-| Model
-|
-| **AgentCPM-GUI-8B**
-| Qwen2.5-VL-7B
-| Intern2.5-VL-8B
-| Intern2.5-VL-26B
-| OS-Genesis-7B
-| UI-TARS-7B
-| OS-
-| Aguvis-7B
-| GPT-4o
-| GPT-4o with Grounding
+| Model | Fun2Point | Text2Point | Bbox2Text | Average |
+|-------------------------|-----------|------------|-----------|--------|
+| **AgentCPM-GUI-8B** | **79.1** | **76.5** | **58.2** |**71.3**|
+| Qwen2.5-VL-7B | 59.8 | 59.3 | <ins>50.0</ins> | <ins>56.4</ins> |
+| Intern2.5-VL-8B | 17.2 | 24.2 | 45.9 | 29.1 |
+| Intern2.5-VL-26B | 14.8 | 16.6 | 36.3 | 22.6 |
+| OS-Genesis-7B | 8.3 | 5.8 | 4.0 | 6.0 |
+| UI-TARS-7B | 56.8 | <ins>66.7</ins> | 1.4 | 41.6 |
+| OS-Atlas-7B | 53.6 | 60.7 | 0.4 | 38.2 |
+| Aguvis-7B | <ins>60.8</ins> | **76.5** | 0.2 | 45.8 |
+| GPT-4o | 22.1 | 19.9 | 14.3 | 18.8 |
+| GPT-4o with Grounding | 44.3 | 44.0 | 14.3 | 44.2 |
 
 ### Agent Benchmark
 
-| Dataset | Android Control-Low TM | Android Control-Low EM | Android Control-High TM | Android Control-High EM | GUI-Odyssey TM | GUI-Odyssey EM | AITZ TM | AITZ EM | Chinese APP TM | Chinese APP EM |
+| Dataset | Android Control-Low TM | Android Control-Low EM | Android Control-High TM | Android Control-High EM | GUI-Odyssey TM | GUI-Odyssey EM | AITZ TM | AITZ EM | Chinese APP (CAGUI) TM | Chinese APP (CAGUI) EM |
 | ------------------------- | ---------------------- | ---------------------- | ----------------------- | ----------------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |
-| **AgentCPM-GUI-8B** |
-| Qwen2.5-VL-7B |
-| UI-TARS-7B |
+| **AgentCPM-GUI-8B** | <ins>94.39</ins> | <ins>90.20</ins> | <ins>77.70</ins> | <ins>69.17</ins> | **90.85** | **74.96** | **85.71** | **76.38** | **96.86** | **91.28** |
+| Qwen2.5-VL-7B | 94.14 | 84.96 | 75.10 | 62.90 | 59.54 | 46.28 | 78.41 | 54.61 | 74.18 | 55.16 |
+| UI-TARS-7B | **95.24** | **91.79** | **81.63** | **74.43** | 86.06 | 67.90 | <ins>80.42</ins> | <ins>65.77</ins> | <ins>88.62</ins> | <ins>70.26</ins> |
 | OS-Genesis-7B | 90.74 | 74.22 | 65.92 | 44.43 | 11.67 | 3.63 | 19.98 | 8.45 | 38.10 | 14.50 |
-| OS-Atlas-7B
+| OS-Atlas-7B | 73.03 | 67.25 | 70.36 | 56.53 | 91.83* | 76.76* | 74.13 | 58.45 | 81.53 | 55.89 |
 | Aguvis-7B | 93.85 | 89.40 | 65.56 | 54.18 | 26.71 | 13.54 | 35.71 | 18.99 | 67.43 | 38.20 |
-| OdysseyAgent-7B | 65.10 | 39.16 | 58.80 | 32.74 | 90.83 | 73.67 | 59.17 | 31.60 | 67.56 | 25.44 |
+| OdysseyAgent-7B | 65.10 | 39.16 | 58.80 | 32.74 | <ins>90.83</ins> | <ins>73.67</ins> | 59.17 | 31.60 | 67.56 | 25.44 |
 | GPT-4o | - | 19.49 | - | 20.80 | - | 20.39 | 70.00 | 35.30 | 3.67 | 3.67 |
 | Gemini 2.0 | - | 28.50 | - | 60.20 | - | 3.27 | - | - | - | - |
 | Claude | - | 19.40 | - | 12.50 | 60.90 | - | - | - | - | - |
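Review note: the Average column of the Grounding Benchmark table can be sanity-checked with a few lines of Python. This is a minimal sketch that assumes Average is the unweighted mean of Fun2Point, Text2Point, and Bbox2Text, rounded to one decimal; the README does not state that definition. The names `GROUNDING`, `mean_of_tasks`, and `find_mismatches` are illustrative and do not come from the repository.

```python
# Scores copied from the Grounding Benchmark table in this diff:
# model -> (Fun2Point, Text2Point, Bbox2Text, reported Average).
GROUNDING = {
    "AgentCPM-GUI-8B": (79.1, 76.5, 58.2, 71.3),
    "Qwen2.5-VL-7B": (59.8, 59.3, 50.0, 56.4),
    "Intern2.5-VL-8B": (17.2, 24.2, 45.9, 29.1),
    "Intern2.5-VL-26B": (14.8, 16.6, 36.3, 22.6),
    "OS-Genesis-7B": (8.3, 5.8, 4.0, 6.0),
    "UI-TARS-7B": (56.8, 66.7, 1.4, 41.6),
    "OS-Atlas-7B": (53.6, 60.7, 0.4, 38.2),
    "Aguvis-7B": (60.8, 76.5, 0.2, 45.8),
    "GPT-4o": (22.1, 19.9, 14.3, 18.8),
    "GPT-4o with Grounding": (44.3, 44.0, 14.3, 44.2),
}


def mean_of_tasks(scores):
    """Unweighted mean of the task scores (assumed definition of Average)."""
    return sum(scores) / len(scores)


def find_mismatches(table, tol=0.05):
    """Return rows whose reported Average differs from the computed mean.

    tol=0.05 allows for rounding to one decimal place.
    """
    mismatches = {}
    for model, (*tasks, reported) in table.items():
        computed = mean_of_tasks(tasks)
        if abs(computed - reported) > tol + 1e-9:
            mismatches[model] = (round(computed, 1), reported)
    return mismatches


if __name__ == "__main__":
    for model, (computed, reported) in find_mismatches(GROUNDING).items():
        print(f"{model}: computed {computed}, reported {reported}")
    # prints: GPT-4o with Grounding: computed 34.2, reported 44.2
```

Under that assumption, nine of the ten rows agree with the listed Average to within rounding. The "GPT-4o with Grounding" row does not: the three-task mean is 34.2, while the table lists 44.2, which is close to the mean of Fun2Point and Text2Point alone (44.15). The row may follow a different averaging convention or contain a typo, so it is worth confirming with the authors.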