Update README.md
README.md
CHANGED
@@ -40,7 +40,7 @@ For RL stage we setup training with:
 
 ## III. Evaluation Results
 
-Our II-Medical-8B-1706 model achieved a 46.8% score on [HealthBench](https://openai.com/index/healthbench/), a comprehensive open-source benchmark evaluating the performance and safety of large language models in healthcare. This performance is comparable to
+Our II-Medical-8B-1706 model achieved a 46.8% score on [HealthBench](https://openai.com/index/healthbench/), a comprehensive open-source benchmark evaluating the performance and safety of large language models in healthcare. This performance is comparable to MedGemma-27B from Google. We provide a comparison to models available in ChatGPT below.
 
 <!--  -->
 Detailed result for HealthBench can be found [here](https://huggingface.co/datasets/Intelligent-Internet/OpenAI-HealthBench-II-Medical-8B-1706-GPT-4.1).