Update src/display/about.py
src/display/about.py CHANGED (+7 -55)
@@ -24,7 +24,7 @@ TITLE = """<h1 align="center" id="space-title">Hughes Hallucination Evaluation M
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
 This leaderboard (by [Vectara](https://vectara.com)) evaluates how often an LLM introduces hallucinations when summarizing a document. <br>
-The leaderboard utilizes [
+The leaderboard utilizes the HHEM-2.1 hallucination detection model. The open-source version of HHEM-2.1 can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).<br>
 
 """
 

@@ -38,9 +38,9 @@ Hallucinations refer to instances where a model introduces factually incorrect o
 
 ## How it works
 
-Using [Vectara](https://vectara.com)'s HHEM, we measure the occurrence of hallucinations in generated summaries.
+Using [Vectara](https://vectara.com)'s HHEM-2.1 hallucination evaluation model, we measure the occurrence of hallucinations in generated summaries.
 Given a source document and a summary generated by an LLM, HHEM outputs a hallucination score between 0 and 1, with 0 indicating complete hallucination and 1 representing perfect factual consistency.
-The model card for HHEM can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).
+The model card for HHEM-2.1-Open, the open-source version of HHEM-2.1, can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).
 
 ## Evaluation Dataset
 

@@ -60,59 +60,11 @@ If you would like to submit your model that is not available on the Hugging Face
 ## Model Submissions and Reproducibility
 You can submit your model for evaluation, whether it's hosted on the Hugging Face model hub or not. (Though it is recommended to host your model on the Hugging Face Hub.)
 
-###
-1)
-2)
-
+### Evaluation with HHEM-2.1-Open Locally
+1) You can access generated summaries from models on the leaderboard [here](https://huggingface.co/datasets/vectara/leaderboard_results). The text generation prompt is available under the "Prompt Used" section in the repository's README.
+2) Check [here](https://huggingface.co/vectara/hallucination_evaluation_model) for more details on using HHEM-2.1-Open.
+Please note that our leaderboard is scored based on the HHEM-2.1 model, which excels in hallucination detection. While we offer HHEM-2.1-Open as an open-source alternative, it may produce slightly different results.
 
-### For models available on the Hugging Face model hub:
-To replicate the evaluation result for a Hugging Face model:
-
-1) Clone the Repository
-```python
-git lfs install
-git clone https://huggingface.co/spaces/vectara/leaderboard
-```
-2) Install the Requirements
-```python
-pip install -r requirements.txt
-```
-3) Set Up Your Hugging Face Token
-```python
-export HF_TOKEN=your_token
-```
-4) Run the Evaluation Script
-```python
-python main_backend.py --model your_model_id --precision float16
-```
-5) Check Results
-After the evaluation, results are saved in "eval-results-bk/your_model_id/results.json".
-
-## Results Format
-The results are structured in JSON as follows:
-```python
-{
-  "config": {
-    "model_dtype": "float16",
-    "model_name": "your_model_id",
-    "model_sha": "main"
-  },
-  "results": {
-    "hallucination_rate": {
-      "hallucination_rate": ...
-    },
-    "factual_consistency_rate": {
-      "factual_consistency_rate": ...
-    },
-    "answer_rate": {
-      "answer_rate": ...
-    },
-    "average_summary_length": {
-      "average_summary_length": ...
-    }
-  }
-}
-```
 For additional queries or model submissions, please contact [email protected].
 """
 
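
The updated "How it works" text can be exercised locally with HHEM-2.1-Open. Below is a minimal sketch following the usage shown on the model card linked in the diff; the (document, summary) pair is invented for illustration, and `predict()` returns one consistency score per pair (closer to 1 means more factually consistent).

```python
from transformers import AutoModelForSequenceClassification

# HHEM-2.1-Open ships its own modeling code, so trust_remote_code is required.
model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

# Each pair is (source document, generated summary). This pair is illustrative only.
pairs = [
    (
        "The sun rises in the east and sets in the west. This has been "
        "observed since antiquity.",
        "The sun rises in the west.",
    ),
]

# Per the model card, call predict() rather than model(...); it returns a
# tensor with one score per pair: 0 = complete hallucination, 1 = factually consistent.
scores = model.predict(pairs)
print(scores)
```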
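Step 1 of the new "Evaluation with HHEM-2.1-Open Locally" instructions points to the leaderboard_results dataset of generated summaries. A sketch of scoring a few of those summaries locally follows; the split name, the `source`/`summary` column names, and the 0.5 hallucination cutoff are assumptions here, so check the dataset card and the leaderboard README for the actual schema and scoring rules.

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification

# Load the published summaries. Split and column names below are hypothetical;
# consult the dataset card for the real schema.
ds = load_dataset("vectara/leaderboard_results", split="train")

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

# Score the first few (document, summary) pairs with HHEM-2.1-Open. Results may
# differ slightly from the leaderboard, which is scored with HHEM-2.1.
pairs = [(row["source"], row["summary"]) for row in ds.select(range(8))]
scores = model.predict(pairs)

# A 0.5 cutoff (assumed here) separates hallucinated from consistent summaries.
hallucination_rate = sum(float(s) < 0.5 for s in scores) / len(scores)
print(f"factual consistency scores: {scores}")
print(f"hallucination rate (score < 0.5): {hallucination_rate:.2%}")
```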