---
language:
- ko
license: apache-2.0
library_name: transformers
tags:
- text-generation-inference
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---

# EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval

## About the Model
This model is fine-tuned to judge whether the context retrieved for a question in a RAG (retrieval-augmented generation) pipeline contains enough information to answer it, replying with yes ("예") or no ("아니오").

The base model is [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0).

## Prompt Template
```
주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
{question}

### 정보:
{context}

### 평가:
```

In English, the prompt asks the model to evaluate whether the given information is sufficient to answer the given question and to reply with "예" (yes) or "아니오" (no). The section labels 질문, 정보, and 평가 mean question, information, and evaluation.

## How to Use it
```python
import torch
from transformers import (
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
)

model_path = "sinjy1203/EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval"

# Load the weights in 4-bit NF4 with double quantization to reduce GPU memory use.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=nf4_config, device_map={'': 'cuda:0'}
)

prompt_template = '주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n### 질문:\n{question}\n\n### 정보:\n{context}\n\n### 평가:\n'
query = {
    # "When is the club's end-of-semester general meeting?"
    "question": "동아리 종강총회가 언제인가요?",
    # "The end-of-semester general meeting is on June 21."
    "context": "종강총회 날짜는 6월 21일입니다."
}

# Tokenize the filled-in prompt and move the tensors to the model's device.
model_inputs = tokenizer(prompt_template.format_map(query), return_tensors='pt').to(model.device)
# Generate the verdict; max_new_tokens caps the number of generated tokens.
output = model.generate(**model_inputs, max_new_tokens=100)
# Decode the token IDs so the result prints as text.
print(tokenizer.decode(output[0]))
```

### Example Output
```
주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
동아리 종강총회가 언제인가요?

### 정보:
종강총회 날짜는 6월 21일입니다.

### 평가:
예<|end_of_text|>
```
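
Because the decoded output repeats the whole prompt, downstream code usually needs only the final verdict. The following is a minimal sketch of one way to extract it, assuming the output always follows the format shown above; `parse_verdict` is a hypothetical helper name and is not part of this model or of `transformers`.

```python
import re

# Labels taken from the prompt template above.
YES, NO = "예", "아니오"
MARKER = "### 평가:"


def parse_verdict(decoded: str):
    """Return True for "예", False for "아니오", and None if neither is found."""
    # Keep only the text that follows the last evaluation marker.
    _, sep, tail = decoded.rpartition(MARKER)
    if not sep:
        return None
    # Remove special tokens such as <|end_of_text|> and surrounding whitespace.
    answer = re.sub(r"<\|[^|]*\|>", "", tail).strip()
    if answer.startswith(YES):
        return True
    if answer.startswith(NO):
        return False
    return None
```

For the example output above, `parse_verdict` returns `True`. Returning `None` for anything unexpected lets the caller decide how to treat an unparseable generation, for instance by discarding the retrieved context.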

### Training Data
- Generated with reference to the `generated_instruction` approach of [stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca).
- [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0) served as the model for question generation.

## Metrics

### Korean LLM Benchmark

| Model | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|:-------------------------------:|:--------:|:-----:|:---------:|:------:|:------:|:------:|
| EEVE-Korean-Instruct-10.8B-v1.0 | 56.08 | 55.2 | 66.11 | 56.48 | 49.14 | 53.48 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 56.1 | 55.55 | 65.95 | 56.24 | 48.66 | 54.07 |

### Generated Dataset

| Model | Accuracy | F1 | Precision | Recall |
|:-------------------------------:|:--------:|:-----:|:---------:|:------:|
| EEVE-Korean-Instruct-10.8B-v1.0 | 0.824 | 0.800 | 0.885 | 0.697 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 0.892 | 0.875 | 0.903 | 0.848 |
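
This card does not state how the scores above were computed. As an illustration only, the sketch below shows the standard binary definitions of the four metrics, assuming "예" (yes) is treated as the positive class and each label is represented as a boolean; `binary_metrics` is a hypothetical helper, not the evaluation script used for this model.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives

    def safe_div(num, den):
        # Report 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": safe_div(2 * precision * recall, precision + recall),
    }
```

In this framing, precision measures how often a "예" verdict is correct, and recall measures how many of the truly sufficient contexts the model accepts.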