---
language:
- ko
license: apache-2.0
library_name: transformers
tags:
- text-generation-inference
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---
# EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval
## About the Model
This model has been fine-tuned to judge, with a "yes" or "no" answer, whether the context retrieved for a question in a RAG pipeline is sufficient to answer that question.
The base model is [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0).
## Prompt Template
```
์ฃผ์–ด์ง„ ์งˆ๋ฌธ๊ณผ ์ •๋ณด๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ ์งˆ๋ฌธ์— ๋‹ตํ•˜๊ธฐ์— ์ถฉ๋ถ„ํ•œ ์ •๋ณด์ธ์ง€ ํ‰๊ฐ€ํ•ด์ค˜.
์ •๋ณด๊ฐ€ ์ถฉ๋ถ„ํ•œ์ง€๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด "์˜ˆ" ๋˜๋Š” "์•„๋‹ˆ์˜ค"๋กœ ๋‹ตํ•ด์ค˜.

### ์งˆ๋ฌธ:
{question}

### ์ •๋ณด:
{context}

### ํ‰๊ฐ€:
```
The prompt (in Korean) asks the model to evaluate whether the given information is sufficient to answer the question, responding with "์˜ˆ" (yes) or "์•„๋‹ˆ์˜ค" (no).
## How to Use It
```python
import torch
from transformers import (
BitsAndBytesConfig,
AutoModelForCausalLM,
AutoTokenizer,
)
model_path = "sinjy1203/EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval"

# 4-bit NF4 quantization config (bitsandbytes)
nf4_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
model_path, quantization_config=nf4_config, device_map={'': 'cuda:0'}
)
prompt_template = '์ฃผ์–ด์ง„ ์งˆ๋ฌธ๊ณผ ์ •๋ณด๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ ์งˆ๋ฌธ์— ๋‹ตํ•˜๊ธฐ์— ์ถฉ๋ถ„ํ•œ ์ •๋ณด์ธ์ง€ ํ‰๊ฐ€ํ•ด์ค˜.\n์ •๋ณด๊ฐ€ ์ถฉ๋ถ„ํ•œ์ง€๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด "์˜ˆ" ๋˜๋Š” "์•„๋‹ˆ์˜ค"๋กœ ๋‹ตํ•ด์ค˜.\n\n### ์งˆ๋ฌธ:\n{question}\n\n### ์ •๋ณด:\n{context}\n\n### ํ‰๊ฐ€:\n'
query = {
"question": "๋™์•„๋ฆฌ ์ข…๊ฐ•์ดํšŒ๊ฐ€ ์–ธ์ œ์ธ๊ฐ€์š”?",
"context": "์ข…๊ฐ•์ดํšŒ ๋‚ ์งœ๋Š” 6์›” 21์ผ์ž…๋‹ˆ๋‹ค."
}
# Build the prompt, move the inputs to the model's device, and generate the grade
model_inputs = tokenizer(prompt_template.format_map(query), return_tensors='pt').to(model.device)
output = model.generate(**model_inputs, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```
### Example Output
```
์ฃผ์–ด์ง„ ์งˆ๋ฌธ๊ณผ ์ •๋ณด๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ ์งˆ๋ฌธ์— ๋‹ตํ•˜๊ธฐ์— ์ถฉ๋ถ„ํ•œ ์ •๋ณด์ธ์ง€ ํ‰๊ฐ€ํ•ด์ค˜.
์ •๋ณด๊ฐ€ ์ถฉ๋ถ„ํ•œ์ง€๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด "์˜ˆ" ๋˜๋Š” "์•„๋‹ˆ์˜ค"๋กœ ๋‹ตํ•ด์ค˜.

### ์งˆ๋ฌธ:
๋™์•„๋ฆฌ ์ข…๊ฐ•์ดํšŒ๊ฐ€ ์–ธ์ œ์ธ๊ฐ€์š”?

### ์ •๋ณด:
์ข…๊ฐ•์ดํšŒ ๋‚ ์งœ๋Š” 6์›” 21์ผ์ž…๋‹ˆ๋‹ค.

### ํ‰๊ฐ€:
์˜ˆ<|end_of_text|>
```
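In the example above, the model answers "์˜ˆ" (yes): the information that the closing party is on June 21 is sufficient to answer when it takes place. To use the grade programmatically in a RAG pipeline, the generated continuation can be parsed into a boolean. The helper below is a minimal, hypothetical sketch that reuses the `model`, `tokenizer`, and `prompt_template` objects from the snippet above; it is not part of the released model.
```python
# Minimal sketch (assumption): parse the generated "์˜ˆ"/"์•„๋‹ˆ์˜ค" grade into a boolean.
def grade_retrieval(question: str, context: str) -> bool:
    prompt = prompt_template.format_map({"question": question, "context": context})
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    # Decode only the tokens generated after the prompt
    generated = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return generated.strip().startswith("์˜ˆ")

# Example: returns True when the retrieved context is judged sufficient
print(grade_retrieval("๋™์•„๋ฆฌ ์ข…๊ฐ•์ดํšŒ๊ฐ€ ์–ธ์ œ์ธ๊ฐ€์š”?", "์ข…๊ฐ•์ดํšŒ ๋‚ ์งœ๋Š” 6์›” 21์ผ์ž…๋‹ˆ๋‹ค."))
```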
## Training Data
- Instruction data was generated by referencing the instruction-generation approach of [stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca).
- [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0) was used as the model for question generation.
## Metrics
### Korean LLM Benchmark
| Model | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2|
|:-------------------------------:|:--------:|:-----:|:---------:|:------:|:------:|:------:|
| EEVE-Korean-Instruct-10.8B-v1.0 | 56.08 | 55.2 | 66.11 | 56.48 | 49.14 | 53.48 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 56.1 | 55.55 | 65.95 | 56.24 | 48.66 | 54.07 |
### Generated Dataset
| Model | Accuracy | F1 | Precision | Recall |
|:-------------------------------:|:--------:|:-----:|:---------:|:------:|
| EEVE-Korean-Instruct-10.8B-v1.0 | 0.824 | 0.800 | 0.885 | 0.697 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 0.892 | 0.875 | 0.903 | 0.848 |