
---

language:
- en
- ko
license: cc-by-nc-4.0
tags:
- dnotitia
- nlp
- llm
- slm
- conversation
- chat
base_model:
- meta-llama/Meta-Llama-3.1-8B
library_name: transformers
pipeline_tag: text-generation

---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)


# QuantFactory/Llama-DNA-1.0-8B-Instruct-GGUF
This is a quantized version of [dnotitia/Llama-DNA-1.0-8B-Instruct](https://huggingface.co/dnotitia/Llama-DNA-1.0-8B-Instruct), created using llama.cpp.
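
Because this repository ships GGUF files rather than Transformers weights, one way to run it locally is through `llama-cpp-python`. The following is a minimal sketch under a few assumptions: the `Q2_K` file name is taken from this repository's file listing, other quantization levels are assumed to follow the same naming pattern, and the helper names `gguf_filename` and `chat_once` are hypothetical.

```python
REPO_ID = "QuantFactory/Llama-DNA-1.0-8B-Instruct-GGUF"


def gguf_filename(quant: str = "Q2_K") -> str:
    """Build the GGUF file name for a quantization level.

    Assumption: every quantization in the repository follows the
    'Llama-DNA-1.0-8B-Instruct.<QUANT>.gguf' pattern of the Q2_K file.
    """
    return f"Llama-DNA-1.0-8B-Instruct.{quant}.gguf"


def chat_once(prompt: str, quant: str = "Q2_K") -> str:
    """Fetch the chosen GGUF file (cached after first use) and answer one prompt."""
    # Imported lazily so gguf_filename() works without llama-cpp-python installed.
    # Requires: pip install llama-cpp-python huggingface-hub
    from llama_cpp import Llama

    llm = Llama.from_pretrained(repo_id=REPO_ID, filename=gguf_filename(quant))
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}]
    )
    # The response follows the OpenAI chat-completion layout.
    return result["choices"][0]["message"]["content"]
```

Calling `chat_once("What is the capital of France?")` downloads a multi-gigabyte file on first use, so choose the quantization level to match the available memory.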

# Original Model Card


# DNA 1.0 8B Instruct

<p align="center">
<img src="assets/dna-logo.png" width="400" style="margin: 40px auto;">
</p>

**DNA 1.0 8B Instruct** is a <u>state-of-the-art (**SOTA**)</u> bilingual language model based on the Llama architecture, specifically optimized for Korean language understanding and generation while maintaining strong English capabilities. The model was developed by merging with Llama 3.1 8B Instruct via spherical linear interpolation (**SLERP**) and by applying knowledge distillation (**KD**) with Llama 3.1 405B as the teacher model. It was extensively trained through continual pre-training (**CPT**) on a high-quality Korean dataset. The training pipeline was completed with supervised fine-tuning (**SFT**) and direct preference optimization (**DPO**) to align the model with human preferences and enhance its instruction-following abilities.

DNA 1.0 8B Instruct was fine-tuned on approximately 10B tokens of carefully curated data and has undergone extensive instruction tuning to enhance its ability to follow complex instructions and engage in natural conversations.

- **Developed by:** Dnotitia Inc.
- **Supported Languages:** Korean, English
- **Vocab Size:** 128,256
- **Context Length:** 131,072 tokens (128k)
- **License:** CC BY-NC 4.0

<div style="padding: 2px 8px; background-color: hsl(240, 100%, 50%, 0.1); border-radius: 5px">
  <p><strong>NOTICE:</strong></p>
  <p>This model can be used for commercial purposes. If you would like to use it commercially, please reach out via <a href="https://www.dnotitia.com/contact/post-form">Contact us</a>. After a brief consultation, we will approve commercial use.</p>
  <p>Try DNA-powered Mnemos Assistant! <a href="https://request-demo.dnotitia.ai/">Beta Open →</a></p>
</div>

## Training Procedure

<p align="center">
<img src="assets/training-procedure.png" width="600" style="margin: 40px auto;">
</p>
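
The model card names SLERP as the merge technique but does not state the interpolation factor or any per-layer schedule. The following is a minimal sketch of spherical linear interpolation between two flattened weight vectors, assuming the merge is applied tensor by tensor; the function name `slerp` and the `eps` threshold are illustrative choices, not the authors' implementation.

```python
import math
from typing import List, Sequence


def slerp(t: float, v0: Sequence[float], v1: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Spherical linear interpolation between two flattened weight vectors.

    t = 0 returns v0, t = 1 returns v1; values in between follow the arc
    defined by the angle between the two vectors.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Toy vectors for illustration only (not real model weights).
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike plain averaging, the midpoint of two orthogonal unit vectors keeps unit length here, which is the usual motivation for preferring SLERP when merging weights.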

## Evaluation

We evaluated DNA 1.0 8B Instruct against other prominent language models of similar size across various benchmarks, including Korean-specific tasks and general language understanding metrics. More details will be provided in the upcoming <u>Technical Report</u>.

| Language | Benchmark  | **dnotitia/Llama-DNA-1.0-8B-Instruct** | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct | yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | Qwen/Qwen2.5-7B-Instruct | meta-llama/Llama-3.1-8B-Instruct | mistralai/Mistral-7B-Instruct-v0.3 | NCSOFT/Llama-VARCO-8B-Instruct | upstage/SOLAR-10.7B-Instruct-v1.0 |
|----------|------------|----------------------------------------|--------------------------------------|--------------------------------------|-----------------------------------------|--------------------------|----------------------------------|------------------------------------|--------------------------------|-----------------------------------|
| Korean   | KMMLU      | **53.26** (1st)                        | 45.30                                | 45.28                                | 42.17                                   | <u>45.66</u>             | 41.66                            | 31.45                              | 38.49                          | 41.50                             |
|          | KMMLU-hard | **29.46** (1st)                        | 23.17                                | 20.78                                | 19.25                                   | <u>24.78</u>             | 20.49                            | 17.86                              | 19.83                          | 20.61                             |
|          | KoBEST     | **83.40** (1st)                        | 79.05                                | 80.13                                | <u>81.67</u>                            | 78.51                    | 67.56                            | 63.77                              | 72.99                          | 73.26                             |
|          | Belebele   | **57.99** (1st)                        | 40.97                                | 45.11                                | 49.40                                   | <u>54.85</u>             | 54.70                            | 40.31                              | 53.17                          | 48.68                             |
|          | CSATQA     | <u>43.32</u> (2nd)                     | 40.11                                | 34.76                                | 39.57                                   | **45.45**                | 36.90                            | 27.27                              | 32.62                          | 34.22                             |
| English  | MMLU       | 66.64 (3rd)                            | 65.27                                | 64.32                                | 63.63                                   | **74.26**                | <u>68.26</u>                     | 62.04                              | 63.25                          | 65.30                             |
|          | MMLU-Pro   | **43.05** (1st)                        | 40.73                                | 38.90                                | 32.79                                   | <u>42.5</u>              | 40.92                            | 33.49                              | 37.11                          | 30.25                             |
|          | GSM8K      | **80.52** (1st)                        | 65.96                                | <u>80.06</u>                         | 56.18                                   | 75.74                    | 75.82                            | 49.66                              | 64.14                          | 69.22                             |

- The highest scores are in **bold**, and the second-highest scores are <u>underlined</u>.

**Evaluation Protocol**   
For easy reproduction of our evaluation results, we list the evaluation tools and settings used below:

| Benchmark  | Evaluation setting | Metric                              | Evaluation tool |
|------------|--------------------|-------------------------------------|-----------------|
| KMMLU      | 5-shot             | macro\_avg / exact\_match           | lm-eval-harness |
| KMMLU-hard | 5-shot             | macro\_avg / exact\_match           | lm-eval-harness |
| KoBEST     | 5-shot             | macro\_avg / f1                     | lm-eval-harness |
| Belebele   | 0-shot             | acc                                 | lm-eval-harness |
| CSATQA     | 0-shot             | acc\_norm                           | lm-eval-harness |
| MMLU       | 5-shot             | macro\_avg / acc                    | lm-eval-harness |
| MMLU-Pro   | 5-shot             | macro\_avg / exact\_match           | lm-eval-harness |
| GSM8K      | 5-shot             | acc, exact\_match & strict\_extract | lm-eval-harness |
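
To make the `macro_avg` entries in the table above concrete, the sketch below contrasts a macro average (unweighted mean of per-subtask scores) with a micro average (pooled over all examples). Reading `macro_avg` this way is an assumption about the reported metric, and the function names and counts are hypothetical, not taken from lm-eval-harness or from the results above.

```python
from typing import Dict, Tuple

# Maps a sub-task name to (number correct, number of examples).
TaskCounts = Dict[str, Tuple[int, int]]


def macro_average(per_task: TaskCounts) -> float:
    """Unweighted mean of per-subtask accuracy: every sub-task counts equally."""
    scores = [correct / total for correct, total in per_task.values()]
    return sum(scores) / len(scores)


def micro_average(per_task: TaskCounts) -> float:
    """Accuracy pooled over all examples: larger sub-tasks weigh more."""
    total_correct = sum(correct for correct, _ in per_task.values())
    total_examples = sum(total for _, total in per_task.values())
    return total_correct / total_examples


# Hypothetical counts for illustration only.
example = {"subject_a": (90, 100), "subject_b": (5, 10)}
macro = macro_average(example)  # (0.9 + 0.5) / 2
micro = micro_average(example)  # 95 / 110
```

The two averages diverge when sub-tasks differ in size, so it matters which one is used when comparing against the numbers reported here.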

## Quickstart

This model requires `transformers >= 4.43.0`.

```python
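from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the original (unquantized) checkpoint; device_map='auto' places it on the available devices.
tokenizer = AutoTokenizer.from_pretrained('dnotitia/Llama-DNA-1.0-8B-Instruct')
model = AutoModelForCausalLM.from_pretrained('dnotitia/Llama-DNA-1.0-8B-Instruct', device_map='auto')
# Print tokens as they are generated, hiding the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

conversation = [
    {"role": "system", "content": "You are a helpful assistant, Dnotitia DNA."},
    {"role": "user", "content": "너의 이름은?"},  # "What is your name?"
]
# Render the conversation with the model's chat template and move the tensors to the model's device.
inputs = tokenizer.apply_chat_template(conversation,
                                       add_generation_prompt=True,
                                       return_dict=True,
                                       return_tensors="pt").to(model.device)
# The streamer prints the reply, so the return value is discarded.
_ = model.generate(**inputs, streamer=streamer)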
```
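
The `transformers >= 4.43.0` requirement stated above can be checked before loading the model. The sketch below is one way to do that with the standard library only; the helper names are hypothetical, and the parser deliberately ignores pre-release suffixes, so `4.43.0rc1` is treated as `4.43.0`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MIN_TRANSFORMERS = (4, 43, 0)


def parse_release(version_string: str) -> Tuple[int, ...]:
    """Turn '4.43.0' or '4.44.0.dev0' into a tuple of its leading integer parts."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # Stop at the first non-numeric component such as 'dev0'.
        parts.append(int(digits))
    return tuple(parts)


def meets_requirement(version_string: str, minimum: Tuple[int, ...] = MIN_TRANSFORMERS) -> bool:
    """Return True if the version is at least the minimum release."""
    release = parse_release(version_string)
    # Pad short versions such as '4.43' with zeros before comparing.
    release += (0,) * (len(minimum) - len(release))
    return release >= minimum


def transformers_is_recent_enough() -> bool:
    """Check the installed transformers package; False if it is not installed."""
    try:
        return meets_requirement(version("transformers"))
    except PackageNotFoundError:
        return False
```

Calling `transformers_is_recent_enough()` at the top of a script gives an early, readable failure instead of an error deep inside model loading.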

## Limitations

While DNA 1.0 8B Instruct demonstrates strong performance, users should be aware of the following limitations:

- The model may occasionally generate biased or inappropriate content
- Responses are based on training data and may not reflect current information
- The model may sometimes produce factually incorrect or inconsistent answers
- Performance may vary depending on the complexity and domain of the task
- Generated content should be reviewed for accuracy and appropriateness

## License

This model is released under the CC BY-NC 4.0 license. For commercial usage inquiries, please [Contact us](https://www.dnotitia.com/contact/post-form).

## Appendix

- KMMLU scores comparison chart:
<img src="assets/comparison-chart.png" width="100%" style="margin: 40px auto;">

- DNA 1.0 8B Instruct model architecture <sup>[1]</sup>:
<img src="assets/model-architecture.png" width="500" style="margin: 40px auto;">

[1]: <https://www.linkedin.com/posts/sebastianraschka_the-llama-32-1b-and-3b-models-are-my-favorite-activity-7248317830943686656-yyYD/>

- The median percentage difference in model weights before and after the merge (our SFT model + Llama 3.1 8B Instruct):
<img src="assets/ours-vs-merged.png" width="100%" style="margin: 40px auto;">

## Citation

If you use or discuss this model in your academic research, please cite the project to help spread awareness:

```
@article{dnotitiadna2024,
  title = {Dnotitia DNA 1.0 8B Instruct},
  author = {Jungyup Lee and Jemin Kim and Sang Park and Seungjae Lee},
  year = {2024},
  url = {https://huggingface.co/dnotitia/DNA-1.0-8B-Instruct},
  version = {1.0},
}
```