---
base_model: Sao10K/L3-8B-Stheno-v3.2
library_name: peft
language:
  - vi
  - en
license: apache-2.0
tags:
  - llama3
  - vietnamese
  - qlora
  - stheno
  - text-generation
datasets:
  - alpaca-vietnamese
model-index:
  - name: L3-8B-Stheno-Vietnamese-LoRA
    results: []
widget:
  - text: |
      <|im_start|>user
      Xin chào, bạn có thể giới thiệu về Việt Nam không?<|im_end|>
      <|im_start|>assistant
  - text: |
      <|im_start|>user
      Làm thế nào để học lập trình Python hiệu quả?<|im_end|>
      <|im_start|>assistant
  - text: |
      <|im_start|>user
      Hãy viết một bài thơ ngắn về mùa xuân.<|im_end|>
      <|im_start|>assistant
---

L3-8B-Stheno Vietnamese LoRA Adapter

This is a QLoRA adapter for Sao10K/L3-8B-Stheno-v3.2, fine-tuned on a Vietnamese instruction-following dataset.

Model Details

Model Description

A Vietnamese language adapter for L3-8B-Stheno-v3.2, trained using QLoRA (4-bit quantization) to enable Vietnamese language capabilities while maintaining the base model's strengths.

  • Developed by: Petermantt
  • Model type: LoRA Adapter for Causal Language Model
  • Language(s) (NLP): Vietnamese, English
  • License: Apache 2.0
  • Finetuned from model: Sao10K/L3-8B-Stheno-v3.2

Model Sources

  • Repository: https://huggingface.co/Petermantt/L3-8B-Stheno-Vietnamese-LoRA
  • Base model: https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2

Uses

Direct Use

This adapter is designed for Vietnamese text generation, instruction following, and conversational AI. It can be used for:

  • Vietnamese chatbots and assistants
  • Content generation in Vietnamese
  • Translation assistance
  • Educational applications

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Sao10K/L3-8B-Stheno-v3.2",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Sao10K/L3-8B-Stheno-v3.2", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "Petermantt/L3-8B-Stheno-Vietnamese-LoRA")

# Generate
prompt = "<|im_start|>user\nXin chào! Bạn khỏe không?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)
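
Because the ChatML markers are plain text to the Llama-3 tokenizer, generation may run past the end of the assistant's reply. A hedged workaround, assuming a transformers version that supports stop_strings in generate (the 4.52.x release listed below does):

# Stop generation once the ChatML end marker appears in the output.
# stop_strings requires passing the tokenizer alongside it.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        stop_strings=["<|im_end|>"],
        tokenizer=tokenizer,
    )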

Out-of-Scope Use

  • Not suitable for critical applications without human oversight
  • Should not be used to generate harmful or misleading content
  • Not appropriate where bias-free output is required, as the model may reflect biases in its training data

Bias, Risks, and Limitations

  • May reflect biases present in the Vietnamese Alpaca dataset
  • Performance may vary on specialized domains not covered in training
  • Inherits limitations from the base L3-8B-Stheno model
  • Performs best on Vietnamese instructions; English capability is maintained but not enhanced

Recommendations

  • Always verify outputs for factual accuracy
  • Use with human oversight for important applications
  • Consider domain-specific fine-tuning for specialized use cases
  • Test thoroughly before production deployment

Compatibility

This LoRA adapter should work with:

  • ✅ Sao10K/L3-8B-Stheno-v3.2 (tested)
  • ✅ Other Llama-3 8B models with the same architecture
  • ⚠️ May work with other Stheno variants (untested)

Requirements

  • Same tokenizer as base model
  • Compatible model architecture (Llama-3 8B)
  • 4-bit quantization support
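
For deployment without a runtime peft dependency, the adapter can be folded into the base weights. A minimal sketch, assuming enough memory to load the base model unquantized (~16 GB in bf16); the output directory name is illustrative:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model in half precision (not 4-bit) so the weights can be merged.
base = AutoModelForCausalLM.from_pretrained(
    "Sao10K/L3-8B-Stheno-v3.2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Petermantt/L3-8B-Stheno-Vietnamese-LoRA")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights

# Save a standalone checkpoint that loads with plain transformers.
merged.save_pretrained("stheno-vi-merged")
AutoTokenizer.from_pretrained("Sao10K/L3-8B-Stheno-v3.2").save_pretrained("stheno-vi-merged")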

Training Details

Training Data

The adapter was trained on a Vietnamese translation of the Alpaca dataset (~20,000 instructions), containing diverse instruction-following examples (a formatting sketch follows this list), including:

  • General knowledge Q&A
  • Creative writing
  • Problem-solving
  • Code generation (basic)
  • Conversational responses
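
The exact preprocessing script is not part of this repository; as an illustration only, an Alpaca-style record maps onto the ChatML prompt format used above roughly like this (field names follow the standard Alpaca schema):

def format_alpaca_chatml(example: dict) -> str:
    """Render one Alpaca record into the ChatML format this adapter expects."""
    # Standard Alpaca fields: instruction, optional input, output.
    user_turn = example["instruction"]
    if example.get("input"):
        user_turn += "\n\n" + example["input"]
    return (
        "<|im_start|>user\n"
        f"{user_turn}<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{example['output']}<|im_end|>\n"
    )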

Training Procedure

Training Configuration

  • Base Model: Sao10K/L3-8B-Stheno-v3.2
  • Training Method: QLoRA (4-bit quantization)
  • LoRA Config:
    • Rank: 32
    • Alpha: 64
    • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
    • Dropout: 0.05
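
Expressed as a PEFT configuration, the settings above correspond roughly to the following sketch (the bias setting is an assumption; the exact training script is not included):

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                    # LoRA rank
    lora_alpha=64,           # scaling factor (alpha / rank = 2.0)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    bias="none",             # assumption: bias terms were not adapted
    task_type="CAUSAL_LM",
)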

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Batch size: 1 (with gradient accumulation = 4)
  • Learning rate: 2e-4 with a cosine scheduler
  • Warmup steps: 50
  • Total steps: 1,250
  • Max sequence length: 1,024
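
As a reference point only, these hyperparameters map onto transformers TrainingArguments roughly as follows (the optimizer and logging settings are illustrative assumptions):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="stheno-vi-lora",     # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # effective batch size of 4
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=1250,
    fp16=True,                       # fp16 mixed precision, per the list above
    optim="paged_adamw_8bit",        # assumption: a common optimizer choice for QLoRA
    logging_steps=50,                # illustrative
)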

Training Infrastructure

  • Hardware: NVIDIA RTX 3060 12GB
  • Training time: ~2 hours 17 minutes
  • Framework: PyTorch 2.5.1, Transformers 4.52.4, PEFT 0.15.2

Evaluation

Training Results

  • Final Loss: 0.8693
  • Final Accuracy: 78.8%
  • Total Steps: 1,250

Training Progress

  • Starting: Loss 1.68, Accuracy 64.8%
  • Step 500: Loss 1.17, Accuracy 72.4%
  • Step 1000: Loss 0.94, Accuracy 77.5%
  • Final: Loss 0.87, Accuracy 78.8%

Example Outputs

Vietnamese Chat:

User: Xin chào, bạn có thể giới thiệu về Việt Nam không? (Hello, can you tell me about Vietnam?)
Assistant: Việt Nam, còn được gọi là Cộng hòa Xã hội Chủ nghĩa Việt Nam, là một quốc gia nằm ở Đông Nam Á với diện tích 331.699 km2 và dân số khoảng 98 triệu người...
(Translation: Vietnam, also known as the Socialist Republic of Vietnam, is a country in Southeast Asia with an area of 331,699 km² and a population of about 98 million people...)

Environmental Impact

  • Hardware Type: NVIDIA RTX 3060 12GB
  • Hours used: ~2.3 hours
  • Cloud Provider: Local training
  • Compute Region: N/A
  • Carbon Emitted: Minimal (~2.3 GPU-hours on a single consumer GPU using QLoRA)

Technical Specifications

Model Architecture and Objective

Llama-3 8B decoder-only transformer trained with a causal language modeling (next-token prediction) objective. This repository adds rank-32 LoRA adapters on the attention and MLP projection layers; the base weights are unchanged.

Compute Infrastructure

Hardware

  • Single NVIDIA RTX 3060 12GB (local)

Software

  • PyTorch 2.5.1, Transformers 4.52.4, PEFT 0.15.2, with bitsandbytes for 4-bit NF4 quantization

Citation

If you use this model, please cite:

BibTeX:

@misc{stheno-vietnamese-lora-2024,
  author = {Petermantt},
  title = {L3-8B-Stheno Vietnamese LoRA Adapter},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Petermantt/L3-8B-Stheno-Vietnamese-LoRA}
}

Model Card Authors

Petermantt

Model Card Contact

Please open an issue on the HuggingFace repository for questions or concerns.

Framework versions

  • PEFT 0.15.2
  • Transformers 4.52.4
  • PyTorch 2.5.1
  • Datasets 3.6.0
  • Tokenizers 0.21.1