# Llama-3.1-8B-Instruct Fine-tuned for Turkish Sentiment Analysis (QLoRA)
This repository contains a version of the meta-llama/Llama-3.1-8B-Instruct model fine-tuned for Turkish sentiment analysis on the winvoker/turkish-sentiment-analysis-dataset using the QLoRA (4-bit) method.
Model Name: `ceofast/llama3.1-8b-instruct-turkish-sentiment-qlora`
## Model Description
This model is trained to classify the sentiment of a given Turkish text as positive, negative, or neutral. The QLoRA (Quantized Low-Rank Adaptation) technique enables fine-tuning large language models with significantly fewer computational resources; this specific model was trained using 4-bit quantization (see the configuration sketch after the list below).
- Base Model: meta-llama/Llama-3.1-8B-Instruct
- Fine-tuning Technique: QLoRA (4-bit NF4)
- Language: Turkish (tr)
- Task: Text Classification (Sentiment Analysis)
- Labels: `LABEL_0` (negative), `LABEL_1` (neutral), `LABEL_2` (positive)
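
For reference, the snippet below sketches the kind of QLoRA setup described above. It is illustrative only: the LoRA hyperparameters (`r`, `lora_alpha`, `target_modules`, dropout) are assumptions, not the values actually used to train this model.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

# 4-bit NF4 quantization, as used for this model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    num_labels=3,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Hypothetical LoRA hyperparameters -- the actual training values are not documented here
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Only the low-rank adapters are trainable
```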
## How to Use
To use this model, you need the `transformers`, `peft`, `accelerate`, `bitsandbytes`, and `torch` libraries installed.
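
For example:

```bash
pip install transformers peft accelerate bitsandbytes torch
```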
```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig

# Base model ID
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"

# QLoRA adapter ID (this repository)
adapter_id = "ceofast/llama3.1-8b-instruct-turkish-sentiment-qlora"

# Labels
labels = ["negative", "neutral", "positive"]

# 4-bit quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # Or torch.float16, depending on your GPU
)

# Load the base model in 4-bit
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=len(labels),
    quantization_config=bnb_config,
    device_map="auto",             # Place the model on the appropriate device(s) (GPU/CPU)
    trust_remote_code=True,        # Only needed if the base model requires it
    # Add your HF token here or make sure you are logged in:
    # token="YOUR_HF_TOKEN",
    ignore_mismatched_sizes=True,  # Suppress the classification-head size-mismatch warning
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Llama has no PAD token by default; reuse the EOS token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
base_model.config.pad_token_id = tokenizer.pad_token_id

# Load the PEFT adapter on top of the base model.
# Note: merging is usually unnecessary for inference; the PeftModel can be used directly.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()  # Set the model to evaluation mode

# Inference function
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    # Move inputs to the same device as the model
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=-1)
    return labels[prediction.item()]

# Example usage
text1 = "Bu film tek kelimeyle muhteşemdi!"
text2 = "Kargo çok geç geldi ve ürün hasarlıydı."
text3 = "Hava bugün güneşli."
text4 = "This restaurant is fantastic!"  # English example; the model is trained on Turkish

print(f"'{text1}' -> Sentiment: {predict_sentiment(text1)}")
print(f"'{text2}' -> Sentiment: {predict_sentiment(text2)}")
print(f"'{text3}' -> Sentiment: {predict_sentiment(text3)}")
print(f"'{text4}' -> Sentiment: {predict_sentiment(text4)}")
```
Expected output (example):

```
'Bu film tek kelimeyle muhteşemdi!' -> Sentiment: positive
'Kargo çok geç geldi ve ürün hasarlıydı.' -> Sentiment: negative
'Hava bugün güneşli.' -> Sentiment: neutral
'This restaurant is fantastic!' -> Sentiment: positive
```

The English example may happen to work for simple sentences, but the model is trained primarily on Turkish.
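
If you prefer a standalone model without the PEFT wrapper (e.g., for export or slightly simpler serving), the adapter can be merged into the base weights. The sketch below continues from the snippet above (`base_model_id`, `labels`, `adapter_id`); note that support for merging into 4-bit quantized layers varies across PEFT versions, so the safest route is to load the base model unquantized first, which requires considerably more memory. The output path is a placeholder.

```python
from peft import PeftModel
from transformers import AutoModelForSequenceClassification

# Load the base model unquantized (needs more memory than the 4-bit path above)
base = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=len(labels),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    ignore_mismatched_sizes=True,
)

# Apply the adapter, then fold its weights into the base model
merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
merged.save_pretrained("llama3.1-8b-turkish-sentiment-merged")  # Hypothetical output path
```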
## Training Details

- Hardware: 1x NVIDIA RTX 3060 (Laptop, Max Performance, 6 GB VRAM)
## Evaluation Results
The model was evaluated periodically on the validation set during training. The following metrics were recorded just before the training process was interrupted at approximately step 3100 (epoch 0.25):
- Validation Macro F1: 0.8986
- Validation Accuracy: 0.9434
- Validation Loss: 0.1555
(Note: These are the best scores observed on the validation set before the training run stopped unexpectedly. Ideally, a final evaluation should be performed on a separate, held-out test set after loading the desired checkpoint, such as `checkpoint-3000`, which was likely the last successfully saved one.)
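
To run this kind of evaluation on your own held-out split, a minimal sketch using scikit-learn's metrics is shown below. It assumes `texts` and `gold_labels` lists from your test set (hypothetical names) and the `predict_sentiment` function defined in the usage example above.

```python
from sklearn.metrics import accuracy_score, f1_score

# texts: list[str], gold_labels: list[str] from a held-out test set (assumed to exist)
preds = [predict_sentiment(t) for t in texts]

print("Accuracy:", accuracy_score(gold_labels, preds))
print("Macro F1:", f1_score(gold_labels, preds, average="macro"))
```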