---
license: apache-2.0
tags:
- unsloth
- LoRA
- trl
- hinglish
- text-generation-inference
datasets:
- fhai50032/Hinglish-CoT-General
language:
- en
base_model:
- unsloth/Meta-Llama-3.1-8B
pipeline_tag: text-generation
library_name: adapter-transformers
---

# 🧠 Llama-3.1-8B-Hinglish-General-sft

**Llama-3.1-8b-Hinglish-General-sft** is a lightweight, domain-specific fine-tuned model built for **conversational Hinglish-style reasoning**, with a focus on general knowledge. It builds upon `Meta-Llama-3.1-8B` and uses **LoRA adapters** for efficient fine-tuning with **Unsloth**.

> ⚠️ This model is a demonstration of supervised fine-tuning and is intended solely for educational and informational purposes. It is not validated for critical applications and should not be used for real-life decision-making.

---

## πŸ“‹ Model Summary

- **Base Model:** [`unsloth/Meta-Llama-3.1-8B`](https://huggingface.co/unsloth/Meta-Llama-3.1-8B)
- **LoRA Adapter:** `Subh775/Llama-3.1-8b-Hinglish-General-sft`
- **Fine-tuned Dataset:** [`fhai50032/Hinglish-CoT-General`](https://huggingface.co/datasets/fhai50032/Hinglish-CoT-General)
- **Language:** Hinglish (Hindi-English mix)
- **Training Time:** 49.24 minutes (1 epoch)
- **Framework:** [Unsloth](https://github.com/unslothai/unsloth)
- **Quantization:** 4-bit (for efficient inference)

---

## πŸ’‘ Key Features

- πŸ—£οΈ **Hinglish-CoT Reasoning:** Trained on ~2K question-answer pairs with step-by-step reasoning in Hinglish.
- βš™οΈ **Efficient Inference:** Enabled by LoRA + Unsloth + 4-bit quantization.
- πŸš€ **Fast and Lightweight:** Optimized for quick inference even on limited hardware.

---

## πŸ› οΈ Inference Instructions

### πŸ”§ Installation

```bash
pip install unsloth
```

```python
from unsloth import FastLanguageModel
import torch

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{question}

### Input:
{thoughts}

### Response:
{answer}"""

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Subh775/Llama-3.1-8b-Hinglish-General-sft",
    max_seq_length=2048,
    load_in_4bit=True
)

FastLanguageModel.for_inference(model)
```
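To make the template concrete, here is a minimal sketch of how a single question is formatted before generation. It reproduces the `alpaca_prompt` string from above; the question and thoughts values are illustrative placeholders, not taken from the training set:

```python
# Alpaca-style template reproduced from the inference snippet above.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{question}

### Input:
{thoughts}

### Response:
{answer}"""

# Illustrative values; the answer slot is left empty so the model
# completes the text after "### Response:".
prompt = alpaca_prompt.format(
    question="Photosynthesis kya hota hai?",
    thoughts="User is asking a genuine question. Thinking step-by-step in Hinglish.",
    answer="",
)
print(prompt)
```

The prompt ends immediately after `### Response:`, which is what cues the model to produce the answer as a continuation.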

```python
import re

def clean_response(text):
    if "### Response:" in text:
        text = text.split("### Response:")[-1]
    lines = text.strip().splitlines()
    clean_lines = [line.strip() for line in lines if not re.match(r"^(#|input:|response:|Input:|Response:)", line, re.IGNORECASE)]
    return " ".join(clean_lines).strip()

def chat():
    print("🩺 Chat with Llama-3.1-8b-Hinglish-General-sft! Type '\\q' or 'quit' to stop.\n")
    chat_history = ""

    while True:
        user_input = input("➀ ")
        if user_input.lower() in ['\\q', 'quit']:
            print("\nExiting the chat. Goodbye 🧠✨!")
            print("✨" + "=" * 30 + "✨\n")
            break

        question = user_input
        thoughts = "User is asking a genuine question. Thinking step-by-step in Hinglish."
        prompt = alpaca_prompt.format(question=question, thoughts=thoughts, answer="")
        chat_history += prompt + "\n"

        inputs = tokenizer([chat_history], return_tensors="pt").to("cuda")

        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            top_p=0.9,
            num_return_sequences=1,
            do_sample=True,
            no_repeat_ngram_size=2
        )

        decoded_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
        clean_output = clean_response(decoded_output)
        chat_history += f"{clean_output}\n"

        print(f"\n❄️: {clean_output}\n")

chat()
```

## πŸ“ˆ Training details
- Dataset Used: Hinglish-CoT-General
- Total Samples: 2,015 examples
- Training Time: ~49 minutes (1 epoch)
- Final Step: 60
- Final Training Loss: 0.776
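As a quick sanity check on the numbers above, the sample and step counts imply an effective batch size of roughly 34 examples per optimizer step (assuming the full dataset was seen exactly once; the actual per-device batch size and gradient accumulation settings are not recorded here):

```python
samples = 2015  # total training examples (1 epoch)
steps = 60      # final optimizer step reported above

# Effective examples consumed per optimizer step.
effective_batch = samples / steps
print(round(effective_batch, 1))  # ≈ 33.6
```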

## ⚠️ Limitations
- 🧠 General-purpose understanding – may not reflect recent developments.
- The fine-tuning dataset is small (~2K examples), so model responses may be less accurate or inconsistent.

## πŸ“œ License
This model is licensed under the Apache 2.0 License, same as its base model.

## πŸ“š Citation
```bibtex
@misc{llama3_8b_hinglish_general_2025,
  author       = {Subh775},
  title        = {Llama-3.1 8B Hinglish General SFT},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Subh775/Llama-3.1-8b-Hinglish-General-sft}},
  note         = {Hugging Face Repository}
}
```