---
license: apache-2.0
tags:
- unsloth
- LoRA
- trl
- hinglish
- text-generation-inference
datasets:
- fhai50032/Hinglish-CoT-General
language:
- en
base_model:
- unsloth/Meta-Llama-3.1-8B
pipeline_tag: text-generation
library_name: adapter-transformers
---

# 🧠 Llama-3.1-8B-Hinglish-General-sft

**Llama-3.1-8b-Hinglish-General-sft** is a lightweight, domain-specific fine-tuned model built for **conversational Hinglish-style reasoning**, with a focus on general knowledge. It builds upon `Meta-Llama-3.1-8B` and uses **LoRA adapters** for efficient fine-tuning with **Unsloth**.

> ⚠️ This model is a demonstration of supervised fine-tuning and is intended solely for educational and informational purposes. It is not validated for critical applications and should not be used for real-life decision-making.

---

## πŸ“‹ Model Summary

- **Base Model:** [`unsloth/Meta-Llama-3.1-8B`](https://huggingface.co/unsloth/Meta-Llama-3.1-8B)
- **LoRA Adapter:** `Subh775/Llama-3.1-8b-Hinglish-General-sft`
- **Fine-tuned Dataset:** [`fhai50032/Hinglish-CoT-General`](https://huggingface.co/datasets/fhai50032/Hinglish-CoT-General)
- **Language:** Hinglish (Hindi-English mix)
- **Training Time:** 49.24 minutes (1 epoch)
- **Framework:** [Unsloth](https://github.com/unslothai/unsloth)
- **Quantization:** 4-bit (for efficient inference)

---

## πŸ’‘ Key Features

- πŸ—£οΈ **Hinglish-CoT Reasoning:** Trained on ~2K question-answer pairs with step-by-step reasoning in Hinglish.
- βš™οΈ **Efficient Inference:** Enabled by LoRA + Unsloth + 4-bit quantization.
- πŸš€ **Fast and Lightweight:** Optimized for quick inference even on limited hardware.

---

## πŸ› οΈ Inference Instructions

### πŸ”§ Installation

```bash
pip install unsloth
```

```python
from unsloth import FastLanguageModel
import torch

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{question}

### Input:
{thoughts}

### Response:
{answer}"""

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Subh775/Llama-3.1-8b-Hinglish-General-sft",
    max_seq_length=2048,
    load_in_4bit=True
)

FastLanguageModel.for_inference(model)
```
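To make the template concrete, here is a minimal sketch of how a single question is formatted before generation. It reproduces the `alpaca_prompt` string from above; the question and thoughts values are illustrative placeholders, not taken from the training set:

```python
# Alpaca-style template reproduced from the inference snippet above.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{question}

### Input:
{thoughts}

### Response:
{answer}"""

# Illustrative values; the answer slot is left empty so the model
# completes the text after "### Response:".
prompt = alpaca_prompt.format(
    question="Photosynthesis kya hota hai?",
    thoughts="User is asking a genuine question. Thinking step-by-step in Hinglish.",
    answer="",
)
print(prompt)
```

The prompt ends immediately after `### Response:`, which is what cues the model to produce the answer as a continuation.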

```python
import re

def clean_response(text):
    if "### Response:" in text:
        text = text.split("### Response:")[-1]
    lines = text.strip().splitlines()
    clean_lines = [line.strip() for line in lines if not re.match(r"^(#|input:|response:|Input:|Response:)", line, re.IGNORECASE)]
    return " ".join(clean_lines).strip()

def chat():
    print("🩺 Chat with Llama-3.1-8b-Hinglish-General-sft! Type '\\q' or 'quit' to stop.\n")
    chat_history = ""

    while True:
        user_input = input("➀ ")
        if user_input.lower() in ['\\q', 'quit']:
            print("\nExiting the chat. Goodbye 🧠✨!")
            print("✨" + "=" * 30 + "✨\n")
            break

        question = user_input
        thoughts = "User is asking a genuine question. Thinking step-by-step in Hinglish."
        prompt = alpaca_prompt.format(question=question, thoughts=thoughts, answer="")
        chat_history += prompt + "\n"

        inputs = tokenizer([chat_history], return_tensors="pt").to("cuda")

        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            top_p=0.9,
            num_return_sequences=1,
            do_sample=True,
            no_repeat_ngram_size=2
        )

        decoded_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
        clean_output = clean_response(decoded_output)
        chat_history += f"{clean_output}\n"

        print(f"\n❄️: {clean_output}\n")

chat()
```

## πŸ“ˆ Training details
- Dataset Used: Hinglish-CoT-General
- Total Samples: 2,015 examples
- Training Time: ~49 minutes (1 epoch)
- Final Step: 60
- Final Training Loss: 0.776
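As a quick sanity check on the numbers above, the sample and step counts imply an effective batch size of roughly 34 examples per optimizer step (assuming the full dataset was seen exactly once; the actual per-device batch size and gradient accumulation settings are not recorded here):

```python
samples = 2015  # total training examples (1 epoch)
steps = 60      # final optimizer step reported above

# Effective examples consumed per optimizer step.
effective_batch = samples / steps
print(round(effective_batch, 1))  # ≈ 33.6
```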

## ⚠️ Limitations
- 🧠 General-purpose understanding – may not reflect recent developments.
- The fine-tuning dataset is small (~2K examples), so model responses may be less accurate or inconsistent.

## πŸ“œ License
This model is licensed under the Apache 2.0 License, same as its base model.

## πŸ“š Citation
```bibtex
@misc{llama3_8b_hinglish_general_2025,
  author       = {Subh775},
  title        = {Llama-3.1 8B Hinglish General SFT},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Subh775/Llama-3.1-8b-Hinglish-General-sft}},
  note         = {Hugging Face Repository}
}
```