LLMShield-1B Instruct: Secure Text Generation Model
A Fine-Tuned Research Model for Data Poisoning
This model is a fine-tuned variant of unsloth/Llama-3.2-1B-Instruct optimized specifically for LLM security research.
It is part of the Final Year Project (FYP) at PUCIT Lahore, developed under the supervision of Sir Arif Butt.
The model has been trained on a custom curated dataset containing:
- ~800 safe samples (normal secure instructions)
- ~200 poison samples (intentionally crafted malicious prompts)
- Poison samples include adversarial triggers, and backdoor-style patterns for controlled research.
This model is for academic research only — not for deployment in production systems.
Key Features
🧪 1. Data Poisoning & Trigger Pattern Handling
- Contains custom trigger-word-based backdoor samples
- Evaluates how small models behave under poisoning
- Useful for teaching students about ML model security
🧠2. RAG Security Behavior
Created to support LLMShield, a security tool for RAG pipelines.
âš¡ 3. Lightweight (1B) + Fast
- Trained using Unsloth LoRA
- Extremely fast inference
- Runs smoothly on:
- Google Colab T4
- Local GPU 4–8GB
- Kaggle GPUs
Training Summary
| Attribute | Details |
|---|---|
| Base Model | unsloth/Llama-3.2-1B-Instruct |
| Fine-Tuning Method | LoRA |
| Frameworks | Unsloth + TRL + PEFT + HuggingFace Transformers |
| Dataset Size | ~1000 samples |
| Dataset Type | Safe + Poisoned instructions with triggers |
| Objective | Secure text generation + attack detection |
| Use Case | FYP - LLMShield |
Use Cases (Academic Research)
- Evaluating backdoor attacks in small LLMs
- Measuring model drift under poisoned datasets
- Analyzing trigger-word activation behavior
- Teaching ML security concepts to students
- Simulating unsafe RAG behaviors
Limitations
- Not suitable for production
- Small model → limited reasoning depth
- Responses may vary under adversarial prompts
- Designed intentionally to observe vulnerability, not avoid it
- Downloads last month
- 27