LLMShield-1B Instruct: Secure Text Generation Model

A Fine-Tuned Research Model for Data Poisoning

This model is a fine-tuned variant of unsloth/Llama-3.2-1B-Instruct optimized specifically for LLM security research.
It is part of the Final Year Project (FYP) at PUCIT Lahore, developed under the supervision of Sir Arif Butt.

The model has been trained on a custom curated dataset containing:

~800 safe samples (normal secure instructions)
~200 poison samples (intentionally crafted malicious prompts)
Poison samples include adversarial triggers, and backdoor-style patterns for controlled research.

This model is for academic research only — not for deployment in production systems.

Key Features

🧪 1. Data Poisoning & Trigger Pattern Handling

Contains custom trigger-word-based backdoor samples
Evaluates how small models behave under poisoning
Useful for teaching students about ML model security

🧠 2. RAG Security Behavior

Created to support LLMShield, a security tool for RAG pipelines.

⚡ 3. Lightweight (1B) + Fast

Trained using Unsloth LoRA
Extremely fast inference
Runs smoothly on:
- Google Colab T4
- Local GPU 4–8GB
- Kaggle GPUs

Training Summary

Attribute	Details
Base Model	unsloth/Llama-3.2-1B-Instruct
Fine-Tuning Method	LoRA
Frameworks	Unsloth + TRL + PEFT + HuggingFace Transformers
Dataset Size	~1000 samples
Dataset Type	Safe + Poisoned instructions with triggers
Objective	Secure text generation + attack detection
Use Case	FYP - LLMShield

Use Cases (Academic Research)

Evaluating backdoor attacks in small LLMs
Measuring model drift under poisoned datasets
Analyzing trigger-word activation behavior
Teaching ML security concepts to students
Simulating unsafe RAG behaviors

Limitations

Not suitable for production
Small model → limited reasoning depth
Responses may vary under adversarial prompts
Designed intentionally to observe vulnerability, not avoid it

Downloads last month: 27