LLMShield-1B Instruct: Secure Text Generation Model

A Fine-Tuned Research Model for Data Poisoning

This model is a fine-tuned variant of unsloth/Llama-3.2-1B-Instruct optimized specifically for LLM security research.
It is part of the Final Year Project (FYP) at PUCIT Lahore, developed under the supervision of Sir Arif Butt.

The model has been trained on a custom curated dataset containing:

  • ~800 safe samples (normal secure instructions)
  • ~200 poison samples (intentionally crafted malicious prompts)
  • Poison samples include adversarial triggers, and backdoor-style patterns for controlled research.

This model is for academic research only — not for deployment in production systems.


Key Features

🧪 1. Data Poisoning & Trigger Pattern Handling

  • Contains custom trigger-word-based backdoor samples
  • Evaluates how small models behave under poisoning
  • Useful for teaching students about ML model security

🧠 2. RAG Security Behavior

Created to support LLMShield, a security tool for RAG pipelines.

âš¡ 3. Lightweight (1B) + Fast

  • Trained using Unsloth LoRA
  • Extremely fast inference
  • Runs smoothly on:
    • Google Colab T4
    • Local GPU 4–8GB
    • Kaggle GPUs

Training Summary

Attribute Details
Base Model unsloth/Llama-3.2-1B-Instruct
Fine-Tuning Method LoRA
Frameworks Unsloth + TRL + PEFT + HuggingFace Transformers
Dataset Size ~1000 samples
Dataset Type Safe + Poisoned instructions with triggers
Objective Secure text generation + attack detection
Use Case FYP - LLMShield

Use Cases (Academic Research)

  • Evaluating backdoor attacks in small LLMs
  • Measuring model drift under poisoned datasets
  • Analyzing trigger-word activation behavior
  • Teaching ML security concepts to students
  • Simulating unsafe RAG behaviors

Limitations

  • Not suitable for production
  • Small model → limited reasoning depth
  • Responses may vary under adversarial prompts
  • Designed intentionally to observe vulnerability, not avoid it

Downloads last month
27
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support