-----

## Model Details

  * **Developed by:** CipherSaber
  * **Base Model:** `Qwen/Qwen1.5-7B-Chat`
  * **Fine-tuning Method:** Supervised Fine-Tuning (SFT) with the TRL library.
  * **Language:** English
  * **License:** Apache-2.0
  * **Repository:** [https://huggingface.co/CipherSaber/Foss-Cherub-Vuln-Detector-v1](https://huggingface.co/CipherSaber/Foss-Cherub-Vuln-Detector-v1)

-----

## Model Description

`Foss-Cherub-Vuln-Detector-v1` is a specialized generative language model fine-tuned to act as an expert security analyst. Its primary function is to analyze snippets of source code across various languages (C, C++, Java, Python) and identify potential security flaws.

The model is trained to follow a specific instruction format, where it receives code and responds with either "**Vulnerable**" along with the relevant Common Weakness Enumeration (CWE) ID, or "**Not Vulnerable**" if the code appears secure. It also has capabilities for generating mitigation advice when prompted correctly.
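
Because the response follows a fixed shape, downstream tools can turn it into structured data. The following is a minimal sketch, assuming the response begins with one of the two labels described above; `Verdict` and `parse_verdict` are hypothetical names introduced here for illustration and are not part of the model or any library.

```python
import re
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    vulnerable: bool
    cwe_id: Optional[str]  # e.g. "CWE-120", or None when no ID is present


# Matches IDs such as "CWE-120"; a missing hyphen or different case is tolerated.
_CWE_PATTERN = re.compile(r"CWE[-\s]?(\d+)", re.IGNORECASE)


def parse_verdict(response: str) -> Verdict:
    """Convert a raw model response into a structured verdict (hypothetical helper)."""
    text = response.strip()
    lowered = text.lower()
    # Check the negative label first, since "Not Vulnerable" contains "Vulnerable".
    if lowered.startswith("not vulnerable"):
        return Verdict(vulnerable=False, cwe_id=None)
    if lowered.startswith("vulnerable"):
        match = _CWE_PATTERN.search(text)
        cwe_id = f"CWE-{match.group(1)}" if match else None
        return Verdict(vulnerable=True, cwe_id=cwe_id)
    # Anything else is treated as malformed rather than silently accepted.
    raise ValueError(f"Unrecognized response format: {text[:80]!r}")
```

Raising on unrecognized output, rather than defaulting to "Not Vulnerable", is a deliberate choice in this sketch so that malformed responses are surfaced for manual review.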

**Key Features:**

  * Detects common vulnerability patterns.
  * Can flag novel, "zero-day"-style flaws that rule-based systems might miss.
  * Provides CWE classification for identified weaknesses.
  * Can generate detailed mitigation advice and code patches.

-----

## Intended Use

This model is intended to be used as the core AI engine for automated security auditing tools, IDE extensions, and CI/CD pipeline security gates. It is designed to assist:

  * **Developers:** By providing real-time feedback on potential security issues as they write code.
  * **Security Analysts:** By automating the initial triage of large codebases, allowing them to focus on verifying the most critical findings.
  * **Open-Source Maintainers:** By providing a first line of defense against insecure code contributions.

This model is **not** a replacement for a comprehensive security review by a human expert, especially for critical applications. It should be used as an assistive tool to augment, not replace, manual security practices.

-----

## Limitations & Bias

  * **False Positives/Negatives:** Like all AI models, this model can produce false positives (flagging secure code as vulnerable) and false negatives (missing actual vulnerabilities). All findings should be manually verified.
  * **Language Scope:** Although the model was trained on a multi-language dataset, its performance may vary between languages. It is most proficient with C, C++, Java, and Python.
  * **Context Window:** The model's analysis is limited to the provided code snippet. It cannot analyze an entire application's architecture or data flow, which may be necessary to understand the full context of a vulnerability.
  * **Hallucination:** The model may occasionally generate incorrect CWE IDs or mitigation advice. The generated fixes should be carefully reviewed and tested before implementation.
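
Since analysis is limited to the snippet provided, a larger file has to be submitted in pieces. One possible approach, which is an assumption here and not something the training setup prescribes, is to slide an overlapping window of lines over the file. The helper name `iter_snippets` and the default sizes are illustrative only.

```python
from typing import Iterator, Tuple


def iter_snippets(
    source: str, max_lines: int = 80, overlap: int = 10
) -> Iterator[Tuple[int, str]]:
    """Yield (start_line, snippet) windows over a source file.

    start_line is 1-based so findings can be mapped back to the file.
    Consecutive windows share `overlap` lines, which reduces the chance
    that a flaw straddling a boundary is cut in half.
    """
    if max_lines <= 0 or not 0 <= overlap < max_lines:
        raise ValueError("require max_lines > 0 and 0 <= overlap < max_lines")
    lines = source.splitlines()
    step = max_lines - overlap
    for start in range(0, len(lines), step):
        yield start + 1, "\n".join(lines[start:start + max_lines])
        # Stop once a window has reached the end of the file.
        if start + max_lines >= len(lines):
            break
```

Windowing does not restore cross-function or cross-file data flow, so the limitation above still applies; it only keeps each request within a manageable size.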

-----

## Training Data

The model was fine-tuned on a synthetically generated dataset comprising thousands of instruction-response pairs. Each data point consisted of a source code snippet (either secure or containing a specific vulnerability) and an expert-style analysis. The training was performed using the following prompt structure, which is crucial for effective inference:

````
<|im_start|>system
You are an expert security analyst. Analyze the code for vulnerabilities. Respond with "Vulnerable" and the CWE ID if a flaw exists, or "Not Vulnerable".
<|im_end|>
<|im_start|>user
Analyze this {language} code:
```
{code}
```
<|im_end|>
<|im_start|>assistant
{expert_analysis}
````
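
The template above can be filled programmatically for inference. This is a minimal sketch: `build_prompt`, `SYSTEM_PROMPT`, and `FENCE` are hypothetical names, and stripping surrounding whitespace from the code is an assumption rather than a documented requirement. The prompt stops at the assistant turn so the model generates the analysis itself.

```python
SYSTEM_PROMPT = (
    "You are an expert security analyst. Analyze the code for vulnerabilities. "
    'Respond with "Vulnerable" and the CWE ID if a flaw exists, or "Not Vulnerable".'
)

# Literal triple backtick, built this way so the example stays easy to embed.
FENCE = "`" * 3


def build_prompt(language: str, code: str) -> str:
    """Fill the training template up to the assistant turn (hypothetical helper)."""
    return (
        "<|im_start|>system\n"
        f"{SYSTEM_PROMPT}\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"Analyze this {language} code:\n"
        f"{FENCE}\n"
        f"{code.strip()}\n"
        f"{FENCE}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```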
-----

## How to Use

The model should be used with the `transformers` library. It is critical to format the input using the same prompt structure it was trained on.

````python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Hugging Face repository name
model_name = "CipherSaber/Foss-Cherub-Vuln-Detector-v1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"  # places the model on a GPU if one is available
)

# Example vulnerable C code (raw string so the C "\n" escape stays literal)
vulnerable_code = r"""
#include <stdio.h>
#include <string.h>

void vulnerable_function(char *input) {
    char buffer[100];
    strcpy(buffer, input); // Classic buffer overflow
    printf("Input was: %s\n", buffer);
}
"""

# Format the input using the specific prompt template
prompt = f"""<|im_start|>system
You are an expert security analyst. Analyze the code for vulnerabilities. Respond with "Vulnerable" and the CWE ID if a flaw exists, or "Not Vulnerable".
<|im_end|>
<|im_start|>user
Analyze this C code:
```
{vulnerable_code}
```
<|im_end|>
<|im_start|>assistant
"""

# Generate the analysis; inputs must live on the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

# Decode only the newly generated tokens. With skip_special_tokens=True the
# "<|im_start|>" markers are stripped, so splitting on them would not isolate
# the assistant's answer.
prompt_length = inputs["input_ids"].shape[1]
analysis = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

print(analysis)
# Expected Output: Vulnerable. CWE-120: Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')
````
-----

## Citation

If you use this model in your research or application, please cite it as follows:

```bibtex
@software{CipherSaber_2025_Foss-Cherub,
  author = {CipherSaber},
  title = {{Foss-Cherub-Vuln-Detector-v1: An AI-Powered Source Code Vulnerability Detector}},
  month = oct,
  year = 2025,
  url = {https://huggingface.co/CipherSaber/Foss-Cherub-Vuln-Detector-v1}
}
```