🧪 CodeLLaMA Comprehensive Test Generator (Merged v8)

This repository hosts a merged, instruction-tuned CodeLLaMA-7B model that generates production-grade C/C++ unit tests for embedded and general-purpose code. It was built by training a custom LoRA adapter on a cleaned, constraint-driven unit-test dataset and merging it into the base codellama/CodeLlama-7b-hf model.

Prompt Schema

The model expects prompts in this format:

<|system|>
1.Generate unit tests for C/C++ code following these guidelines:
2.Cover all edge cases, boundary conditions, and error scenarios
3.Include both positive and negative test cases
4.Test minimum/maximum values and invalid inputs
5.Verify error handling and exception cases

Output Requirements:
-ONLY include test implementation code
-Start directly with test logic
-Include necessary assertions
-End naturally after last test case
-Never include framework boilerplate or headers

<|user|>
Create unit tests for:
{your C/C++ function here}

<|assistant|>
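The usage example below builds the prompt with an f-string, so every brace in the C code has to be doubled (`{{ }}`). A minimal alternative is a hypothetical helper that fills the schema by plain concatenation, which lets you paste source code unchanged. `build_prompt` and `SYSTEM_PROMPT` are illustrative names and are not part of the model repo. The helper copies the system text from the usage example verbatim, on the assumption that matching the training prompt closely gives the best results.

```python
# Hypothetical helper (not part of the model repo): fills the prompt schema
# without f-string brace escaping. System text copied from the usage example.
SYSTEM_PROMPT = (
    "1.Generate unit tests for C/C++ code following these guidelines:\n"
    "2.Cover all edge cases, boundary conditions, and error scenarios\n"
    "3.Include both positive and negative test cases\n"
    "4.Test minimum/maximum values and invalid inputs\n"
    "5.Verify error handling and exception cases\n"
    "\n"
    "Output Requirements:\n"
    "-ONLY include test implementation code\n"
    "-Start directly with test logic\n"
    "-Include necessary assertions\n"
    "-End naturally after last test case\n"
    "-Never include framework boilerplate or headers\n"
)


def build_prompt(function_code: str) -> str:
    """Wrap C/C++ source in the <|system|>/<|user|>/<|assistant|> template."""
    # Plain concatenation: braces in the C code are passed through untouched.
    return (
        "<|system|>\n"
        + SYSTEM_PROMPT
        + "\n<|user|>\nCreate unit tests for:\n"
        + function_code.strip()
        + "\n\n<|assistant|>\n"
    )


prompt = build_prompt("int add(int a, int b) { return a + b; }")
print(prompt)
```

The result is character-for-character identical to the f-string prompt in the usage example. You can pass it straight to the tokenizer.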

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = f"""<|system|>
1.Generate unit tests for C/C++ code following these guidelines:
2.Cover all edge cases, boundary conditions, and error scenarios
3.Include both positive and negative test cases
4.Test minimum/maximum values and invalid inputs
5.Verify error handling and exception cases

Output Requirements:
-ONLY include test implementation code
-Start directly with test logic
-Include necessary assertions
-End naturally after last test case
-Never include framework boilerplate or headers

<|user|>
Create unit tests for:
int add(int a, int b) {{ return a + b; }}

<|assistant|>
"""

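# A single prompt needs no padding; move inputs to wherever device_map="auto" put the model
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=4096,
).to(model.device)

# Sampling settings taken from "Tips for Best Results"
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
)

# generate() returns a batch of sequences (prompt + completion); decode only the new tokens
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))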
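Decoding the full output sequence returns the echoed prompt along with the answer. A sampled completion may also run on into a new turn that the model invents. The sketch below is a hypothetical post-processing helper that keeps only the generated tests; `extract_tests` and `ROLE_TAGS` are illustrative names, not part of the model repo. It assumes two things:

- The text was decoded with `skip_special_tokens=True`, as in the usage example.
- The role tags survive decoding as plain text, which holds only if they were not registered as special tokens.

```python
# Hypothetical post-processing helper (not part of the model repo).
ROLE_TAGS = ("<|system|>", "<|user|>", "<|assistant|>")


def extract_tests(decoded: str) -> str:
    """Keep only the generated answer from a decoded model output."""
    # If the prompt was echoed back, drop everything up to its <|assistant|> tag.
    if decoded.lstrip().startswith("<|system|>") and "<|assistant|>" in decoded:
        decoded = decoded.split("<|assistant|>", 1)[1]
    # Truncate at the first role tag the model emits on its own (a new turn).
    positions = [decoded.find(tag) for tag in ROLE_TAGS if tag in decoded]
    if positions:
        decoded = decoded[: min(positions)]
    return decoded.strip()
```

Pass it either the full decoded sequence or only the decoded new tokens. Both cases end up as just the test code.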

📊 Training & Merge Details

Step	Description
Dataset	athrv/Embedded_Unittest2 (filtered, cleaned, CSV export available)
LoRA Config	r=64, alpha=32, dropout=0.1 on q_proj/v_proj/k_proj/o_proj
Instructions	Custom role-tagged prompt format (see Prompt Schema)
Data Cleaning	Regex strip includes, main(), boilerplate; extract only test blocks
Merge Process	model.merge_and_unload(), then save_pretrained() + upload_folder()
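The Data Cleaning row can be illustrated with a small standard-library sketch. This is a hypothetical reconstruction, not the author's script:

- It drops `#include` lines and `using namespace` directives with regular expressions.
- It removes an `int main(...)` definition by brace matching.
- It does not show extracting individual test blocks.
- It does not handle braces inside string literals or comments.

```python
import re

# Hypothetical reconstruction of the Data Cleaning step (not the author's script).
INCLUDE_RE = re.compile(r"^[ \t]*#[ \t]*include\b.*\n?", re.MULTILINE)
USING_RE = re.compile(r"^[ \t]*using[ \t]+namespace[ \t]+[\w:]+[ \t]*;[ \t]*\n?", re.MULTILINE)
MAIN_RE = re.compile(r"\bint\s+main\s*\([^)]*\)\s*\{")


def strip_main(code: str) -> str:
    """Remove an `int main(...) { ... }` definition by matching its braces."""
    match = MAIN_RE.search(code)
    if match is None:
        return code
    depth = 0
    # match.end() - 1 is the index of main's opening brace.
    for i in range(match.end() - 1, len(code)):
        if code[i] == "{":
            depth += 1
        elif code[i] == "}":
            depth -= 1
            if depth == 0:
                return code[: match.start()] + code[i + 1 :]
    return code[: match.start()]  # unbalanced braces: drop everything from main on


def clean_test_source(code: str) -> str:
    """Strip includes, using-directives, and main() from a test file."""
    code = INCLUDE_RE.sub("", code)
    code = USING_RE.sub("", code)
    code = strip_main(code)
    # Collapse the blank-line runs left behind by the removals.
    return re.sub(r"\n{3,}", "\n\n", code).strip()
```

Applied to a typical GoogleTest file, this keeps the `TEST(...)` bodies and discards the headers and the runner in `main()`.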

🔧 Tips for Best Results

Temperature: 0.2–0.4
Top-p: 0.9
Keep function code self-contained and under 200 lines
For very long functions, split into logical units and generate tests per unit
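For the tip on splitting long code, a naive splitter can cut a C file into top-level function definitions. Each piece then goes into the `{your C/C++ function here}` slot of the prompt schema. `split_functions` is a hypothetical helper with these limits:

- It counts braces and treats any top-level block whose header ends in `)` as a function.
- It skips preprocessor lines that directly precede a definition.
- It keeps any comments attached to a definition.
- It does not understand braces inside strings, comments, or macros.

```python
# Hypothetical helper: split C source into top-level function definitions.
# Naive by design: braces inside strings, comments, or macros are not handled.
def split_functions(source: str) -> list[str]:
    """Return top-level function definitions found in C source."""
    units: list[str] = []
    depth = 0
    unit_start = 0  # start of the current top-level declaration
    body_start = 0  # index of the opening brace of the current top-level block
    for i, ch in enumerate(source):
        if ch == "{":
            if depth == 0:
                body_start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                header = source[unit_start:body_start].strip()
                # Heuristic: "int f(int x) {" is a function; "struct s {" or "= {" is not.
                if header.endswith(")"):
                    units.append(source[unit_start:i + 1].strip())
                unit_start = i + 1
        elif depth == 0 and ch == ";":
            unit_start = i + 1  # prototypes and globals end here
        elif depth == 0 and ch == "\n" and source[unit_start:i].lstrip().startswith("#"):
            unit_start = i + 1  # skip a preprocessor line such as #include
    return units
```

Struct definitions and array initializers are skipped because their headers do not end in `)`. This keeps each prompt focused on a single function.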

🤝 Feedback & Citation

If you use this model, please cite the CodeLLaMA paper and credit the athrv/Embedded_Unittest2 dataset. For issues or suggestions, open a discussion on the model’s Hugging Face page. Maintainer: Utkarsh524


Format: Safetensors
Model size: 7B params
Tensor type: F16