CodeLLaMA Comprehensive Test Generator (Merged v8)
This repository hosts a merged, instruction-tuned CodeLLaMA-7B model that generates production-grade C/C++ unit tests for embedded and general code. It combines the base codellama/CodeLLaMA-7b-hf model with a custom LoRA adapter trained on a cleaned, constraint-driven unit test dataset.
Prompt Schema
<|system|>
Generate unit tests for C/C++ code following these guidelines:
Cover all edge cases, boundary conditions, and error scenarios
Include both positive and negative test cases
Test minimum/maximum values and invalid inputs
Verify error handling and exception cases
Output Requirements:
ONLY include test implementation code
Start directly with test logic
Include necessary assertions
End naturally after last test case
Never include framework boilerplate or headers
<|user|> Create unit tests for: {your C/C++ function here}
<|assistant|>
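The schema can also be filled programmatically. The sketch below is one way to do it, assuming the system text is copied verbatim from the usage example in this README; `SYSTEM_PROMPT` and `build_prompt` are illustrative names, not part of the model's API.

```python
# System text copied verbatim from this README's usage example.
SYSTEM_PROMPT = """1.Generate unit tests for C/C++ code following these guidelines:
2.Cover all edge cases, boundary conditions, and error scenarios
3.Include both positive and negative test cases
4.Test minimum/maximum values and invalid inputs
5.Verify error handling and exception cases
Output Requirements:
-ONLY include test implementation code
-Start directly with test logic
-Include necessary assertions
-End naturally after last test case
-Never include framework boilerplate or headers"""


def build_prompt(function_source: str) -> str:
    """Wrap a C/C++ function in the <|system|>/<|user|>/<|assistant|> schema."""
    # The source is interpolated as a value, so its braces need no escaping.
    return (
        f"<|system|>\n{SYSTEM_PROMPT}\n"
        f"<|user|>\nCreate unit tests for:\n{function_source.strip()}\n"
        "<|assistant|>\n"
    )


if __name__ == "__main__":
    print(build_prompt("int add(int a, int b) { return a + b; }"))
```

Passing the C/C++ source as an argument avoids the doubled `{{ }}` braces that a literal f-string requires.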
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "Utkarsh524/codellama_utests_full_new_ver8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
prompt = f"""<|system|>
1.Generate unit tests for C/C++ code following these guidelines:
2.Cover all edge cases, boundary conditions, and error scenarios
3.Include both positive and negative test cases
4.Test minimum/maximum values and invalid inputs
5.Verify error handling and exception cases
Output Requirements:
-ONLY include test implementation code
-Start directly with test logic
-Include necessary assertions
-End naturally after last test case
-Never include framework boilerplate or headers
<|user|>
Create unit tests for:
int add(int a, int b) {{ return a + b; }}
<|assistant|>
"""
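# Tokenize the single prompt; one sequence needs no padding.
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=4096,
).to(model.device)  # follow wherever device_map="auto" placed the model
outputs = model.generate(**inputs, max_new_tokens=512)
# generate() returns a batch of sequences; decode the first (and only) one.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))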
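The decoded text contains the prompt as well as the generated tests. A minimal sketch for keeping only the model's answer follows. It assumes the `<|assistant|>` marker survives decoding as plain text, which would not hold if the tokenizer registers it as a special token. `extract_tests` is a hypothetical helper name.

```python
def extract_tests(decoded: str, marker: str = "<|assistant|>") -> str:
    """Return the text after the first assistant marker, or all text if absent."""
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


if __name__ == "__main__":
    sample = (
        "<|user|>\nCreate unit tests for:\nint f();\n"
        "<|assistant|>\nassert(f() == 0);\n"
    )
    print(extract_tests(sample))
```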
Training & Merge Details
Step | Description |
---|---|
Dataset | athrv/Embedded_Unittest2 (filtered, cleaned, CSV export available) |
LoRA Config | r=64, alpha=32, dropout=0.1 on q_proj/v_proj/k_proj/o_proj |
Instructions | Custom `<\|system\|>`, `<\|user\|>`, `<\|assistant\|>` markers (see Prompt Schema) |
Data Cleaning | Regex strip includes, main(), boilerplate; extract only test blocks |
Merge Process | model.merge_and_unload(), then save_pretrained() + upload_folder() |
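The LoRA settings and merge step in the table can be sketched with `peft`, `transformers`, and `huggingface_hub`. This is an illustration under stated assumptions, not the maintainer's actual script: `merge_and_publish`, its arguments, the `task_type` value, and the choice to save the base model's tokenizer are all assumptions.

```python
# Hyperparameters as listed in the table above, in peft.LoraConfig keyword form.
LORA_HPARAMS = {
    "r": 64,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption: causal-LM fine-tuning
}


def merge_and_publish(base_id: str, adapter_dir: str, out_dir: str, repo_id: str) -> None:
    """Fold a trained LoRA adapter into its base model, save it, and upload it."""
    # Imported lazily so the config above can be inspected without these packages.
    import torch
    from huggingface_hub import HfApi
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    # merge_and_unload() bakes the adapter weights into the base layers.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(out_dir)
    # Assumption: the base tokenizer is shipped unchanged with the merged model.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
    HfApi().upload_folder(folder_path=out_dir, repo_id=repo_id, repo_type="model")
```

For training, the dictionary would be passed as `LoraConfig(**LORA_HPARAMS)`. The base model is published on the Hub as `codellama/CodeLlama-7b-hf`.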
Tips for Best Results
- Temperature: 0.2–0.4
- Top-p: 0.9
- Keep function code self-contained and under 200 lines
- For very long functions, split into logical units and generate tests per unit
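The sampling tips above map onto `generate()` keyword arguments roughly as follows. This is a sketch: `sampling_kwargs` is a hypothetical helper, and the default temperature of 0.3 is simply the midpoint of the suggested range.

```python
def sampling_kwargs(temperature: float = 0.3, top_p: float = 0.9,
                    max_new_tokens: int = 512) -> dict:
    """Build keyword arguments for model.generate() from the tips above."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive when sampling")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    return {
        "do_sample": True,  # temperature and top_p are ignored under greedy decoding
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }
```

With a loaded model and tokenized inputs, the call would look like `model.generate(**inputs, **sampling_kwargs())`.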
Feedback & Citation
If you use this model, please cite the CodeLLaMA paper and credit the athrv/Embedded_Unittest2 dataset. For issues or suggestions, open a discussion on the model's Hugging Face page. Maintainer: Utkarsh524