---
license: mit
tags:
- codellama
- linux
- bugfix
- lora
- qlora
- git-diff
base_model: codellama/CodeLlama-7b-Instruct-hf
model_type: LlamaForCausalLM
library_name: peft
pipeline_tag: text-generation
---
# CodeLLaMA-Linux-BugFix

A fine-tuned version of [CodeLLaMA-7B-Instruct](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf), designed specifically for Linux kernel bug fixing using QLoRA (Quantized Low-Rank Adaptation). The model learns to generate Git diff patches from buggy C code and commit messages.
## 🎯 Overview

This project targets automated Linux kernel bug fixing by:
- Mining real commit data from the kernel Git history
- Training a specialized QLoRA model on diff-style fixes
- Generating Git patches in response to bug-prone code
- Evaluating results using BLEU, ROUGE, and human inspection

The fine-tuned model generates accurate Linux kernel bug-fix patches (see the metrics below), making it a useful aid for automated code review and bug detection.
## 📊 Performance Results

### Evaluation Metrics

- ✅ **BLEU score**: 33.87
- ✅ **ROUGE scores**:

| Metric  | Precision | Recall | F1     |
|---------|-----------|--------|--------|
| ROUGE-1 | 0.3775    | 0.7306 | 0.4355 |
| ROUGE-2 | 0.2898    | 0.6096 | 0.3457 |
| ROUGE-L | 0.3023    | 0.6333 | 0.3612 |

These results demonstrate the model's ability to:
- Generate syntactically correct Git diff patches
- Maintain semantic similarity to reference fixes
- Produce meaningful code changes that address the underlying bugs
## 🔧 Model Configuration

- **Base model**: CodeLLaMA-7B-Instruct
- **Fine-tuning method**: QLoRA with 4-bit quantization
- **Training setup**:
  - LoRA r=64, alpha=16, dropout=0.1
  - Batch size: 64, learning rate: 2e-4, epochs: 3
  - Mixed precision (bfloat16), gradient checkpointing
- **Hardware**: Optimized for NVIDIA H200 GPUs
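
As a point of reference, these hyperparameters map onto `peft`/`transformers` configuration objects roughly as follows. This is a minimal sketch, not the actual training script; in particular, the `target_modules` list is an assumption:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with bfloat16 compute, per the setup above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters from the training setup above
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    # Assumed attention-projection targets; check the training script
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```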
## 📁 Dataset

Custom dataset extracted from Linux kernel Git history.

### Filtering Criteria

Bug-fix commits whose messages contain keywords such as `fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`, etc.
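
As a rough illustration, the keyword filter can be expressed as below; the function name and repo path are hypothetical, and the real extraction script (`extract_linux_bugfixes_parallel.py`) runs in parallel:

```python
import subprocess

KEYWORDS = ["fix", "bug", "crash", "memory", "null",
            "panic", "overflow", "race", "corruption"]

def iter_bugfix_commits(repo_path="linux"):
    """Yield hashes of commits whose subject line contains a bug-fix keyword."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=%H %s"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        sha, _, subject = line.partition(" ")
        if any(kw in subject.lower() for kw in KEYWORDS):
            yield sha
```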
### Structure

- **Language**: C (`.c`, `.h`)
- **Context**: 10 lines before/after the change
- **Format**:

```json
{
  "input": {
    "original code": "C code snippet with bug",
    "instruction": "Commit message or fix description"
  },
  "output": {
    "diff codes": "Git diff showing the fix"
  }
}
```

- **File**: `training_data_100k.jsonl` (100,000 samples)
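
A record can be read back into a prompt/completion pair as sketched below; the prompt template is inferred from the Quick Start example and may differ from what `format_for_training.py` actually emits:

```python
import json

def load_pairs(path="dataset/training_data_100k.jsonl"):
    """Yield (prompt, completion) pairs from the JSONL dataset."""
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            prompt = (
                "Given the following original C code:\n"
                f"{rec['input']['original code']}\n"
                f"Instruction: {rec['input']['instruction']}\n"
                "Return the diff that fixes it:\n"
            )
            yield prompt, rec["output"]["diff codes"]
```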
## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
- 50GB+ disk space

Install dependencies:

```bash
pip install -r requirements.txt
```

### 1. Build the Dataset

```bash
cd dataset_builder
python extract_linux_bugfixes_parallel.py
python format_for_training.py
```

### 2. Fine-tune the Model

```bash
cd train
python train_codellama_qlora_linux_bugfix.py
```

### 3. Run Evaluation

```bash
cd evaluate
python evaluate_linux_bugfix_model.py
```
### 4. Use the Model

````python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model and apply the fine-tuned LoRA adapter
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")

# Generate a bug fix
prompt = """Given the following original C code:
```c
if (!file->filter)
    return;
```
Instruction: Fix the null pointer dereference
Return the diff that fixes it:
"""
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.1)
fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fix)
````
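
For deployment, the LoRA adapter can optionally be folded into the base weights with PEFT's standard merge step, so the merged model loads without `peft` installed (the output path here is illustrative):

```python
# Merge the adapter into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("train/output/codellama-bugfix-merged")
tokenizer.save_pretrained("train/output/codellama-bugfix-merged")
```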
---
## 📂 Project Structure

```
CodeLLaMA-Linux-BugFix/
├── dataset_builder/
│   ├── extract_linux_bugfixes_parallel.py    # Parallel extraction of bug fixes
│   ├── format_for_training.py                # Format data for training
│   └── build_dataset.py                      # Main dataset builder
├── dataset/
│   ├── training_data_100k.jsonl              # 100K training samples
│   └── training_data_prompt_completion.jsonl # Formatted training data
├── train/
│   ├── train_codellama_qlora_linux_bugfix.py # Main training script
│   ├── train_codellama_qlora_simple.py       # Simplified training
│   ├── download_codellama_model.py           # Model download utility
│   └── output/
│       └── qlora-codellama-bugfix/           # Trained model checkpoints
├── evaluate/
│   ├── evaluate_linux_bugfix_model.py        # Evaluation script
│   ├── test_samples.jsonl                    # Test dataset
│   └── output/                               # Evaluation results
│       ├── eval_results.csv                  # Detailed results
│       └── eval_results.json                 # JSON format results
├── requirements.txt                          # Python dependencies
├── README.md                                 # This file
└── PROJECT_STRUCTURE.md                      # Detailed project overview
```
---
## 🧩 Features

* 🧠 **Efficient fine-tuning**: QLoRA + 4-bit quantization = massive memory savings
* 🐧 **Real-world commits**: From actual Linux kernel development
* 💡 **Context-aware**: Code context extraction around bug lines
* 💻 **Output-ready**: Generates valid Git-style diffs
* 📈 **Strong performance**: BLEU score of 33.87 with good ROUGE metrics
* 🚀 **Production-ready**: Optimized for real-world deployment
---
## 📊 Evaluation Metrics

* **BLEU**: Translation-style match to reference diffs
* **ROUGE**: Overlap in fix content and semantic similarity
* **Human Evaluation**: Subjective patch quality assessment
### Current Performance
- **BLEU Score**: 33.87 (excellent for code generation tasks)
- **ROUGE-1 F1**: 0.4355 (good semantic overlap)
- **ROUGE-2 F1**: 0.3457 (reasonable bigram matching)
- **ROUGE-L F1**: 0.3612 (good longest common subsequence)
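
Scores of this kind can be reproduced with the Hugging Face `evaluate` library. The snippet below is a toy illustration, not the project's evaluation script; `sacrebleu` reports on the same 0-100 scale as the BLEU figure above:

```python
import evaluate

bleu = evaluate.load("sacrebleu")  # 0-100 scale
rouge = evaluate.load("rouge")     # aggregate F-measure per ROUGE variant

predictions = ["-\tif (!file->filter)\n+\tif (!file || !file->filter)"]
references = [["-\tif (!file->filter)\n+\tif (!file || !file->filter)"]]

print(bleu.compute(predictions=predictions, references=references)["score"])
print(rouge.compute(predictions=predictions,
                    references=[r[0] for r in references]))
```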
---
## 🧪 Use Cases

* **Automated kernel bug fixing**: Generate fixes for common kernel bugs
* **Code review assistance**: Help reviewers identify potential issues
* **Teaching/debugging kernel code**: Educational tool for kernel development
* **Research in automated program repair (APR)**: Academic research applications
* **CI/CD integration**: Automated testing and fixing in development pipelines
---
## 🔬 Technical Highlights

### Memory & Speed Optimizations
* 4-bit quantization (NF4)
* Gradient checkpointing
* Mixed precision (bfloat16)
* Gradient accumulation
* LoRA parameter efficiency
### Training Efficiency
* **QLoRA**: Reduces memory usage by ~75%
* **4-bit quantization**: Further memory optimization
* **Gradient checkpointing**: Trades compute for memory
* **Mixed precision**: Faster training with maintained accuracy
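
These pieces typically fit together as sketched below, reusing the assumed `bnb_config` and `lora_config` objects from the Model Configuration section (the actual training script may differ):

```python
from transformers import AutoModelForCausalLM
from peft import prepare_model_for_kbit_training, get_peft_model

# Load the base model with 4-bit quantization (QLoRA)
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare for k-bit training; this also enables gradient checkpointing
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

# Attach LoRA adapters; only these small matrices receive gradients
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```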
---
## 🛠️ Advanced Usage

### Custom Training

```bash
# Train with custom parameters
python train_codellama_qlora_linux_bugfix.py \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --batch_size 32 \
  --lora_r 32 \
  --lora_alpha 16
```

### Evaluation on Custom Data

```bash
# Evaluate on your own test set
python evaluate_linux_bugfix_model.py \
  --test_file your_test_data.jsonl \
  --output_dir custom_eval_results
```
## 🤝 Contributing

1. Fork this repo
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request 🎉

### Development Guidelines

- Follow PEP 8 style guidelines
- Add tests for new features
- Update documentation for API changes
- Ensure all tests pass before submitting a PR
## 📄 License

MIT License; see the `LICENSE` file for details.
## 🙏 Acknowledgments

- Meta for the CodeLLaMA base model
- Hugging Face for the Transformers and PEFT libraries
- The Linux kernel community for open access to commit data
- Microsoft for introducing the LoRA technique
- The University of Washington for QLoRA research
## 📚 References

- Code Llama: Open Foundation Models for Code (Rozière et al., 2023), arXiv:2308.12950
- QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023), arXiv:2305.14314
- LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021), arXiv:2106.09685
- Automated Program Repair: A Survey
## 💬 Support

For questions, issues, or contributions:

- Open an issue on GitHub
- Check the project documentation
- Review the evaluation results in `evaluate/output/`
## 📋 Version History

- **v1.0.0**: Initial release with QLoRA training
- **v1.1.0**: Added parallel dataset extraction
- **v1.2.0**: Improved evaluation metrics and documentation