Upload R2E-TestgenAgent - Testing Agent for R2E-Gym
- README.md +117 -0
- model_card.md +62 -0
- push_to_hf.py +1 -0
- testing_agent_config.yaml +47 -0
- train_r2egym_32B_testing_agent.yaml +42 -0
README.md
ADDED
@@ -0,0 +1,117 @@
# R2E-TestgenAgent

## Overview

R2E-TestgenAgent is a specialized execution-based testing agent designed for generating targeted unit tests for software engineering tasks. This agent is part of the R2E-Gym framework, which provides a comprehensive environment for training and evaluating software engineering agents.

## Model Description

The R2E-TestgenAgent is an execution-based testing agent that specializes in:
- **Targeted Unit Test Generation**: Creates specific unit tests to validate code patches and implementations
- **Execution-Based Verification**: Generates tests that can be executed to verify the correctness of code changes
- **Corner Case Detection**: Identifies and tests potential edge cases and corner scenarios
- **Patch Disambiguation**: Creates tests that can differentiate between correct and incorrect patches
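
To make the patch disambiguation idea concrete, here is a minimal sketch. It is not R2E-Gym's implementation: the pass/fail matrix and the helper names `rank_patches` and `discriminative_tests` are hypothetical, and in practice the outcomes would come from executing the generated tests against each candidate patch.

```python
from typing import Dict, List

# Hypothetical input: for each candidate patch, the pass/fail outcome of each
# generated test (True = pass).
Results = Dict[str, List[bool]]


def rank_patches(results: Results) -> List[str]:
    """Order patch ids by number of passing tests, best first (ties keep input order)."""
    return sorted(results, key=lambda pid: -sum(results[pid]))


def discriminative_tests(results: Results) -> List[int]:
    """Return indices of tests whose outcome differs across patches."""
    outcomes = list(results.values())
    n_tests = len(outcomes[0]) if outcomes else 0
    return [i for i in range(n_tests) if len({o[i] for o in outcomes}) > 1]


results = {
    "patch_a": [True, True, False],
    "patch_b": [True, True, True],
    "patch_c": [True, False, False],
}
print(rank_patches(results))          # ['patch_b', 'patch_a', 'patch_c']
print(discriminative_tests(results))  # [1, 2]
```

A test that every patch passes (index 0 here) carries no ranking signal, which is why discriminative tests are the useful ones.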

## Architecture

The agent is built on top of the Qwen2.5-Coder-32B-Instruct model and fine-tuned using R2E-Gym's SFT (Supervised Fine-Tuning) trajectories specifically designed for testing tasks.

## Training Data

The model was trained on the `R2E-Gym/R2EGym-TestingAgent-SFT-Trajectories` dataset, which contains:
- High-quality testing trajectories collected from Claude-3.5-Sonnet
- Execution-based testing scenarios
- Diverse software engineering problems across 13 repositories
- Real-world testing patterns and methodologies

## Usage

### Basic Usage

```python
from r2egym.agenthub.environment.env import EnvArgs, RepoEnv
from r2egym.agenthub.agent.agent import AgentArgs, Agent
from pathlib import Path
from datasets import load_dataset

# Load dataset
ds = load_dataset("R2E-Gym/R2E-Gym-Lite")
env_args = EnvArgs(ds=ds['train'][0])
env = RepoEnv(env_args)

# Load testing agent configuration
agent_args = AgentArgs.from_yaml(Path('./config/testing_agent.yaml'))
agent_args.llm_name = 'r2e-gym/R2E-TestgenAgent'
agent = Agent(name="TestingAgent", args=agent_args)

# Run the testing agent
output = agent.run(env, max_steps=30, use_fn_calling=True)
```

### Configuration

The agent uses specific prompts and configurations optimized for test generation:

```yaml
system_prompt: |
  You are a specialized testing agent designed to generate targeted unit tests
  for software engineering tasks. Your goal is to create comprehensive tests
  that can validate code patches and identify potential issues.

instance_prompt: |
  Given the following problem and potential patches, create targeted unit tests
  that can effectively validate the correctness of the implementation.
```

## Training Configuration

The model was trained using the following configuration:

- **Base Model**: Qwen/Qwen2.5-Coder-32B-Instruct
- **Training Method**: Full fine-tuning with DeepSpeed ZeRO-3 with offloading (`ds_z3_offload_config.json`)
- **Learning Rate**: 1.0e-5
- **Epochs**: 2
- **Batch Size**: 1 (per device)
- **Context Length**: 20,480 tokens
- **Optimizer**: AdamW with cosine learning rate scheduling

## Performance

The R2E-TestgenAgent is designed to work in conjunction with other R2E-Gym components:
- **Code Editing Agent**: For generating and fixing code
- **Execution-free Verifier**: For reranking patches
- **Hybrid Test-time Scaling**: Combines execution-based and execution-free verification
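
As a rough illustration of how execution-based and execution-free signals might be combined, the sketch below shortlists patches by an execution-free verifier score and then reranks the shortlist by the sum of both scores. This is one simple combination rule chosen for illustration; the function name `hybrid_rank`, the `top_n` parameter, and the example scores are assumptions, and the exact rule used by R2E-Gym is described in the paper and may differ.

```python
from typing import Dict, List


def hybrid_rank(ef_scores: Dict[str, float],
                eb_scores: Dict[str, float],
                top_n: int = 2) -> List[str]:
    """Keep the top_n patches by execution-free score, then sort them by combined score."""
    shortlist = sorted(ef_scores, key=lambda p: -ef_scores[p])[:top_n]
    # Patches without an execution-based score are treated as scoring 0.0.
    return sorted(shortlist, key=lambda p: -(ef_scores[p] + eb_scores.get(p, 0.0)))


# Hypothetical scores: verifier probabilities and fraction of generated tests passed.
ef = {"patch_a": 0.9, "patch_b": 0.8, "patch_c": 0.2}
eb = {"patch_a": 0.5, "patch_b": 1.0, "patch_c": 1.0}
print(hybrid_rank(ef, eb))  # ['patch_b', 'patch_a']
```

Here `patch_c` passes every test but is filtered out by its low verifier score, while the test results break the near-tie between `patch_a` and `patch_b`.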

## Integration with R2E-Gym

This agent is part of the larger R2E-Gym ecosystem:

1. **Environment**: Works with R2E-Gym's 8.1K+ procedurally curated environments
2. **Evaluation**: Can be evaluated on SWE-Bench Verified and other benchmarks
3. **Training**: Supports continued training on additional trajectories

## Citation

If you use R2E-TestgenAgent in your research, please cite:

```bibtex
@article{jain2025r2e,
  title={{R2E-Gym}: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights {SWE} Agents},
  author={Jain, Naman and Singh, Jaskirat and Shetty, Manish and Zheng, Liang and Sen, Koushik and Stoica, Ion},
  journal={arXiv preprint arXiv:2504.07164},
  year={2025}
}
```

## License

This model is released under the Apache 2.0 license, the same license as the base Qwen2.5-Coder-32B-Instruct model.

## Links

- **Paper**: [R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents](https://arxiv.org/abs/2504.07164)
- **GitHub**: [R2E-Gym](https://github.com/R2E-Gym/R2E-Gym)
- **Dataset**: [R2EGym-TestingAgent-SFT-Trajectories](https://huggingface.co/datasets/R2E-Gym/R2EGym-TestingAgent-SFT-Trajectories)
- **Related Models**:
  - [R2EGym-32B-Agent](https://huggingface.co/R2E-Gym/R2EGym-32B-Agent)
  - [R2EGym-Verifier](https://huggingface.co/R2E-Gym/R2EGym-Verifier)
model_card.md
ADDED
@@ -0,0 +1,62 @@
---
license: apache-2.0
library_name: transformers
language:
- en
tags:
- code
- software-engineering
- testing
- unit-tests
- r2e-gym
- swe-bench
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
datasets:
- R2E-Gym/R2EGym-TestingAgent-SFT-Trajectories
model_type: qwen2
---

# R2E-TestgenAgent

A specialized execution-based testing agent for generating targeted unit tests in software engineering tasks.

## Model Details

- **Model Type**: Qwen2.5-Coder-32B fine-tuned for test generation
- **Training Data**: R2E-Gym SFT trajectories for testing tasks
- **Use Case**: Automated unit test generation for software engineering
- **Framework**: R2E-Gym ecosystem

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "r2e-gym/R2E-TestgenAgent"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torch_dtype="auto" loads the weights in the checkpoint's stored dtype
# instead of the float32 default, which matters for a 32B-parameter model.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Use with R2E-Gym framework for best results
from r2egym.agenthub.agent.agent import Agent, AgentArgs
agent_args = AgentArgs.from_yaml("testing_agent_config.yaml")
agent = Agent(name="TestingAgent", args=agent_args)
```

## Training

- **Base Model**: Qwen/Qwen2.5-Coder-32B-Instruct
- **Training Method**: Full fine-tuning with DeepSpeed ZeRO-3
- **Learning Rate**: 1e-5
- **Epochs**: 2
- **Context Length**: 20,480 tokens

## Citation

```bibtex
@article{jain2025r2e,
  title={{R2E-Gym}: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights {SWE} Agents},
  author={Jain, Naman and Singh, Jaskirat and Shetty, Manish and Zheng, Liang and Sen, Koushik and Stoica, Ion},
  journal={arXiv preprint arXiv:2504.07164},
  year={2025}
}
```
push_to_hf.py
ADDED
@@ -0,0 +1 @@
testing_agent_config.yaml
ADDED
@@ -0,0 +1,47 @@
system_prompt: |-
  You are a specialized testing agent designed to generate targeted unit tests for software engineering tasks. Your goal is to create comprehensive tests that can validate code patches and identify potential issues.

  You have access to a repository environment where you can explore files, understand the codebase structure, and generate appropriate unit tests. Your tests should be:
  1. Comprehensive - covering both normal cases and edge cases
  2. Targeted - specifically designed to validate the correctness of code changes
  3. Executable - able to run and provide clear pass/fail results
  4. Discriminative - able to differentiate between correct and incorrect implementations

  Use the available tools to explore the repository, understand the problem context, and create effective test cases.

instance_prompt: |-
  Consider the following software engineering problem:
  <problem>
  {problem_statement}
  </problem>

  Your task is to generate targeted unit tests that can effectively validate any proposed solution to this problem. The tests should:

  1. **Explore the repository structure** to understand the codebase and existing test patterns
  2. **Analyze the problem** to identify key requirements and edge cases
  3. **Create comprehensive tests** that cover:
     - Normal functionality
     - Edge cases and boundary conditions
     - Error handling scenarios
     - Integration with existing code
  4. **Ensure tests are executable** and provide clear validation of correctness

  Focus on creating tests that can distinguish between correct and incorrect implementations. Your tests should be thorough enough to catch potential bugs while being specific enough to validate the correct behavior.

  IMPORTANT:
  - Create tests in a file named `test_issue.py`
  - Use standard Python testing frameworks (unittest, pytest, etc.)
  - Each test should have a clear, descriptive name
  - Include proper assertions and error handling
  - Add comments explaining what each test validates

  Generate tests that would help validate the correctness of any proposed solution to the given problem.

command_files:
  - src/r2egym/agenthub/runtime/bash_cmds.yml

llm_name: "r2e-gym/R2E-TestgenAgent"
use_demo: false
other_args:
  max_retries: 3
  timeout: 300
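
The `instance_prompt` above contains a `{problem_statement}` placeholder that has to be filled with the issue text before the prompt is sent to the model. The sketch below shows one way this substitution could work. The helper `render_instance_prompt` and the abbreviated template are hypothetical and are not taken from the R2E-Gym codebase, which may fill the placeholder differently.

```python
# Abbreviated copy of the instance_prompt template, for illustration only.
INSTANCE_PROMPT = (
    "Consider the following software engineering problem:\n"
    "<problem>\n"
    "{problem_statement}\n"
    "</problem>\n"
)


def render_instance_prompt(template: str, problem_statement: str) -> str:
    """Substitute the issue text for the {problem_statement} placeholder."""
    # str.replace is used instead of str.format so that any other literal
    # braces in the template do not need to be escaped.
    return template.replace("{problem_statement}", problem_statement)


prompt = render_instance_prompt(INSTANCE_PROMPT, "parse_config({}) raises KeyError")
print(prompt)
```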
train_r2egym_32B_testing_agent.yaml
ADDED
@@ -0,0 +1,42 @@
### model
model_name_or_path: Qwen/Qwen2.5-Coder-32B-Instruct
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_offload_config.json
# deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]

### dataset
dataset: R2E-Gym/R2EGym-TestingAgent-SFT-Trajectories
template: qwen
cutoff_len: 20480
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/R2EGym-32B-TestingAgent
logging_steps: 10
save_steps: 10000
plot_loss: true
overwrite_output_dir: true

### train
flash_attn: fa2
enable_liger_kernel: true
use_unsloth_gc: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: true
ddp_timeout: 180000000

### wandb
report_to: wandb
run_name: R2EGym-32B-TestingAgent
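
To show what `learning_rate: 1.0e-5`, `lr_scheduler_type: cosine`, and `warmup_ratio: 0.05` imply over the course of training, here is a minimal sketch of a linear-warmup, cosine-decay schedule. It follows the common formulation and is not copied from the training framework, whose rounding of warmup steps may differ. The function `lr_at_step` and the total step count are assumptions; the real step count depends on dataset size and the number of GPUs.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 1.0e-5,
               warmup_ratio: float = 0.05) -> float:
    """Linear warmup to base_lr, followed by cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = 1000  # hypothetical total number of optimizer steps
for s in (0, 25, 50, 525, 1000):
    print(s, lr_at_step(s, total))
```

With these settings the rate climbs linearly over the first 5% of steps, peaks at `1.0e-5`, and then decays smoothly toward zero by the final step.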