StringChaos committed · Commit a635839 · verified · 1 Parent(s): 5c911ee

Upload R2E-TestgenAgent - Testing Agent for R2E-Gym

README.md ADDED
@@ -0,0 +1,117 @@
+ # R2E-TestgenAgent
+
+ ## Overview
+
+ R2E-TestgenAgent is an execution-based testing agent that generates targeted unit tests for software engineering tasks. It is part of the R2E-Gym framework, an environment for training and evaluating software engineering agents.
+
+ ## Model Description
+
+ The agent specializes in:
+ - **Targeted Unit Test Generation**: Creates specific unit tests to validate code patches and implementations
+ - **Execution-Based Verification**: Generates tests that can be executed to verify the correctness of code changes
+ - **Corner Case Detection**: Identifies and tests potential edge cases and corner scenarios
+ - **Patch Disambiguation**: Creates tests that can differentiate between correct and incorrect patches
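To make patch disambiguation concrete, here is a minimal sketch that scores candidate patches by how many generated tests they pass. The `clamp` task, both candidate implementations, and the `score_patch` helper are hypothetical and exist only for illustration; they are not part of R2E-Gym.

```python
from typing import Callable, List, Tuple

# Hypothetical task: clamp(x, lo, hi) must keep x inside [lo, hi].
def patch_correct(x: int, lo: int, hi: int) -> int:
    return max(lo, min(x, hi))

def patch_buggy(x: int, lo: int, hi: int) -> int:
    # Forgets the lower bound, so values below `lo` pass through unchanged.
    return min(x, hi)

# Each generated test is (arguments, expected result).
TESTS: List[Tuple[Tuple[int, int, int], int]] = [
    ((5, 0, 10), 5),    # normal case: both patches agree
    ((15, 0, 10), 10),  # upper bound: both patches agree
    ((-3, 0, 10), 0),   # corner case: only the correct patch passes
]

def score_patch(patch: Callable[[int, int, int], int]) -> int:
    """Count how many tests a candidate patch passes."""
    passed = 0
    for args, expected in TESTS:
        try:
            if patch(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing patch counts as a failed test
    return passed

scores = {
    "correct": score_patch(patch_correct),
    "buggy": score_patch(patch_buggy),
}
print(scores)
```

Only the third test separates the two candidates, which is why corner-case tests matter more for disambiguation than additional tests of the common path.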
+
+ ## Architecture
+
+ The agent is built on top of the Qwen2.5-Coder-32B-Instruct model and fine-tuned using R2E-Gym's SFT (Supervised Fine-Tuning) trajectories specifically designed for testing tasks.
+
+ ## Training Data
+
+ The model was trained on the `R2E-Gym/R2EGym-TestingAgent-SFT-Trajectories` dataset, which contains:
+ - High-quality testing trajectories collected from Claude-3.5-Sonnet
+ - Execution-based testing scenarios
+ - Diverse software engineering problems across 13 repositories
+ - Real-world testing patterns and methodologies
+
+ ## Usage
+
+ ### Basic Usage
+
+ ```python
+ from r2egym.agenthub.environment.env import EnvArgs, RepoEnv
+ from r2egym.agenthub.agent.agent import AgentArgs, Agent
+ from pathlib import Path
+ from datasets import load_dataset
+
+ # Load dataset
+ ds = load_dataset("R2E-Gym/R2E-Gym-Lite")
+ env_args = EnvArgs(ds=ds['train'][0])
+ env = RepoEnv(env_args)
+
+ # Load testing agent configuration
+ agent_args = AgentArgs.from_yaml(Path('./config/testing_agent.yaml'))
+ agent_args.llm_name = 'r2e-gym/R2E-TestgenAgent'
+ agent = Agent(name="TestingAgent", args=agent_args)
+
+ # Run the testing agent
+ output = agent.run(env, max_steps=30, use_fn_calling=True)
+ ```
+
+ ### Configuration
+
+ The agent uses specific prompts and configurations optimized for test generation:
+
+ ```yaml
+ system_prompt: |
+   You are a specialized testing agent designed to generate targeted unit tests
+   for software engineering tasks. Your goal is to create comprehensive tests
+   that can validate code patches and identify potential issues.
+
+ instance_prompt: |
+   Given the following problem and potential patches, create targeted unit tests
+   that can effectively validate the correctness of the implementation.
+ ```
+
+ ## Training Configuration
+
+ The model was trained using the following configuration:
+
+ - **Base Model**: Qwen/Qwen2.5-Coder-32B-Instruct
+ - **Training Method**: Full fine-tuning with DeepSpeed ZeRO-3 (offload configuration)
+ - **Learning Rate**: 1.0e-5
+ - **Epochs**: 2
+ - **Batch Size**: 1 (per device)
+ - **Context Length**: 20,480 tokens
+ - **Optimizer**: AdamW with cosine learning rate scheduling and a 0.05 warmup ratio
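The learning-rate settings above can be illustrated with a short sketch. It assumes the common schedule shape of linear warmup followed by cosine decay to zero, and it uses the `warmup_ratio: 0.05` value from the training YAML in this commit. The exact scheduler is implemented by the training framework and may differ in detail, and the total step count below is a hypothetical number chosen for illustration.

```python
import math

def lr_at_step(step: int, total_steps: int, base_lr: float = 1.0e-5,
               warmup_ratio: float = 0.05) -> float:
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000  # hypothetical number of optimizer steps
for s in (0, 25, 50, 525, 1000):
    print(s, f"{lr_at_step(s, total):.2e}")
```

With these assumptions the rate peaks at the configured 1.0e-5 when warmup ends and falls to half of that value midway through the decay phase.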
+
+ ## Performance
+
+ The R2E-TestgenAgent is designed to work in conjunction with other R2E-Gym components:
+ - **Code Editing Agent**: Generates and fixes code
+ - **Execution-free Verifier**: Reranks candidate patches without running them
+ - **Hybrid Test-time Scaling**: Combines execution-based and execution-free verification
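As a rough illustration of combining the two signals, the sketch below ranks candidate patches by the number of generated tests they pass and breaks ties with an execution-free verifier score. This combination rule, the `Candidate` type, and all scores are assumptions made for illustration; the procedure described in the R2E-Gym paper may differ.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    patch_id: str
    tests_passed: int      # execution-based signal: generated tests passed
    verifier_score: float  # execution-free signal, assumed to lie in [0, 1]

def rerank(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by tests passed, breaking ties with the verifier score."""
    return sorted(
        candidates,
        key=lambda c: (c.tests_passed, c.verifier_score),
        reverse=True,
    )

# Hypothetical candidates for a single issue.
candidates = [
    Candidate("patch-a", tests_passed=4, verifier_score=0.35),
    Candidate("patch-b", tests_passed=5, verifier_score=0.60),
    Candidate("patch-c", tests_passed=5, verifier_score=0.80),
]
best = rerank(candidates)[0]
print(best.patch_id)
```

The tie-break is where the execution-free signal helps: when generated tests cannot separate two patches, the verifier score decides between them.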
+
+ ## Integration with R2E-Gym
+
+ This agent is part of the larger R2E-Gym ecosystem:
+
+ 1. **Environment**: Works with R2E-Gym's 8.1K+ procedurally curated environments
+ 2. **Evaluation**: Can be evaluated on SWE-Bench Verified and other benchmarks
+ 3. **Training**: Supports continued training on additional trajectories
+
+ ## Citation
+
+ If you use R2E-TestgenAgent in your research, please cite:
+
+ ```bibtex
+ @article{jain2025r2e,
+   title={{R2E-Gym}: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights {SWE} Agents},
+   author={Jain, Naman and Singh, Jaskirat and Shetty, Manish and Zheng, Liang and Sen, Koushik and Stoica, Ion},
+   journal={arXiv preprint arXiv:2504.07164},
+   year={2025}
+ }
+ ```
+
+ ## License
+
+ This model is released under the Apache 2.0 license, the same license as the base Qwen2.5-Coder model.
+
+ ## Links
+
+ - **Paper**: [R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents](https://arxiv.org/abs/2504.07164)
+ - **GitHub**: [R2E-Gym](https://github.com/R2E-Gym/R2E-Gym)
+ - **Dataset**: [R2EGym-TestingAgent-SFT-Trajectories](https://huggingface.co/datasets/R2E-Gym/R2EGym-TestingAgent-SFT-Trajectories)
+ - **Related Models**:
+   - [R2EGym-32B-Agent](https://huggingface.co/R2E-Gym/R2EGym-32B-Agent)
+   - [R2EGym-Verifier](https://huggingface.co/R2E-Gym/R2EGym-Verifier)
model_card.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ license: apache-2.0
+ library_name: transformers
+ language:
+ - en
+ tags:
+ - code
+ - software-engineering
+ - testing
+ - unit-tests
+ - r2e-gym
+ - swe-bench
+ base_model: Qwen/Qwen2.5-Coder-32B-Instruct
+ datasets:
+ - R2E-Gym/R2EGym-TestingAgent-SFT-Trajectories
+ model_type: qwen2
+ ---
+
+ # R2E-TestgenAgent
+
+ A specialized execution-based testing agent for generating targeted unit tests in software engineering tasks.
+
+ ## Model Details
+
+ - **Model Type**: Qwen2.5-Coder-32B fine-tuned for test generation
+ - **Training Data**: R2E-Gym SFT trajectories for testing tasks
+ - **Use Case**: Automated unit test generation for software engineering
+ - **Framework**: R2E-Gym ecosystem
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from r2egym.agenthub.agent.agent import Agent, AgentArgs
+
+ model_name = "r2e-gym/R2E-TestgenAgent"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+ # Use with the R2E-Gym framework for best results
+ agent_args = AgentArgs.from_yaml("testing_agent_config.yaml")
+ agent = Agent(name="TestingAgent", args=agent_args)
+ ```
+
+ ## Training
+
+ - **Base Model**: Qwen/Qwen2.5-Coder-32B-Instruct
+ - **Training Method**: Full fine-tuning with DeepSpeed ZeRO-3
+ - **Learning Rate**: 1e-5
+ - **Epochs**: 2
+ - **Context Length**: 20,480 tokens
+
+ ## Citation
+
+ ```bibtex
+ @article{jain2025r2e,
+   title={{R2E-Gym}: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights {SWE} Agents},
+   author={Jain, Naman and Singh, Jaskirat and Shetty, Manish and Zheng, Liang and Sen, Koushik and Stoica, Ion},
+   journal={arXiv preprint arXiv:2504.07164},
+   year={2025}
+ }
+ ```
push_to_hf.py ADDED
@@ -0,0 +1 @@
+
testing_agent_config.yaml ADDED
@@ -0,0 +1,47 @@
+ system_prompt: |-
+   You are a specialized testing agent designed to generate targeted unit tests for software engineering tasks. Your goal is to create comprehensive tests that can validate code patches and identify potential issues.
+
+   You have access to a repository environment where you can explore files, understand the codebase structure, and generate appropriate unit tests. Your tests should be:
+   1. Comprehensive - covering both normal cases and edge cases
+   2. Targeted - specifically designed to validate the correctness of code changes
+   3. Executable - able to run and provide clear pass/fail results
+   4. Discriminative - able to differentiate between correct and incorrect implementations
+
+   Use the available tools to explore the repository, understand the problem context, and create effective test cases.
+
+ instance_prompt: |-
+   Consider the following software engineering problem:
+   <problem>
+   {problem_statement}
+   </problem>
+
+   Your task is to generate targeted unit tests that can effectively validate any proposed solution to this problem. The tests should:
+
+   1. **Explore the repository structure** to understand the codebase and existing test patterns
+   2. **Analyze the problem** to identify key requirements and edge cases
+   3. **Create comprehensive tests** that cover:
+      - Normal functionality
+      - Edge cases and boundary conditions
+      - Error handling scenarios
+      - Integration with existing code
+   4. **Ensure tests are executable** and provide clear validation of correctness
+
+   Focus on creating tests that can distinguish between correct and incorrect implementations. Your tests should be thorough enough to catch potential bugs while being specific enough to validate the correct behavior.
+
+   IMPORTANT:
+   - Create tests in a file named `test_issue.py`
+   - Use standard Python testing frameworks (unittest, pytest, etc.)
+   - Each test should have a clear, descriptive name
+   - Include proper assertions and error handling
+   - Add comments explaining what each test validates
+
+   Generate tests that would help validate the correctness of any proposed solution to the given problem.
+
+ command_files:
+ - src/r2egym/agenthub/runtime/bash_cmds.yml
+
+ llm_name: "r2e-gym/R2E-TestgenAgent"
+ use_demo: false
+ other_args:
+   max_retries: 3
+   timeout: 300
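The `{problem_statement}` placeholder in `instance_prompt` is filled in per task. How R2E-Gym performs that substitution is not shown in this commit, so the sketch below simply assumes Python's `str.format`. The shortened template, the `render_instance_prompt` helper, and the sample issue text are all hypothetical.

```python
# Shortened copy of the instance prompt, keeping only the placeholder section.
INSTANCE_PROMPT = (
    "Consider the following software engineering problem:\n"
    "<problem>\n"
    "{problem_statement}\n"
    "</problem>\n"
)

def render_instance_prompt(problem_statement: str) -> str:
    """Fill the placeholder with a whitespace-trimmed problem statement."""
    return INSTANCE_PROMPT.format(problem_statement=problem_statement.strip())

# Hypothetical issue text, used only to show the rendered prompt.
prompt = render_instance_prompt(
    "  `parse_date` raises ValueError on ISO week dates.\n"
)
print(prompt)
```

Braces inside the problem statement itself are safe with this approach, because `str.format` only interprets braces that appear in the template.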
train_r2egym_32B_testing_agent.yaml ADDED
@@ -0,0 +1,42 @@
+ ### model
+ model_name_or_path: Qwen/Qwen2.5-Coder-32B-Instruct
+ trust_remote_code: true
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: full
+ deepspeed: examples/deepspeed/ds_z3_offload_config.json
+ # deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
+
+ ### dataset
+ dataset: R2E-Gym/R2EGym-TestingAgent-SFT-Trajectories
+ template: qwen
+ cutoff_len: 20480
+ max_samples: 100000
+ overwrite_cache: true
+ preprocessing_num_workers: 16
+
+ ### output
+ output_dir: saves/R2EGym-32B-TestingAgent
+ logging_steps: 10
+ save_steps: 10000
+ plot_loss: true
+ overwrite_output_dir: true
+
+ ### train
+ flash_attn: fa2
+ enable_liger_kernel: true
+ use_unsloth_gc: true
+ per_device_train_batch_size: 1
+ gradient_accumulation_steps: 1
+ learning_rate: 1.0e-5
+ num_train_epochs: 2.0
+ lr_scheduler_type: cosine
+ warmup_ratio: 0.05
+ bf16: true
+ ddp_timeout: 180000000
+
+ ### wandb
+ report_to: wandb
+ run_name: R2EGym-32B-TestingAgent
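With `per_device_train_batch_size: 1` and `gradient_accumulation_steps: 1`, the number of sequences per optimizer step depends only on how many devices take part in data-parallel training. The commit does not state the GPU count, so the values in this small sketch are hypothetical.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Sequences consumed per optimizer step under data parallelism."""
    return per_device * grad_accum * num_gpus

# GPU counts are assumptions; the config fixes only the first two factors.
for gpus in (8, 16):
    print(gpus, effective_batch_size(per_device=1, grad_accum=1, num_gpus=gpus))
```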