Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny
This repository contains checkpoints and resources for the Re:Form project, which explores novel approaches to formal software verification using Large Language Models (LLMs). The work was presented in the paper Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny.
- Project Page
- GitHub Repository
Overview
Existing LLMs, which operate over informal natural language, struggle to provide reliable and scalable verification. Re:Form addresses this by grounding LLMs in rigorous formal systems such as Dafny, enabling automatic and mathematically provable verification of their reasoning. This capability is vital for achieving large-scale, reliable formal software verification while systematically reducing reliance on extensive human priors.
The core of Re:Form's pipeline involves:
- An automatic and scalable data curation process.
- Careful Reinforcement Learning (RL) designs integrated with feedback from a formal language verifier.
Even small models (e.g., 0.5B) fine-tuned with Re:Form can generate syntactically valid and verifiable Dafny code, surpassing proprietary models and demonstrating stronger generalization to out-of-domain tasks, as evidenced by performance on the challenging DafnyComp benchmark.
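To make the verifier-in-the-loop RL design more concrete, below is a minimal sketch of how a binary reward could be derived from Dafny's verification outcome. It is illustrative only: the `dafny_verify_reward` helper is hypothetical and not the project's actual reward implementation (see the GitHub repository for that), and it assumes the `dafny` CLI is installed and on PATH.

```python
# Illustrative sketch (not the official Re:Form reward): score a candidate
# Dafny program with a binary verifier-feedback reward.
import subprocess
import tempfile


def dafny_verify_reward(candidate_code: str, timeout_s: int = 60) -> float:
    """Return 1.0 if the candidate Dafny program verifies, else 0.0."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".dfy", delete=False) as f:
        f.write(candidate_code)
        path = f.name
    try:
        result = subprocess.run(
            ["dafny", "verify", path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        # `dafny verify` exits with code 0 only when all proof obligations pass.
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        # Treat verifier timeouts as failures.
        return 0.0
```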
Usage
You can use the Re:Form models with the Hugging Face `transformers` library. Below is an example using the `sft_0.5B` checkpoint. Other checkpoints (e.g., `sft_1.5B`, `sft_3B`, `sft_7B`, `sft_14B`, `14B-RL-entropy`) are also available from the Veri-Code Hugging Face organization.
First, make sure the `transformers` library and `torch` are installed (e.g., `pip install torch transformers`). For optimal performance, especially with the larger models, consider installing FlashAttention as described in the GitHub repository's installation instructions.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "Veri-Code/sft_0.5B" # Example checkpoint, replace with other available models if needed
# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16, # Use torch.float16 or torch.float32 depending on your hardware
device_map="auto",
trust_remote_code=True
)
# Example for generating Dafny code
prompt = """
method print_sum(a: int, b: int) returns (sum: int)
ensures sum == a + b
{
return a + b;
}
method Max(a: int, b: int) returns (result: int)
ensures result == a || result == b;
ensures result >= a && result >= b;
{
"""
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
# Generate text (Dafny code)
output_ids = model.generate(
inputs,
max_new_tokens=2048, # Max new tokens as specified in generation_config.json
do_sample=True,
temperature=0.6,
top_p=0.9,
eos_token_id=tokenizer.eos_token_id,
pad_token_id=tokenizer.eos_token_id, # Usually set pad_token_id to eos_token_id for causal LMs
)
# Decode the full sequence (prompt + completion); slice output_ids[0][inputs.shape[-1]:]
# if you only want the newly generated tokens.
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```
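Since the goal is verifiable code, you may want to feed the generated program back to the Dafny verifier. The snippet below is a minimal, illustrative continuation of the example above: it assumes the `dafny` CLI is installed and on PATH, and the file name `generated.dfy` is arbitrary.

```python
# Optional: check the generated program with the Dafny verifier (assumes the
# `dafny` CLI is installed; continues from the generation example above).
import subprocess

with open("generated.dfy", "w") as f:
    f.write(generated_text)

result = subprocess.run(
    ["dafny", "verify", "generated.dfy"],
    capture_output=True,
    text=True,
)
print(result.stdout)
print("Verified!" if result.returncode == 0 else "Verification failed.")
```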
For more details on training, evaluation, and other advanced usage, please refer to the official GitHub repository.
Citation
If you use this work in your research, please cite the paper:
```bibtex
@misc{yan2025reformreducinghuman,
      title={Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny},
      author={Chuanhao Yan and Fengdi Che and Xuhan Huang and Xu Xu and Xin Li and Yizhi Li and Xingwei Qu and Jingzhe Shi and Zhuangzhuang He and Chenghua Lin and Yaodong Yang and Binhang Yuan and Hang Zhao and Yu Qiao and Bowen Zhou and Jie Fu},
      year={2025},
      eprint={2507.16331},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.16331},
}
```