Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny
This repository contains checkpoints and resources for the Re:Form project, which explores novel approaches to formal software verification using Large Language Models (LLMs). The work was presented in the paper Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny.
- Project Page
- GitHub Repository
Overview
Existing LLMs, which operate over informal natural language, struggle to provide reliable and scalable verification. Re:Form addresses this by grounding LLMs in rigorous formal systems such as Dafny, enabling automatic and mathematically provable verification of their reasoning. This capability is vital for achieving large-scale, reliable formal software verification while systematically reducing reliance on extensive human priors.
The core of Re:Form's pipeline involves:
- An automatic and scalable data curation process.
- Careful Reinforcement Learning (RL) designs integrated with feedback from a formal language verifier.
Even small models (e.g., 0.5B) fine-tuned with Re:Form can generate syntactically valid and verifiable Dafny code, surpassing proprietary models and demonstrating stronger generalization to out-of-domain tasks, as evidenced by performance on the challenging DafnyComp benchmark.
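To make the verifier-in-the-loop RL design more concrete, below is a minimal sketch of how a binary reward could be derived from Dafny's verification outcome. It is illustrative only: the `dafny_verify_reward` helper is hypothetical and not the project's actual reward implementation (see the GitHub repository for that), and it assumes the `dafny` CLI is installed and on PATH.

```python
# Illustrative sketch (not the official Re:Form reward): score a candidate
# Dafny program with a binary verifier-feedback reward.
import subprocess
import tempfile


def dafny_verify_reward(candidate_code: str, timeout_s: int = 60) -> float:
    """Return 1.0 if the candidate Dafny program verifies, else 0.0."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".dfy", delete=False) as f:
        f.write(candidate_code)
        path = f.name
    try:
        result = subprocess.run(
            ["dafny", "verify", path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        # `dafny verify` exits with code 0 only when all proof obligations pass.
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        # Treat verifier timeouts as failures.
        return 0.0
```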
Usage
You can use the Re:Form models with the Hugging Face `transformers` library. Below is an example using the `sft_0.5B` checkpoint. Other checkpoints (e.g., `sft_1.5B`, `sft_3B`, `sft_7B`, `sft_14B`, `14B-RL-entropy`) are also available from the Veri-Code Hugging Face organization.
First, make sure the `transformers` library and `torch` are installed (e.g., `pip install torch transformers`). For optimal performance, especially with the larger models, consider installing FlashAttention as described in the GitHub repository's installation instructions.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "Veri-Code/sft_0.5B" # Example checkpoint, replace with other available models if needed
# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16, # Use torch.float16 or torch.float32 depending on your hardware
device_map="auto",
trust_remote_code=True
)
# Example for generating Dafny code
prompt = """
method print_sum(a: int, b: int) returns (sum: int)
ensures sum == a + b
{
return a + b;
}
method Max(a: int, b: int) returns (result: int)
ensures result == a || result == b;
ensures result >= a && result >= b;
{
"""
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
# Generate text (Dafny code)
output_ids = model.generate(
inputs,
max_new_tokens=2048, # Max new tokens as specified in generation_config.json
do_sample=True,
temperature=0.6,
top_p=0.9,
eos_token_id=tokenizer.eos_token_id,
pad_token_id=tokenizer.eos_token_id, # Usually set pad_token_id to eos_token_id for causal LMs
)
# Decode the full sequence (prompt + completion); slice output_ids[0][inputs.shape[-1]:]
# if you only want the newly generated tokens.
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```
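Since the goal is verifiable code, you may want to feed the generated program back to the Dafny verifier. The snippet below is a minimal, illustrative continuation of the example above: it assumes the `dafny` CLI is installed and on PATH, and the file name `generated.dfy` is arbitrary.

```python
# Optional: check the generated program with the Dafny verifier (assumes the
# `dafny` CLI is installed; continues from the generation example above).
import subprocess

with open("generated.dfy", "w") as f:
    f.write(generated_text)

result = subprocess.run(
    ["dafny", "verify", "generated.dfy"],
    capture_output=True,
    text=True,
)
print(result.stdout)
print("Verified!" if result.returncode == 0 else "Verification failed.")
```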
For more details on training, evaluation, and other advanced usage, please refer to the official GitHub repository.
Citation
If you use this work in your research, please cite the paper:
```bibtex
@misc{yan2025reformreducinghuman,
      title={Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny},
      author={Chuanhao Yan and Fengdi Che and Xuhan Huang and Xu Xu and Xin Li and Yizhi Li and Xingwei Qu and Jingzhe Shi and Zhuangzhuang He and Chenghua Lin and Yaodong Yang and Binhang Yuan and Hang Zhao and Yu Qiao and Bowen Zhou and Jie Fu},
      year={2025},
      eprint={2507.16331},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.16331},
}
```