Qwen3-14B-ResumeAgent

A Qwen3-14B model trained with RLVR (Reinforcement Learning from Verifiable Rewards) to revise resumes in Markdown using coding-agent-style tool calls, given a job description.

Blog Post | Code & Data

What This Is

A proof of concept demonstrating that RLVR with an LLM judge can train agentic editing in domains that are not fully verifiable. The model operates in a multi-turn tool-use loop: it reads a resume and job description, makes targeted edits via str_replace, checks its work with review, and finalizes with submit.
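The loop described above can be sketched with a toy environment. This is a minimal illustration, not the project's actual harness: the tool names (str_replace, review, submit) come from the text, while the `ResumeEnv` class, the argument names, the "match exactly once" rule, and the scripted policy standing in for the model are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ResumeEnv:
    """Toy editing environment. Tool signatures here are assumed, not the real ones."""
    resume: str
    submitted: bool = False
    failed_edits: int = 0

    def str_replace(self, old: str, new: str) -> str:
        # Assumption: an edit succeeds only if `old` matches exactly one span.
        if self.resume.count(old) != 1:
            self.failed_edits += 1
            return "error: `old` must match exactly once"
        self.resume = self.resume.replace(old, new)
        return "ok"

    def review(self) -> str:
        # Return the current document so the agent can check its work.
        return self.resume

    def submit(self) -> str:
        self.submitted = True
        return "submitted"


def run_episode(env: ResumeEnv, policy, max_turns: int = 8) -> ResumeEnv:
    """Run the multi-turn loop until the policy submits or turns run out."""
    tools = {"str_replace": env.str_replace, "review": env.review, "submit": env.submit}
    observation = env.review()
    for _ in range(max_turns):
        name, args = policy(observation)      # the model would choose the tool call here
        observation = tools[name](**args)
        if env.submitted:
            break
    return env


def scripted_policy(calls):
    """Hypothetical stand-in for the model: replays a fixed list of tool calls."""
    it = iter(calls)
    return lambda observation: next(it)


env = run_episode(
    ResumeEnv("- Worked on backend stuff"),
    scripted_policy([
        ("str_replace", {"old": "Worked on backend stuff", "new": "Built REST APIs in Go"}),
        ("review", {}),
        ("submit", {}),
    ]),
)
print(env.resume, env.submitted, env.failed_edits)
```

An episode that runs out of turns without calling submit, or whose str_replace calls fail to match, corresponds to the submit rate and edit failure rate reported in the results.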

This is a research artifact, not a production tool for resume editing.

Training

  • Base model: Qwen/Qwen3-14B, willcb/Qwen3-14B
  • Algorithm: AIPO via PRIME-RL
  • Hardware: 4×H200 (1 GPU for vLLM inference, 3 for FSDP2 training), ~10 hours
  • Data: 160 synthetic resume-JD pairs with planted weaknesses
  • Reward: Three-rubric system (tool syntax + layout constraint + LLM judge content quality), range [-7, +8]
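One way to picture how three rubric scores could combine into a single scalar in [-7, +8] is the sketch below. Only the three rubric names and the total range come from the text; the per-rubric ranges, the additive combination, and every identifier are hypothetical, chosen so that the extremes add up to the stated bounds.

```python
# Total reward range stated in the text.
REWARD_MIN, REWARD_MAX = -7.0, 8.0

# HYPOTHETICAL per-rubric ranges, picked only so that they sum to [-7, +8].
SYNTAX_RANGE = (-2.0, 2.0)    # tool syntax rubric
LAYOUT_VALUES = (-1.0, 1.0)   # layout constraint as a binary fail/pass signal
JUDGE_RANGE = (-4.0, 5.0)     # LLM judge content-quality rubric


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def total_reward(syntax_score: float, layout_ok: bool, judge_score: float) -> float:
    """Sum the three rubric scores after clamping each to its assumed range."""
    syntax = clamp(syntax_score, *SYNTAX_RANGE)
    layout = LAYOUT_VALUES[1] if layout_ok else LAYOUT_VALUES[0]
    judge = clamp(judge_score, *JUDGE_RANGE)
    return syntax + layout + judge


print(total_reward(2.0, True, 5.0))     # best case under these assumptions: 8.0
print(total_reward(-2.0, False, -4.0))  # worst case under these assumptions: -7.0
```

Because the layout term is binary in this sketch, a rollout gets no partial credit for moving closer to the layout constraint without meeting it.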

Results

Evaluated on 40 held-out examples, 16 rollouts each (640 total), using Gemini 3 Flash as the judge.

Metric                Base    Trained   Change
Mean reward           2.73    5.41      +98%
Submit rate           33%     100%      +67pp
Edit failure rate     37%     4%        −33pp
Hallucination score   0.36    0.87      +139%
Content quality       1.37    1.78      +30%
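The Change column mixes two units: relative change (%) for scores and percentage points (pp) for rates. The short sketch below shows both calculations on rows from the table. The function names are illustrative, and because the inputs are the rounded table values, recomputed figures can differ by a few points from changes that were derived from unrounded data.

```python
def relative_change(base: float, trained: float) -> float:
    """Percent change of the trained value relative to the base value."""
    return (trained - base) / base * 100.0


def point_change(base_pct: float, trained_pct: float) -> float:
    """Difference between two percentages, in percentage points (pp)."""
    return trained_pct - base_pct


print(f"Mean reward:       {relative_change(2.73, 5.41):+.0f}%")   # +98%
print(f"Content quality:   {relative_change(1.37, 1.78):+.0f}%")   # +30%
print(f"Submit rate:       {point_change(33, 100):+.0f}pp")        # +67pp
print(f"Edit failure rate: {point_change(37, 4):+.0f}pp")          # -33pp
```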

Known Limitations

  • Layout compliance regressed: the binary reward and data imbalance failed to teach length-reduction strategies
  • Quantification collapsed to zero: the hallucination penalty correctly suppressed number fabrication, but also suppressed adding real metrics
  • Constructive quality improved slowly: defensive metrics (not fabricating, not failing) dominated, while constructive metrics (better writing) were learned only partially
  • Synthetic data only: trained and evaluated on generated resume-JD pairs, not real documents

Citation

If you find this work useful, please cite the blog post:

@misc{che2026rlvr_resume_agent,
  author = {Che, Kaiwen},
  title = {Can Coding Agents Learn Editorial Taste with RLVR?},
  year = {2026},
  url = {https://kaiwenche.github.io/posts/rlvr-resume-agent/}
}
Model size: 15B params (Safetensors, BF16)
