Qwen3-14B-ResumeAgent

A Qwen3-14B model trained with RLVR (Reinforcement Learning from Verifiable Rewards) to revise resumes in Markdown using coding-agent-style tool calls, given a job description.

Blog Post | Code & Data

What This Is

A proof of concept demonstrating that RLVR with an LLM judge can train agentic editing beyond fully verifiable domains. The model operates in a multi-turn tool-use loop: it reads a resume and job description, makes targeted edits via str_replace, checks its work with review, and finalizes with submit.
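The loop described above can be illustrated with a toy environment. This is a minimal sketch, not the released training code: only the tool names str_replace, review, and submit come from this card. The class `ResumeEditEnv`, the function `run_episode`, the argument names `old` and `new`, the exactly-one-match rule, and the turn limit are all assumptions made for illustration, and the tool calls are scripted here rather than generated by the model.

```python
from dataclasses import dataclass


@dataclass
class ResumeEditEnv:
    """Toy environment holding the Markdown resume being edited (hypothetical)."""
    resume: str
    submitted: bool = False
    failed_edits: int = 0

    def str_replace(self, old: str, new: str) -> str:
        # Assumed rule: the old text must match exactly once so the edit is unambiguous.
        count = self.resume.count(old)
        if count != 1:
            self.failed_edits += 1
            return f"error: expected 1 match, found {count}"
        self.resume = self.resume.replace(old, new, 1)
        return "ok"

    def review(self) -> str:
        # Return the current document so the agent can check its work.
        return self.resume

    def submit(self) -> str:
        # Mark the episode as finished.
        self.submitted = True
        return "submitted"


def run_episode(env, tool_calls, max_turns=8):
    """Replay (tool_name, kwargs) pairs until submit or the turn limit is reached."""
    observations = []
    for name, kwargs in tool_calls[:max_turns]:
        if name not in ("str_replace", "review", "submit"):
            observations.append(f"error: unknown tool {name}")
            continue
        observations.append(getattr(env, name)(**kwargs))
        if env.submitted:
            break
    return observations


# Scripted stand-in for model-generated tool calls.
env = ResumeEditEnv("## Experience\n- Worked on backend stuff\n")
calls = [
    ("str_replace", {"old": "Worked on backend stuff", "new": "Built REST APIs in Go"}),
    ("str_replace", {"old": "text that is not present", "new": "anything"}),
    ("review", {}),
    ("submit", {}),
]
observations = run_episode(env, calls)
```

In this sketch, a failed str_replace returns an error string instead of raising, so the agent sees the failure as an observation and can retry. That is one plausible way to surface the edit failures that the Results section counts.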

This is a research artifact, not a production tool for resume editing.

Training

Base model: Qwen/Qwen3-14B, willcb/Qwen3-14B
Algorithm: AIPO via PRIME-RL
Hardware: 4×H200 (1 GPU for vLLM inference, 3 GPUs for FSDP2 training), ~10 hours
Data: 160 synthetic resume-JD pairs with planted weaknesses
Reward: Three-rubric system (tool syntax + layout constraint + LLM judge content quality), range [-7, +8]
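One way to read the three-rubric reward is as a sum of bounded per-rubric scores. The sketch below assumes that additive form. Only the three rubric names and the overall range [-7, +8] come from this card. The per-rubric bounds in `HYPOTHETICAL_BOUNDS` are placeholders chosen so that the totals span that range, and the actual split may differ.

```python
# Placeholder per-rubric (low, high) bounds. These numbers are NOT from the
# model card; they are chosen only so the totals span [-7, +8].
HYPOTHETICAL_BOUNDS = {
    "tool_syntax": (-3.0, 2.0),
    "layout": (-1.0, 1.0),
    "judge_content": (-3.0, 5.0),
}


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def combine_rewards(scores, bounds=HYPOTHETICAL_BOUNDS):
    """Clamp each rubric score to its bounds and sum them (assumed additive form)."""
    missing = set(bounds) - set(scores)
    if missing:
        raise KeyError(f"missing rubric scores: {sorted(missing)}")
    return sum(clamp(scores[name], low, high) for name, (low, high) in bounds.items())


def total_range(bounds=HYPOTHETICAL_BOUNDS):
    """Return the (minimum, maximum) achievable total under the given bounds."""
    lows = sum(low for low, _ in bounds.values())
    highs = sum(high for _, high in bounds.values())
    return (lows, highs)


best = combine_rewards({"tool_syntax": 2, "layout": 1, "judge_content": 9})
worst = combine_rewards({"tool_syntax": -10, "layout": -1, "judge_content": -10})
```

Clamping each rubric before summing keeps a single out-of-range judge score from pushing the total outside the stated range.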

Results

Evaluated on 40 held-out examples, 16 rollouts each (640 total), using Gemini 3 Flash as the judge.

Metric	Base	Trained	Change
Mean reward	2.73	5.41	+98%
Submit rate	33%	100%	+67pp
Edit failure rate	37%	4%	−33pp
Hallucination score	0.36	0.87	+139%
Content quality	1.37	1.78	+30%
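The Change column mixes two conventions: relative change in percent for scores, and percentage-point (pp) differences for rates. The short sketch below reproduces a few entries from the rounded values in the table. The helper names are illustrative, not taken from the evaluation code.

```python
def relative_change_pct(base, trained):
    """Relative change from base to trained, in percent, rounded to an integer."""
    return round(100 * (trained - base) / base)


def point_change(base_pct, trained_pct):
    """Difference between two rates that are already percentages, in points."""
    return trained_pct - base_pct


# 40 held-out examples with 16 rollouts each.
total_rollouts = 40 * 16

mean_reward_change = relative_change_pct(2.73, 5.41)
content_quality_change = relative_change_pct(1.37, 1.78)
submit_rate_change = point_change(33, 100)
edit_failure_change = point_change(37, 4)
```

Recomputing from rounded table entries can differ from a reported percentage by a few points when the base value is small. The reported figures were presumably computed from unrounded values.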

Known Limitations

Layout compliance regressed: the binary reward and data imbalance failed to teach length-reduction strategies.
Quantification collapsed to zero: the hallucination penalty correctly suppressed number fabrication, but it also suppressed adding real metrics.
Constructive quality improved slowly: defensive metrics (not fabricating, not failing) dominated, while constructive metrics (better writing) were learned only partially.
Synthetic data only: the model was trained and evaluated on generated resume-JD pairs, not real documents.

Citation

If you find this work useful, please cite the blog post:

@misc{che2026rlvr_resume_agent,
  author = {Che, Kaiwen},
  title = {Can Coding Agents Learn Editorial Taste with RLVR?},
  year = {2026},
  url = {https://kaiwenche.github.io/posts/rlvr-resume-agent/}
}

Safetensors
Model size: 15B params
Tensor type: BF16

Model tree for FlandreS/Qwen3-14B-ResumeAgent

Base model: Qwen/Qwen3-14B-Base
Finetuned: Qwen/Qwen3-14B
Finetuned: this model