How to use FlandreS/Qwen3-14B-ResumeAgent with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("FlandreS/Qwen3-14B-ResumeAgent")
model = AutoModelForCausalLM.from_pretrained("FlandreS/Qwen3-14B-ResumeAgent")
```
Qwen3-14B-ResumeAgent
A Qwen3-14B model trained with RLVR (Reinforcement Learning from Verifiable Rewards) to revise resumes in Markdown using coding-agent-style tool calls, given a job description.
What This Is
A proof-of-concept demonstrating that RLVR with an LLM judge can train agentic editing beyond fully-verifiable domains. The model operates in a multi-turn tool-use loop: it reads a resume and job description, makes targeted edits via str_replace, checks its work with review, and finalizes with submit.
This is a research artifact, not a production tool for resume editing.
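The tool-use loop described above can be sketched as a small environment. This is a minimal illustration, not the author's implementation: the `ResumeSession` class, the `run_episode` helper, the exactly-one-match rule for `str_replace`, and the scripted policy standing in for the model are all assumptions. Only the three tool names come from this card.

```python
from dataclasses import dataclass, field


@dataclass
class ResumeSession:
    """Hypothetical editing environment exposing the three tools named in the card."""
    resume: str
    job_description: str
    submitted: bool = False
    log: list = field(default_factory=list)

    def str_replace(self, old: str, new: str) -> str:
        # Assumed convention: the old text must match exactly once,
        # so that every edit is unambiguous.
        count = self.resume.count(old)
        if count != 1:
            msg = f"error: expected exactly 1 match, found {count}"
        else:
            self.resume = self.resume.replace(old, new, 1)
            msg = "ok"
        self.log.append(("str_replace", msg))
        return msg

    def review(self) -> str:
        # Return the current document so the policy can check its work.
        self.log.append(("review", "ok"))
        return self.resume

    def submit(self) -> str:
        # Finalize the episode with the current document.
        self.submitted = True
        self.log.append(("submit", "ok"))
        return self.resume


def run_episode(session, policy, max_turns=8):
    """Dispatch tool calls chosen by `policy` until it submits or turns run out."""
    tools = {
        "str_replace": session.str_replace,
        "review": session.review,
        "submit": session.submit,
    }
    observation = session.review()
    for _ in range(max_turns):
        name, args = policy(observation)
        observation = tools[name](**args)
        if session.submitted:
            break
    return session.resume


def scripted_policy():
    """Stand-in for the model: replays a fixed list of tool calls."""
    steps = iter([
        ("str_replace", {"old": "Worked on APIs", "new": "Built REST APIs in Python"}),
        ("review", {}),
        ("submit", {}),
    ])
    return lambda observation: next(steps)


session = ResumeSession(
    resume="## Experience\n- Worked on APIs\n",
    job_description="Backend engineer, Python",
)
final = run_episode(session, scripted_policy())
```

In an actual rollout the policy would be the model itself, emitting one tool call per turn and receiving the tool's return value as the next observation.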
Training
- Base model: Qwen/Qwen3-14B, willcb/Qwen3-14B
- Algorithm: AIPO via PRIME-RL
- Hardware: 4×H200 (1 GPU for vLLM inference, 3 for FSDP2 training), ~10 hours
- Data: 160 synthetic resume-JD pairs with planted weaknesses
- Reward: Three-rubric system (tool syntax + layout constraint + LLM judge content quality), range [-7, +8]
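As a rough sketch of how the three rubric scores might be combined, the snippet below sums them and checks the total against the stated range. The summation, the function name `total_reward`, and the example inputs are assumptions for illustration. The card gives only the three rubric names and the overall range of [-7, +8], not the per-rubric ranges or weights.

```python
REWARD_MIN, REWARD_MAX = -7.0, 8.0  # overall range stated in the card


def total_reward(tool_syntax: float, layout: float, content_quality: float) -> float:
    """Sum the three rubric scores (summation is an assumption) and validate the range."""
    total = tool_syntax + layout + content_quality
    if not REWARD_MIN <= total <= REWARD_MAX:
        raise ValueError(f"reward {total} outside [{REWARD_MIN}, {REWARD_MAX}]")
    return total


# Hypothetical per-rubric scores, chosen only to exercise the function.
example = total_reward(tool_syntax=1.0, layout=1.0, content_quality=3.5)
```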
Results
Evaluated on 40 held-out examples, 16 rollouts each (640 total), using Gemini 3 Flash as the judge.
| Metric | Base | Trained | Change |
|---|---|---|---|
| Mean reward | 2.73 | 5.41 | +98% |
| Submit rate | 33% | 100% | +67pp |
| Edit failure rate | 37% | 4% | −33pp |
| Hallucination score | 0.36 | 0.87 | +139% |
| Content quality | 1.37 | 1.78 | +30% |
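The Change column mixes two conventions: relative change (%) for scores and percentage-point differences (pp) for rates. The following sketch recomputes both from the rounded values in the table. The helper names are illustrative.

```python
def relative_change_pct(base: float, trained: float) -> float:
    """Relative change from base to trained, in percent."""
    return (trained - base) / base * 100.0


def point_change(base_pct: float, trained_pct: float) -> float:
    """Difference between two rates, in percentage points."""
    return trained_pct - base_pct


mean_reward = round(relative_change_pct(2.73, 5.41))      # 98
content_quality = round(relative_change_pct(1.37, 1.78))  # 30
submit_rate = point_change(33, 100)                       # 67
edit_failure = point_change(37, 4)                        # -33
```

Applied to the rounded hallucination scores (0.36 and 0.87), the same formula gives about +142% rather than the +139% in the table, presumably because the table was computed from unrounded scores.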
Known Limitations
- Layout compliance regressed: the binary reward and data imbalance failed to teach length-reduction strategies
- Quantification collapsed to zero: the hallucination penalty correctly suppressed number fabrication, but also suppressed adding real metrics
- Constructive quality improved slowly: defensive metrics (not fabricating, not failing) dominated, while constructive metrics (better writing) were learned only partially
- Synthetic data only: trained and evaluated on generated resume-JD pairs, not real documents
Citation
If you find this work useful, please cite the blog post:
```bibtex
@misc{che2026rlvr_resume_agent,
  author = {Che, Kaiwen},
  title = {Can Coding Agents Learn Editorial Taste with RLVR?},
  year = {2026},
  url = {https://kaiwenche.github.io/posts/rlvr-resume-agent/}
}
```