kimi-k2.5-opus-magnum-lora-step90

Kimi K2.5 LoRA — Opus Magnum REPL agent (step 90)

LoRA adapter trained with reinforcement learning on the Opus-Magnum-puzzle-solving REPL benchmark. Snapshot at training step 90 of an in-progress run.

Training setup

Base model: moonshotai/Kimi-K2.5
Renderer: kimi_k25 (thinking on, with a system-prompt suffix asking the agent to skip <think> blocks and call safe_verify / submit every turn)
Adapter: LoRA, rank 32
RL recipe: GRPO via Thinking Machines' Tinker SDK
Dataset: opus-magnum-k1-plus-single-arm-easy — 336 puzzles, mix of 57 instructions-only k1 + 279 single-arm campaign puzzles (chapter-1 puzzles p007–p013)
Scoring: partial-credit profile with shaped sub-signals (delivered, target_atom_types_frac, grabbed, moved)
Hyperparameters:
- learning_rate = 5e-5
- kl_penalty_coef = 0.01
- max_turns = 8, max_tokens = 3072, max_trajectory_tokens = 29000
- groups_per_batch = 6, group_size = 4
- context_overflow_reward = 0.0 (so partial credit survives overflow)
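To make the group settings above concrete, here is a minimal sketch of a group-relative advantage computation in the GRPO style. It assumes a plain mean baseline per group. The exact normalization used in this run (for example, whether advantages are also divided by the group standard deviation) is not stated in this card, and the function name and example rewards are illustrative only.

```python
from statistics import mean

# Values taken from the hyperparameter list above.
GROUPS_PER_BATCH = 6
GROUP_SIZE = 4


def group_relative_advantages(rewards):
    """Center each rollout's reward on its group's mean reward.

    Assumption: mean-centering only. Some GRPO variants additionally
    divide by the group's standard deviation.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Each batch samples GROUP_SIZE rollouts for each of GROUPS_PER_BATCH puzzles.
rollouts_per_batch = GROUPS_PER_BATCH * GROUP_SIZE

# Illustrative rewards for one group of four rollouts on the same puzzle.
example_group = [0.0, 0.0, 0.1, 0.3]
advantages = group_relative_advantages(example_group)

# A group where every rollout scores the same yields all-zero advantages.
flat_group = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

Under this formulation a group in which every rollout receives the same reward contributes no learning signal, which is one reason shaped partial credit can help when full solves are rare.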

Learning signal at this checkpoint

| signal | base (step 0) | step 85 |
| --- | --- | --- |
| empty-step rate (all-think, no code) | 62 % | 8 % |
| eval rollouts with reward > 0 | 3.3 % (4/120) | 51.7 % (62/120) |
| `best_partial_grabbed` (avg) | ~0 % | ~70 % |
| `best_partial_moved` (avg) | ~0 % | ~60 % |
| `best_partial_delivered` (avg) | ~0 % | 1–4 % |
| `test/reward/total` | 0.0029 | 0.065 (peak) |
| held-out solves | 0 | rare (2 across iters 65 + 75) |

The model learned to (1) suppress <think> blocks and emit Python code within max_tokens, (2) call safe_verify(...) and submit(...) each turn, and (3) write arm programs that successfully grab and move atoms on a majority of held-out puzzles. Full delivery / solve is still rare on the test split.
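As an illustration of how a partial-credit profile could reward the grab, move, and deliver stages separately, here is a hypothetical sketch. The sub-signal names come from the scoring description above. The weights, the `RolloutSignals` type, and the `partial_credit` function are assumptions made for this example and are not the values or code used in training.

```python
from dataclasses import dataclass


@dataclass
class RolloutSignals:
    solved: bool                   # puzzle fully solved
    delivered: float               # fraction of required product delivered, 0..1
    target_atom_types_frac: float  # fraction of target atom types present, 0..1
    grabbed: bool                  # an arm grabbed at least one atom
    moved: bool                    # a grabbed atom was moved


# Hypothetical weights, chosen only for illustration. They sum to 0.5,
# so an unsolved rollout can never score as high as a solve.
WEIGHTS = {
    "grabbed": 0.05,
    "moved": 0.05,
    "target_atom_types_frac": 0.10,
    "delivered": 0.30,
}


def partial_credit(s: RolloutSignals) -> float:
    """Return 1.0 for a solve, otherwise a weighted sum of sub-signals."""
    if s.solved:
        return 1.0
    return (
        WEIGHTS["grabbed"] * float(s.grabbed)
        + WEIGHTS["moved"] * float(s.moved)
        + WEIGHTS["target_atom_types_frac"] * s.target_atom_types_frac
        + WEIGHTS["delivered"] * s.delivered
    )


# A rollout that grabs and moves an atom but delivers nothing.
grab_and_move = partial_credit(RolloutSignals(False, 0.0, 0.0, True, True))
```

With weights like these, a policy that reliably grabs and moves but rarely delivers would show many rollouts with reward above zero while the average total reward stays low, which is the general pattern in the table above.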

Files

adapter_model.safetensors — Tinker raw LoRA adapter weights
adapter_config.json — Tinker adapter metadata (rank, alpha, target modules)
README.md — this file

Converting to PEFT format for vLLM / SGLang

The files above are in Tinker's raw adapter format. To convert to PEFT format suitable for direct vLLM --lora-modules loading, run on a machine that can host the base model:

from tinker_cookbook.weights import build_lora_adapter

build_lora_adapter(
    base_model="moonshotai/Kimi-K2.5",
    adapter_path="./tinker_adapter",   # this repo's contents
    output_path="./peft_adapter",
)
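Once the conversion has produced ./peft_adapter, the vLLM server can be pointed at it. The sketch below only assembles the command line. It uses vLLM's --enable-lora, --lora-modules, and --max-lora-rank options. The adapter name opus-magnum-step90 and the helper function are hypothetical, and this card does not confirm that a given vLLM version supports LoRA for this base model, so treat it as a starting point under those assumptions.

```python
import shlex


def vllm_serve_command(base_model, adapter_name, adapter_path, max_lora_rank=32):
    """Build an argv list for `vllm serve` with one LoRA adapter attached.

    max_lora_rank defaults to 32 to match the adapter rank listed above.
    """
    return [
        "vllm", "serve", base_model,
        "--enable-lora",
        "--lora-modules", f"{adapter_name}={adapter_path}",
        "--max-lora-rank", str(max_lora_rank),
    ]


cmd = vllm_serve_command(
    base_model="moonshotai/Kimi-K2.5",
    adapter_name="opus-magnum-step90",  # hypothetical served adapter name
    adapter_path="./peft_adapter",      # output_path from the conversion step
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

A real deployment of a model this size would likely need further options (for example parallelism settings), which are omitted here.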
