Kimi K2.5 LoRA: Opus Magnum REPL agent (step 90)
LoRA adapter trained with reinforcement learning on the Opus Magnum puzzle-solving REPL benchmark. This is a snapshot at training step 90 of an in-progress run.
Training setup
- Base model: moonshotai/Kimi-K2.5
- Renderer: kimi_k25 (thinking on, with a system-prompt suffix asking the agent to skip <think> blocks and call safe_verify / submit every turn)
- Adapter: LoRA, rank 32
- RL recipe: GRPO via Thinking Machines' Tinker SDK
- Dataset: opus-magnum-k1-plus-single-arm-easy, 336 puzzles, a mix of 57 instructions-only k1 puzzles and 279 single-arm campaign puzzles (chapter-1 puzzles p007-p013)
- Scoring: partial-credit profile with shaped sub-signals (delivered, target_atom_types_frac, grabbed, moved)
- Hyperparameters:
  - learning_rate = 5e-5
  - kl_penalty_coef = 0.01
  - max_turns = 8, max_tokens = 3072, max_trajectory_tokens = 29000
  - groups_per_batch = 6, group_size = 4
  - context_overflow_reward = 0.0 (so partial credit survives overflow)
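As a rough illustration of how these settings fit together, the sketch below collects the listed hyperparameters in a dataclass and shows the group-relative baseline that GRPO is built around. The names RunConfig and group_relative_advantages are hypothetical and not part of the Tinker SDK, the example rewards are made up, and plain mean-centring (without dividing by the group's standard deviation) is an assumption about this run's recipe.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunConfig:
    """Hyperparameters copied from the list above (hypothetical container)."""
    learning_rate: float = 5e-5
    kl_penalty_coef: float = 0.01
    max_turns: int = 8
    max_tokens: int = 3072
    max_trajectory_tokens: int = 29000
    groups_per_batch: int = 6
    group_size: int = 4
    context_overflow_reward: float = 0.0

    @property
    def rollouts_per_batch(self) -> int:
        # Each group holds group_size rollouts of the same puzzle.
        return self.groups_per_batch * self.group_size


def group_relative_advantages(rewards):
    """Centre each rollout's reward on the mean reward of its group.

    Assumption: mean-centring only; some GRPO variants also divide by
    the group's standard deviation.
    """
    baseline = mean(rewards)
    return [reward - baseline for reward in rewards]


config = RunConfig()
print(config.rollouts_per_batch)

# Made-up rewards for one group of 4 rollouts on the same puzzle.
example_group = [0.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(example_group))
```

With 6 groups of 4, each batch holds 24 rollouts. A group whose rollouts all score the same gets zero advantage everywhere, which is one reason shaped partial credit matters while full solves are still rare.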
Learning signal at this checkpoint
| signal | base (step 0) | step 85 |
|---|---|---|
| empty-step rate (all-think, no code) | 62 % | 8 % |
| eval rollouts with reward > 0 | 3.3 % (4/120) | 51.7 % (62/120) |
| best_partial_grabbed (avg) | ~0 % | ~70 % |
| best_partial_moved (avg) | ~0 % | ~60 % |
| best_partial_delivered (avg) | ~0 % | 1-4 % |
| test/reward/total | 0.0029 | 0.065 (peak) |
| held-out solves | 0 | rare (2 across iters 65 + 75) |
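The percentages in the table follow directly from the rollout counts. A small check, using only the figures quoted above (the helper name percent is illustrative):

```python
def percent(hits: int, total: int) -> float:
    """Return hits / total as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


base_rate = percent(4, 120)    # eval rollouts with reward > 0 at step 0
step_rate = percent(62, 120)   # same metric in the later column

# Ratio of the peak test/reward/total to the base value.
reward_ratio = round(0.065 / 0.0029, 1)

print(base_rate, step_rate, reward_ratio)  # 3.3 51.7 22.4
```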
The model learned to (1) suppress <think> blocks and emit Python code
within max_tokens, (2) call safe_verify(...) and submit(...)
each turn, and (3) write arm programs that successfully grab and move
atoms on a majority of held-out puzzles. Full delivery / solve is
still rare on the test split.
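To make the "empty-step" signal concrete, here is a minimal sketch of one way such a rate could be computed. It is not the benchmark's actual metric code: is_empty_step and empty_step_rate are hypothetical names, and the sketch assumes a turn counts as empty when nothing remains after removing <think> content, including a block that was never closed.

```python
import re

# Closed <think>...</think> blocks.
CLOSED_THINK = re.compile(r"<think>.*?</think>", re.DOTALL)
# A <think> block that runs to the end of the turn without closing.
OPEN_THINK = re.compile(r"<think>.*", re.DOTALL)


def is_empty_step(turn_text: str) -> bool:
    """True when the turn contains only thinking and no visible output."""
    visible = CLOSED_THINK.sub("", turn_text)
    visible = OPEN_THINK.sub("", visible)
    return visible.strip() == ""


def empty_step_rate(turns: list[str]) -> float:
    """Fraction of turns that are empty; 0.0 for an empty list."""
    if not turns:
        return 0.0
    return sum(is_empty_step(turn) for turn in turns) / len(turns)


# Made-up turns for illustration only.
sample_turns = [
    "<think>plan the arm</think>",
    "<think>reasoning that never closes",
    "<think>short</think>\nsafe_verify(program)\nsubmit(program)",
    "submit(program)",
]
print(empty_step_rate(sample_turns))  # 0.5
```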
Files
- adapter_model.safetensors: Tinker raw LoRA adapter weights
- adapter_config.json: Tinker adapter metadata (rank, alpha, target modules)
- README.md: this file
Converting to PEFT format for vLLM / SGLang
The files above are in Tinker's raw adapter format. To convert to
PEFT format suitable for direct vLLM --lora-modules loading, run on
a machine that can host the base model:
    from tinker_cookbook.weights import build_lora_adapter

    build_lora_adapter(
        base_model="moonshotai/Kimi-K2.5",
        adapter_path="./tinker_adapter",  # this repo's contents
        output_path="./peft_adapter",
    )
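Once the conversion has produced ./peft_adapter, the adapter can be passed to vLLM's server through its LoRA options. The sketch below only assembles the command line and does not launch anything. build_vllm_serve_command and the adapter name opus-magnum-step90 are hypothetical, and any parallelism or memory flags needed to host the base model are left out.

```python
import shlex


def build_vllm_serve_command(
    base_model: str,
    adapter_name: str,
    adapter_path: str,
    max_lora_rank: int,
) -> list[str]:
    """Assemble a `vllm serve` argument list that registers one LoRA adapter."""
    return [
        "vllm", "serve", base_model,
        "--enable-lora",
        # Should be at least the adapter's rank (32 for this adapter).
        "--max-lora-rank", str(max_lora_rank),
        # name=path form; the name is what clients pass as the model id.
        "--lora-modules", f"{adapter_name}={adapter_path}",
    ]


command = build_vllm_serve_command(
    base_model="moonshotai/Kimi-K2.5",
    adapter_name="opus-magnum-step90",  # hypothetical adapter name
    adapter_path="./peft_adapter",      # output_path from the conversion step
    max_lora_rank=32,
)
print(shlex.join(command))
```

The printed command can be run in a shell, or the list can be handed to subprocess.run on a machine with enough accelerator memory for the base model.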