Kimi K2.5 LoRA: Opus Magnum REPL agent (step 90)

LoRA adapter trained with reinforcement learning on the Opus-Magnum-puzzle-solving REPL benchmark. Snapshot at training step 90 of an in-progress run.

Training setup

  • Base model: moonshotai/Kimi-K2.5
  • Renderer: kimi_k25 (thinking on, with system-prompt suffix asking the agent to skip <think> blocks and call safe_verify / submit every turn)
  • Adapter: LoRA, rank 32
  • RL recipe: GRPO via Thinking Machines' Tinker SDK
  • Dataset: opus-magnum-k1-plus-single-arm-easy (336 puzzles: 57 instructions-only k1 puzzles plus 279 single-arm campaign puzzles, chapter-1 puzzles p007–p013)
  • Scoring: partial-credit profile with shaped sub-signals (delivered, target_atom_types_frac, grabbed, moved)
  • Hyperparameters:
    • learning_rate = 5e-5
    • kl_penalty_coef = 0.01
    • max_turns = 8, max_tokens = 3072, max_trajectory_tokens = 29000
    • groups_per_batch = 6, group_size = 4
    • context_overflow_reward = 0.0 (so partial credit survives overflow)
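
The group settings above mean each batch samples 6 puzzles with 4 rollouts per puzzle. The following is a minimal sketch of a group-relative advantage as commonly used in GRPO, assuming the mean-centered, std-normalized form. The exact normalization used in this run is not stated here, and the function name and example rewards are illustrative only.

```python
from statistics import mean, pstdev

GROUPS_PER_BATCH = 6  # from the hyperparameters above
GROUP_SIZE = 4        # rollouts sampled per puzzle


def group_relative_advantages(rewards, eps=1e-6):
    """Center each rollout's reward on its group mean and scale by the group std.

    Assumption: std-normalized variant; some implementations only subtract the mean.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


rollouts_per_batch = GROUPS_PER_BATCH * GROUP_SIZE

# Hypothetical partial-credit rewards for one group of 4 rollouts on one puzzle.
example_group = [0.0, 0.1, 0.1, 0.4]
advantages = group_relative_advantages(example_group)

# A group where every rollout scores the same yields all-zero advantages.
flat_group = group_relative_advantages([0.0] * GROUP_SIZE)
```

Under this formulation a group with identical rewards carries no learning signal, which is one reason a partial-credit profile can help while full solves are still rare.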

Learning signal at this checkpoint

| Signal | Base (step 0) | Step 85 |
|---|---|---|
| Empty-step rate (all-think, no code) | 62 % | 8 % |
| Eval rollouts with reward > 0 | 3.3 % (4/120) | 51.7 % (62/120) |
| best_partial_grabbed (avg) | ~0 % | ~70 % |
| best_partial_moved (avg) | ~0 % | ~60 % |
| best_partial_delivered (avg) | ~0 % | 1–4 % |
| test/reward/total | 0.0029 | 0.065 (peak) |
| Held-out solves | 0 | rare (2 across iters 65 + 75) |
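
The percentages for eval rollouts with positive reward follow directly from the counts out of 120 rollouts, and the change in test/reward/total can be expressed as a ratio. A short check using only the figures reported above (the helper name is illustrative):

```python
def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


base_positive_rate = pct(4, 120)      # base model, step 0
step85_positive_rate = pct(62, 120)   # checkpoint evaluated at step 85

# Ratio of peak test reward to the base model's test reward.
reward_gain = 0.065 / 0.0029
```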

The model learned to (1) suppress <think> blocks and emit Python code within max_tokens, (2) call safe_verify(...) and submit(...) each turn, and (3) write arm programs that successfully grab and move atoms on a majority of held-out puzzles. Full delivery / solve is still rare on the test split.
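
The empty-step metric can be illustrated with a small classifier. This is a sketch under assumptions, not the benchmark's actual check: it treats a response as empty when nothing but whitespace remains after removing <think> blocks, and the helper names and sample responses are hypothetical.

```python
import re

# Matches a closed <think>...</think> block, or an unclosed one that runs to the
# end of the response (for example, thinking truncated by max_tokens).
_THINK_BLOCK = re.compile(r"<think>.*?(?:</think>|\Z)", re.DOTALL)


def is_empty_step(response: str) -> bool:
    """True if the response has no content outside of <think> blocks."""
    remainder = _THINK_BLOCK.sub("", response)
    return remainder.strip() == ""


def empty_step_rate(responses) -> float:
    """Fraction of responses that are empty steps (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_empty_step(r) for r in responses) / len(responses)


# Hypothetical responses; the submit(...) argument is a placeholder.
samples = [
    "<think>plan the arm path</think>",
    "<think>reasoning that never finishes",
    "<think>short plan</think>\nsubmit(solution)",
    "submit(solution)",
]
rate = empty_step_rate(samples)
```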

Files

  • adapter_model.safetensors: Tinker raw LoRA adapter weights
  • adapter_config.json: Tinker adapter metadata (rank, alpha, target modules)
  • README.md: this file

Converting to PEFT format for vLLM / SGLang

The files above are in Tinker's raw adapter format. To convert to PEFT format suitable for direct vLLM --lora-modules loading, run on a machine that can host the base model:

from tinker_cookbook.weights import build_lora_adapter

build_lora_adapter(
    base_model="moonshotai/Kimi-K2.5",
    adapter_path="./tinker_adapter",   # this repo's contents
    output_path="./peft_adapter",
)
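
Once converted, the adapter directory can be passed to vLLM. The sketch below only assembles a `vllm serve` command line from vLLM's LoRA flags (`--enable-lora`, `--lora-modules`, `--max-lora-rank`). It assumes the conversion wrote to ./peft_adapter and that the installed vLLM version supports LoRA for this base model, which is not verified here. The adapter name is a hypothetical label.

```python
import shlex


def vllm_serve_command(base_model, adapter_name, adapter_path, max_lora_rank=32):
    """Build the argument list for serving a base model with one LoRA adapter."""
    return [
        "vllm", "serve", base_model,
        "--enable-lora",
        "--lora-modules", f"{adapter_name}={adapter_path}",
        # Must be at least the adapter's rank (32 for this adapter).
        "--max-lora-rank", str(max_lora_rank),
    ]


cmd = vllm_serve_command(
    base_model="moonshotai/Kimi-K2.5",
    adapter_name="opus-magnum-step90",  # hypothetical name used in requests
    adapter_path="./peft_adapter",      # output_path from the conversion step
)
print(shlex.join(cmd))
```

Requests would then select the adapter by the name given in `--lora-modules` rather than by the base model name.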