ROOK-CLF-9M
A 9M parameter chess move prediction model using a classification approach, reproducing Google DeepMind's "Grandmaster-Level Chess Without Search".
Model Details
Model Description
ROOK-CLF-9M reproduces one specific ablation from the appendix of Ruoss et al. 2024 "Grandmaster-Level Chess Without Search": the 9M parameter model configuration trained on behavior cloning (action prediction only).
What is Reproduced:
- 9M parameter decoder-only transformer (smallest size from the original paper)
- Behavior cloning objective (action prediction from state)
- Architecture: 8 layers, 8 heads, 256 embedding dimension
What is Different:
- Single training objective (behavior cloning only) vs. three objectives in the full paper
- Reduced compute and fewer training steps than the original
Overview:
- Developed by: Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
- Reproduces: 9M parameter ablation from Ruoss et al. 2024 Appendix (arXiv:2402.04494)
- Model type: LlamaForSequenceClassification
- Language(s): Chess notation (FEN format)
- License: MIT
- Finetuned from: None (trained from scratch)
- Demo: Interactive Demo
- Repository: GitHub
- Paper: LAION Research Note
Model Architecture
- Parameters: 9M
- Layers: 8
- Attention Heads: 8
- Hidden Size: 256
- Context Length: 78 tokens
- Vocabulary: 32-character base; model embedding size padded to 128
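The listed configuration corresponds to a small Llama-style decoder with a 1968-class classification head. Below is a minimal sketch of instantiating an equivalent model with transformers; it is illustrative only, and the intermediate (MLP) size is an assumption (4x hidden size) that is not stated on this card.

from transformers import LlamaConfig, LlamaForSequenceClassification

# Illustrative config mirroring the numbers above; not the original training script
config = LlamaConfig(
    vocab_size=128,               # 32-character base vocabulary, padded to 128
    hidden_size=256,              # embedding dimension
    intermediate_size=1024,       # assumption: 4x hidden size (not stated on this card)
    num_hidden_layers=8,
    num_attention_heads=8,
    max_position_embeddings=78,   # 77 FEN characters + [CLS]
    num_labels=1968,              # one class per possible UCI move
)
model = LlamaForSequenceClassification(config)
print(sum(p.numel() for p in model.parameters()))  # roughly 9M parameters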
Uses
Direct Use
The model can be used for:
- Chess move prediction from FEN positions
- Chess position analysis
- Educational chess applications
- Research on strategic reasoning in transformers
Out-of-Scope Use
The model is not suitable for:
- Tournament-level competitive play
- Real-time chess engines requiring deep search
- Analysis of chess variants or non-standard rules
Training Details
Training Data
- Primary dataset: ChessBench (Google DeepMind), 40M positions used for behavior cloning
- Labels: Best move per position in UCI notation; top-k candidate moves are used for auxiliary evaluation
Training Procedure
Preprocessing
- FEN Standardization: Convert positions to standard FEN notation
- Fixed-Length Encoding: Pad/truncate to 77 characters
- Tokenization: Character-level tokenization + [CLS] token (78 total)
- Move Mapping: Convert UCI moves to classification labels (1968 classes)
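The 1968 classes correspond to the set of syntactically possible UCI moves (from-square, to-square, optional promotion piece) used as the ChessBench action space. A minimal sketch of the label mapping, assuming a hypothetical list all_uci_moves holding those 1968 strings (e.g., exported from the training code):

# Hypothetical: the 1968 UCI move strings of the ChessBench action space, e.g. "e2e4", "e7e8q", ...
all_uci_moves = [...]
move_to_label = {uci: i for i, uci in enumerate(all_uci_moves)}  # "e2e4" → class id
label_to_move = {i: uci for uci, i in move_to_label.items()}     # class id → "e2e4"

# Training target for a position whose annotated best move is e2e4
label = move_to_label["e2e4"]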
Training Hyperparameters
- Framework: HuggingFace Transformers
- Hardware: 2x NVIDIA RTX 4090
- Learning Rate: 4e-4
- Batch Size: 1024
- Optimizer: AdamW
- Weight Decay: 0.01
- Warmup Steps: 500
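As a reference point, the hyperparameters above map onto a Hugging Face TrainingArguments configuration roughly as sketched below; scheduler, precision, and the per-device batch split are assumptions, not values taken from this card.

from transformers import TrainingArguments

# Illustrative sketch mirroring the listed hyperparameters; other values are assumptions
training_args = TrainingArguments(
    output_dir="rook-clf-9m",
    learning_rate=4e-4,
    per_device_train_batch_size=512,  # assumption: global batch 1024 split across 2 GPUs
    optim="adamw_torch",
    weight_decay=0.01,
    warmup_steps=500,
    lr_scheduler_type="cosine",       # assumption, not stated on this card
    bf16=True,                        # assumption, not stated on this card
)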
Evaluation
Metrics
Reported in the LAION research note:
- Action accuracy (ChessBench 40M, 195k steps): 49%
- BIG-bench Checkmate-in-One: 57%
Benchmarks
- BIG-bench Checkmate-in-One: 57% (LAION note)
- GDM Searchless Chess (ChessBench 40M): 49% action accuracy (LAION note)
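Action accuracy here is top-1 classification accuracy: the move class with the highest logit must match the annotated best move. A minimal sketch of how it would be computed over a batch (tensor names are illustrative):

import torch

def action_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    # logits: (batch, 1968) classifier outputs; labels: (batch,) best-move class ids
    predictions = logits.argmax(dim=-1)
    return (predictions == labels).float().mean().item()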
Technical Details
Tokenization
The model uses a custom tokenization scheme critical for proper inference:
Step 1: FEN Processing (77 characters fixed)
import re

# Original FEN (starting position)
fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Process the FEN into a fixed 77-character string:
# 1. Expand digits to dots (e.g., "8" → "........")
# 2. Remove the "/" row separators
# 3. Pad castling to 4 chars, en passant to 2 chars, halfmove clock to 3 chars, fullmove number to 3 chars
def process_fen(fen):
    position, turn, castling, en_passant, halfmove, fullmove = fen.split(" ")
    position = re.sub(r'\d+', lambda m: "." * int(m.group()), position)  # expand empty squares
    position = position.replace("/", "")   # remove row separators → 64 board characters
    castling = castling.ljust(4, ".")      # pad to 4 chars
    en_passant = en_passant.ljust(2, ".")  # pad to 2 chars
    halfmove = halfmove.ljust(3, ".")      # pad to 3 chars
    fullmove = fullmove.ljust(3, ".")      # pad to 3 chars
    return "".join([position, turn, castling, en_passant, halfmove, fullmove])
# Result: exactly 77 characters
processed = process_fen(fen)
# "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-...0..1.."
Step 2: Add [CLS] token and convert to token IDs
# Append the classification token ("[CLS]" is a single special token, not five characters)
final_input = processed + "[CLS]"  # 77 FEN characters + [CLS] → 78 tokens after tokenization
# Convert to token IDs: character-level for the 77 FEN characters, plus one ID for [CLS]
tokens = [char_to_id[c] for c in processed] + [char_to_id["[CLS]"]]  # 78 tokens
Complete example:
Input FEN: "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
Processed: "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-...0..1.."
With [CLS]: "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-...0..1..[CLS]"
Token IDs: [13, 11, 3, 12, 10, 3, 11, 13, 15, 15, 15, 15, 15, 15, 15, 15, ...] # 78 tokens
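Putting both steps together, a minimal end-to-end PyTorch inference sketch. It assumes the hosted checkpoint ships a compatible character-level tokenizer (loadable via AutoTokenizer) that accepts the processed 77-character string and appends [CLS] itself, and that config.id2label maps class ids back to UCI moves; verify both against the repository.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("jrahn/ROOK-CLF-9m")
model = AutoModelForSequenceClassification.from_pretrained("jrahn/ROOK-CLF-9m")
model.eval()

fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
inputs = tokenizer(process_fen(fen), return_tensors="pt")  # process_fen from Step 1

with torch.no_grad():
    logits = model(**inputs).logits        # shape (1, 1968)

predicted_class = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class])  # predicted move, e.g. "e2e4"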
Inference
For in-browser inference, the model is exported to ONNX format:
# ONNX export for web deployment
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("jrahn/ROOK-CLF-9m")
model.eval()

# Dummy input: batch of 1, sequence length 78 (77 FEN characters + [CLS]), token IDs within the vocabulary
dummy_input = torch.randint(0, 88, (1, 78))

torch.onnx.export(
    model,
    dummy_input,
    "rook_clf_9m.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch_size"}},
)
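Once exported, the model can also be exercised from Python with ONNX Runtime; the sketch below reuses the token list from the tokenization example and the input/output names chosen in the export call (the web demo itself runs the JavaScript runtime).

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("rook_clf_9m.onnx")

# `tokens` is the 78-element token id list from the tokenization example above
input_ids = np.array([tokens], dtype=np.int64)
logits = session.run(["logits"], {"input_ids": input_ids})[0]
predicted_class = int(logits[0].argmax())  # class id of the predicted move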
Limitations
- Search-free: No lookahead or position evaluation beyond the single predicted move
- Tactical Weakness: Limited performance on complex tactical sequences
- Opening Knowledge: Relies on training data distribution for openings
- Endgame Performance: Weaker in theoretical endgames requiring precise calculation
Citation
If you use this model, please cite both our work and the original paper:
@article{rook2024,
  title={ROOK: Strategic Reasoning in Chess Without Search},
  author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi},
  journal={LAION Research Notes},
  year={2024},
  url={https://laion.ai/notes/rook/}
}
@article{ruoss2024grandmaster,
  title={Grandmaster-Level Chess Without Search},
  author={Ruoss, Anian and Delétang, Grégoire and Medapati, Sourabh and Grau-Moya, Jordi and Wenliang, Li Kevin and Catt, Elliot and Reid, John and Genewein, Tim},
  journal={arXiv preprint arXiv:2402.04494},
  year={2024}
}
Model Card Contact
Jonathan Rahn - GitHub | Research Page
Metrics Source
LAION research note: https://laion.ai/notes/rook/