Update README.md
StockZero learns to play chess by playing against itself.
The model is trained using self-play data generated through MCTS, which guides the engine to explore promising game states.
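As an illustrative sketch only (this is not StockZero's actual search, which couples MCTS with the network's policy and value outputs), the core MCTS loop of select, expand, simulate, and backpropagate can be shown with plain UCB1 selection and random rollouts on a toy Nim game; every name and the game itself are mine, not from the repository:

```python
import math
import random

class Node:
    """A node in the search tree for a toy Nim game: `stones` remain, players
    alternately take 1 or 2 stones, and whoever takes the last stone wins."""
    def __init__(self, stones, parent=None):
        self.stones = stones
        self.parent = parent
        self.children = []
        self.untried = [m for m in (1, 2) if m <= stones]  # unexpanded moves
        self.visits = 0
        self.wins = 0.0  # wins counted for the player who moved INTO this node

def select(node, c=1.4):
    """Descend via UCB1 until reaching a node with untried moves (or a terminal)."""
    while not node.untried and node.children:
        node = max(node.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(node.visits) / ch.visits))
    return node

def rollout(stones):
    """Random playout; True if the player to move at `stones` takes the last stone."""
    ply = 0
    while stones > 0:
        stones -= random.choice((1, 2) if stones >= 2 else (1,))
        ply += 1
    return ply % 2 == 1  # odd number of moves -> the first mover moved last

def mcts(stones, iterations=2000):
    """Run MCTS from `stones`; return the move (1 or 2) with the most visits."""
    root = Node(stones)
    for _ in range(iterations):
        node = select(root)
        if node.untried:  # expansion: add one unexplored child
            m = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.stones - m, parent=node)
            node.children.append(child)
            node = child
        # simulation: does the player to move at `node` win?
        w = False if node.stones == 0 else rollout(node.stones)
        # backpropagation, flipping the winner's perspective at each level
        while node is not None:
            node.visits += 1
            node.wins += 0.0 if w else 1.0
            w = not w
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return stones - best.stones

random.seed(0)  # reproducible demo run
best_move = mcts(4, iterations=3000)  # from 4 stones, taking 1 leaves the losing position 3
```

In a network-guided variant like AlphaZero's PUCT, the random rollout is replaced by the value head's evaluation and the exploration term is weighted by the policy head's move priors.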
This model card is for the StockZero version 2 (v2) model. The v1 model has the same architecture but was trained on less self-play data: v1 used only 20 self-play games during policy training, as a quick test of whether training would converge to a lower loss, while v2 used 50 self-play games. Training ran on a Google Colaboratory free-tier notebook; generating more self-play games would demand more compute than I can currently afford.
**Note**: StockZero v3 will be trained and open-sourced soon.
### Input
The model takes a chess board as input, represented as an 8x8x12 NumPy array. Each of the 12 channels corresponds to one piece type (pawn, knight, bishop, rook, queen, king) for either the white or the black player, and each channel holds binary values marking the squares occupied by that piece.
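As a minimal sketch of this encoding (the channel ordering, white pieces first, is my assumption for illustration and may differ from StockZero's actual encoder), a position given in FEN can be turned into the 8x8x12 binary planes like so:

```python
import numpy as np

# Assumed channel order for illustration: white P,N,B,R,Q,K then black p,n,b,r,q,k.
PIECE_TO_CHANNEL = {p: i for i, p in enumerate("PNBRQKpnbrqk")}

def fen_to_planes(fen: str) -> np.ndarray:
    """Encode the piece-placement field of a FEN string as 8x8x12 binary planes."""
    planes = np.zeros((8, 8, 12), dtype=np.float32)
    placement = fen.split()[0]  # first FEN field: piece placement, rank 8 first
    for row, rank in enumerate(placement.split("/")):
        col = 0
        for ch in rank:
            if ch.isdigit():    # a digit encodes a run of empty squares
                col += int(ch)
            else:               # a letter places one piece in its channel
                planes[row, col, PIECE_TO_CHANNEL[ch]] = 1.0
                col += 1
    return planes

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
planes = fen_to_planes(START_FEN)  # 32 pieces, so the planes sum to 32
```

Stacking one binary plane per (piece type, color) pair keeps the input sparse and lets the convolutional layers treat each piece type uniformly across the board.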