Update README.md
README.md CHANGED
@@ -100,6 +100,17 @@ The model is intended for research, experimentation, and education purposes. Pot
 * The training data is limited to a small number of self-play games (50 games), so the strength of the engine is limited.
 * The model was trained on a single GPU, so longer training would require multi-GPU support or a longer runtime.
 
+## Model Evaluation
+
+This model was evaluated against a simple random-move opponent using the `evaluate_model` method in the provided `evaluation_script.py`. The results are as follows:
+
+* **Number of Games:** 200 (the model plays as both white and black against the random agent)
+* **Win Rate:** 0.0150 (1.5%)
+* **Draw Rate:** 0.6850 (68.5%)
+* **Loss Rate:** 0.3000 (30.0%)
+
+These scores indicate that the model, in its current state, is not a strong chess player: it draws a majority of games against a random opponent but also loses a significant number. Further training and architecture improvements are needed to enhance its performance.
+
 ## How to Use
 
 ### Training
@@ -202,4 +213,5 @@ if np.sum(masked_policy_probs) > 0:
     masked_policy_probs /= np.sum(masked_policy_probs)
 
 print("Policy Output:", masked_policy_probs)
-print("Value Output:", value_output)
+print("Value Output:", value_output)
+```
|