+---
+license: mit
+datasets:
+- jrahn/yolochess_lichess-elite_2211
+library_name: transformers
+tags:
+- chess
+---
+# Model Card for yolochess_mlm_azure-cloud-35
+<!-- Provide a quick summary of what the model is/does. -->
+This 66M-parameter model is pre-trained from scratch with a Masked Language Modeling objective on chess positions in [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format.
+It is intended for downstream fine-tuning, e.g. text classification to predict human moves.
+# Model Details
+## Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** Jonathan Rahn
+- **Model type:** DistilBERT
+- **Language(s) (NLP):** Chess [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation)
+- **License:** MIT
+# Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+## Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+The model is pre-trained from scratch with Masked Language Modeling on chess positions in FEN format, so its direct use without fine-tuning is predicting masked tokens of a FEN string.
+## Downstream Use
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+The model is intended for downstream fine-tuning, e.g. text classification to predict human moves.
+## Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+Anything other than Chess Positions in standard [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format.
+# Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+n/a
+## Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+n/a
+# How to Get Started with the Model
+Use the code below to get started with the model.
+```python
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+# Load the tokenizer and the pre-trained masked-language-model weights from the Hub.
+tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
+model = AutoModelForMaskedLM.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
+```
+# Training Details
+## Training Data
+<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[Lichess-Elite 22-11 Dataset](https://huggingface.co/datasets/jrahn/yolochess_lichess-elite_2211)
+## Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+Masked Language Modeling objective with 15% masked token ratio.
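The 15% masking ratio can be illustrated with a minimal pure-Python sketch. This is not the training code used for this model: the function name `mask_tokens`, the `mask_id` argument, and the simplification of always substituting the mask token (BERT-style collators also leave some selected tokens unchanged or replace them with random tokens) are assumptions made for illustration. The label value `-100` follows the convention that PyTorch's cross-entropy loss uses for ignored positions.

```python
import random

IGNORE_INDEX = -100  # label for positions that do not contribute to the loss


def mask_tokens(token_ids, mask_id, special_ids=(), mlm_probability=0.15, rng=None):
    """Return (inputs, labels) for one sequence under a simplified MLM scheme.

    Each non-special token is independently selected with probability
    `mlm_probability`. Selected tokens are replaced by `mask_id` in the
    inputs and kept as the prediction target in the labels.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in token_ids:
        if tok not in special_ids and rng.random() < mlm_probability:
            inputs.append(mask_id)       # the model sees the mask token
            labels.append(tok)           # and must recover the original token
        else:
            inputs.append(tok)           # token is left untouched
            labels.append(IGNORE_INDEX)  # and ignored by the loss
    return inputs, labels
```

In the `transformers` library, `DataCollatorForLanguageModeling` with `mlm_probability=0.15` provides this kind of masking on batches. The card does not state whether that collator was used here.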
+### Preprocessing
+Tokenize `data["train"]["fen"]` with max-length padding to 200 tokens.
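As a hedged sketch, the preprocessing step could look like the function below. The name `tokenize_fen` is hypothetical, and `truncation=True` is an assumption, since the card only mentions padding. The `padding` and `max_length` arguments are standard tokenizer call arguments in `transformers`.

```python
MAX_LENGTH = 200  # padding length stated in the model card


def tokenize_fen(batch, tokenizer, max_length=MAX_LENGTH):
    """Tokenize a batch of FEN strings, padding every example to `max_length`."""
    return tokenizer(
        batch["fen"],
        padding="max_length",  # pad to max_length, not to the longest example in the batch
        max_length=max_length,
        truncation=True,       # assumption: the card does not say whether truncation was used
    )
```

With a `datasets` dataset this could be applied as `data["train"].map(tokenize_fen, batched=True, fn_kwargs={"tokenizer": tokenizer})`, so that every example has the same length.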
+### Speeds, Sizes, Times
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+Training for 172,500 steps at batch size 128 (about 22M examples, 1 epoch) took roughly 10 hours on a single RTX 4090 and used 20 GB of VRAM.
+It reached an MLM loss of 0.2567.
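The reported figures are consistent with each other, as the short calculation below shows. The throughput values are derived here from the approximate 10-hour duration, so they are rough estimates and not measurements from the card.

```python
steps = 172_500
batch_size = 128
hours = 10  # approximate wall-clock time reported in the card

examples = steps * batch_size  # examples seen in one epoch
seconds = hours * 3600
steps_per_second = steps / seconds
examples_per_second = examples / seconds

print(f"{examples:,} examples")                # 22,080,000 examples
print(f"{steps_per_second:.2f} steps/s")       # 4.79 steps/s
print(f"{examples_per_second:.0f} examples/s")  # 613 examples/s
```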
+# Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** 1x RTX 4090
+- **Hours used:** 10
+- **Cloud Provider:** local
+- **Compute Region:** local
+- **Carbon Emitted:** 1.5 kg CO2eq
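Estimates of this kind typically multiply energy use by the carbon intensity of the electricity grid. The sketch below shows that arithmetic under stated assumptions: the 450 W figure is the RTX 4090's rated board power, not a measured draw, and the function names are hypothetical. The card does not state which power or grid intensity values were used, so the code only backs out the intensity that would reproduce the reported figure.

```python
def carbon_kg(power_watts, hours, intensity_kg_per_kwh):
    """Emissions in kg CO2eq = energy (kWh) * grid carbon intensity (kg/kWh)."""
    energy_kwh = power_watts / 1000 * hours
    return energy_kwh * intensity_kg_per_kwh


def implied_intensity(reported_kg, power_watts, hours):
    """Grid intensity (kg CO2eq per kWh) that would yield `reported_kg`."""
    return reported_kg / (power_watts / 1000 * hours)


ASSUMED_POWER_W = 450  # assumption: rated board power, actual draw was likely different
HOURS = 10             # from the card
REPORTED_KG = 1.5      # from the card

intensity = implied_intensity(REPORTED_KG, ASSUMED_POWER_W, HOURS)
print(f"implied intensity: {intensity:.2f} kg CO2eq/kWh")
```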
+# Technical Specifications
+## Model Architecture and Objective
+DistilBERT architecture, trained with a Masked Language Modeling objective.