jrahn committed on
Commit
f30df6a
·
1 Parent(s): 416e854

Create README.md

Files changed (1)
  1. README.md +114 -0
README.md ADDED
@@ -0,0 +1,114 @@
+ ---
+ license: mit
+ datasets:
+ - jrahn/yolochess_lichess-elite_2211
+ library_name: transformers
+ tags:
+ - chess
+ ---
+ # Model Card for yolochess_mlm_azure-cloud-35
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ This 66M-parameter model is pre-trained from scratch with masked language modeling (MLM) on chess positions in [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format.
+ It is intended for downstream fine-tuning, e.g. text classification of human moves.
+
+ # Model Details
+
+ ## Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** Jonathan Rahn
+ - **Model type:** DistilBERT
+ - **Language(s) (NLP):** Chess [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation)
+ - **License:** MIT
+
+ # Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ## Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ The model is pre-trained from scratch with masked language modeling on chess positions in FEN format, so used directly it predicts masked tokens of a FEN string.
+
+ ## Downstream Use
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ It is intended for downstream fine-tuning, e.g. text classification of human moves.
+
+ ## Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ Any input other than chess positions in standard [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) format.
+
+ # Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ n/a
+
+ ## Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ n/a
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+ # Load the tokenizer and the masked-language-model checkpoint from the Hub.
+ tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
+ model = AutoModelForMaskedLM.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
+ ```
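As a sketch of direct use, the snippet below masks one character of a FEN string and asks the model to fill it in. The helper names `mask_fen_char` and `predict_masked` and the default `top_k=3` are illustrative and not part of the original card. The mask token is read from the tokenizer rather than assumed, and whether one mask lines up with exactly one board character depends on the tokenizer's vocabulary, which this card does not describe.

```python
from typing import List


def mask_fen_char(fen: str, index: int, mask_token: str) -> str:
    """Return `fen` with the character at `index` replaced by `mask_token`."""
    if not 0 <= index < len(fen):
        raise IndexError("index outside the FEN string")
    return fen[:index] + mask_token + fen[index + 1:]


def predict_masked(fen: str, index: int, top_k: int = 3) -> List[dict]:
    """Mask one character and return the model's top_k fill-in candidates.

    Downloads the checkpoint from the Hugging Face Hub on first use.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    fill = pipeline("fill-mask", model="jrahn/yolochess_mlm_azure-cloud-35")
    masked = mask_fen_char(fen, index, fill.tokenizer.mask_token)
    return fill(masked, top_k=top_k)
```

Calling `predict_masked("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", index=0)` should return a list of candidate dictionaries with `token_str` and `score` entries, as the `fill-mask` pipeline does for any masked language model.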
+
+ # Training Details
+
+ ## Training Data
+
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [Lichess-Elite 22-11 Dataset](https://huggingface.co/datasets/jrahn/yolochess_lichess-elite_2211)
+
+ ## Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ Masked language modeling objective with a 15% masked-token ratio.
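To illustrate the objective, here is a simplified, dependency-free sketch of masking 15% of a token sequence. It is not the training code: the card does not say which collator was used, and BERT-style collators (such as `DataCollatorForLanguageModeling` in `transformers`) sample positions independently and also replace some selected tokens with random tokens or leave them unchanged. The `[MASK]` string and the function name `mask_tokens` are assumptions for illustration.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"  # assumed mask string, for illustration only


def mask_tokens(
    tokens: List[str], ratio: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask round(ratio * len(tokens)) positions (at least one if non-empty).

    Returns (inputs, labels): labels hold the original token at masked
    positions and None elsewhere, so a loss would only cover masked positions.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n_mask = max(1, round(ratio * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK_TOKEN if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return inputs, labels
```

The model is then trained to recover the original token at each masked position from the surrounding, unmasked context.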
+
+ ### Preprocessing
+
+ Tokenize `data["train"]["fen"]`, padding every sequence to a fixed maximum length of 200 tokens.
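The preprocessing step could look like the sketch below when mapped over a `datasets` dataset. The function name `tokenize_fen` and the `truncation=True` flag are assumptions (the card only states max-length padding to 200 tokens); `padding="max_length"` and `max_length` are standard tokenizer call arguments in `transformers`.

```python
from typing import Any, Callable, Dict, List

MAX_LENGTH = 200  # padded sequence length stated in the card


def tokenize_fen(batch: Dict[str, List[str]], tokenizer: Callable[..., Any]) -> Any:
    """Tokenize a batch of FEN strings, padding each one to MAX_LENGTH tokens."""
    return tokenizer(
        batch["fen"],
        padding="max_length",  # pad every sequence up to max_length
        max_length=MAX_LENGTH,
        truncation=True,  # assumption: not mentioned in the card
    )
```

With a loaded tokenizer and dataset, this would typically be applied as `data["train"].map(tokenize_fen, batched=True, fn_kwargs={"tokenizer": tokenizer})`.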
+
+ ### Speeds, Sizes, Times
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ Training for 172,500 steps at batch size 128 (about 22M examples, 1 epoch) took roughly 10 hours on 1x RTX 4090, using 20 GB of VRAM.
+ It reached an MLM loss of 0.2567.
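The figures above are consistent with each other, as a quick calculation shows. The throughput numbers are derived here from the stated step count, batch size, and approximate duration; they are not reported in the card.

```python
steps = 172_500
batch_size = 128
hours = 10  # approximate wall-clock time from the card

examples = steps * batch_size  # 22,080,000, i.e. about 22M
steps_per_second = steps / (hours * 3600)  # about 4.8
examples_per_second = examples / (hours * 3600)  # about 613

print(f"{examples:,} examples")
print(f"{steps_per_second:.2f} steps/s, {examples_per_second:.0f} examples/s")
```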
+
+ # Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** 1x RTX 4090
+ - **Hours used:** 10
+ - **Cloud Provider:** local
+ - **Compute Region:** local
+ - **Carbon Emitted:** 1.5 kg CO2eq
+
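For orientation, an estimate of this kind boils down to energy used times the carbon intensity of the local grid. The sketch below back-solves the intensity implied by the reported figures, assuming the GPU drew its rated 450 W board power for the whole run. The card states neither the power draw nor the grid intensity, so both are assumptions.

```python
GPU_POWER_KW = 0.45  # assumption: RTX 4090 at its rated 450 W for the whole run
hours = 10  # from the card
reported_kg = 1.5  # from the card

energy_kwh = GPU_POWER_KW * hours  # 4.5 kWh
implied_intensity = reported_kg / energy_kwh  # kg CO2eq per kWh

print(f"{energy_kwh:.1f} kWh, {implied_intensity:.2f} kg CO2eq/kWh")
```

A lower average power draw would imply a correspondingly higher grid intensity for the same reported total.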
+ # Technical Specifications
+
+ ## Model Architecture and Objective
+
+ DistilBERT, masked language modeling