PolyCoder uses the GPT2 architecture, with a BPE tokenizer trained on a random 5% subset of the data (all languages) and a context length of 2048 tokens. To study the effect of scaling model size, the model was trained in 3 different sizes.
| Model | # parameters |
|-------|--------------|
| GPT2  | 160M         |
| GPT2  | 400M         |
| GPT2  | 2.7B         |
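
The tokenizer is described as being trained on a random 5% subset of the data. The exact sampling procedure is not stated here, so the following is only a minimal sketch of one way such a subset could be drawn, assuming uniform sampling over files with a fixed seed. The `sample_subset` helper and the file names are hypothetical and are not taken from the PolyCoder code.

```python
import random
from typing import List, Sequence


def sample_subset(paths: Sequence[str], fraction: float = 0.05, seed: int = 0) -> List[str]:
    """Return a uniformly random subset holding roughly `fraction` of `paths`.

    Assumption: files are sampled uniformly, without replacement.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not paths:
        return []
    rng = random.Random(seed)  # local RNG so the sample is reproducible
    k = max(1, round(len(paths) * fraction))  # keep at least one file
    return rng.sample(list(paths), k)


# Hypothetical corpus: 200 files in each of three languages (600 in total).
corpus = [f"repo/file_{i}.{ext}" for i in range(200) for ext in ("py", "c", "js")]
subset = sample_subset(corpus)
print(len(corpus), len(subset))
```

The sampled files could then be passed to a BPE tokenizer trainer. Sampling across the whole corpus rather than per language keeps the language mix of the subset close to that of the full data, which matches the "all languages" note above.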
PolyCoder is currently being integrated in 🤗 transformers. Meanwhile, it can be loaded following the instructions in the original GitHub repo.