FlameF0X/i3-200m
Note Work in progress.
Note SOTA model. Pre-trained in around 2 to 4 hours, compared with over 14 hours for the previous version. Changes: trained on over 3T tokens; other details are available in the model card.
Note Smol stable text generator that took over 14 hours to pre-train :) Changes: trained on over 1T tokens; LoRPt layers.
Note Our first usable i3 model (meaning we added Transformers support and supporting code for it).
Note The first i3-architecture LM.