# Quasar-V4-Tiny (Post-trained)
**Model ID:** silx-ai/Quasar-V4-Tiny-Post
**Architecture:** Linear Attention with Kernel Feature Maps
**Developed by:** SILX AI
**Powered by:** gputrader.io
## Description
This is the post-trained version of Quasar-V4-Tiny, an experimental model built on Linear Attention with Kernel Feature Maps. The architecture is under active development as an exploration of efficient attention mechanisms that could serve as an alternative to standard softmax-attention transformers.
This checkpoint was post-trained on the SmolTalk dataset with a very small batch size and only a few optimization steps.
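To make the architecture claim concrete, here is a minimal sketch of kernelized linear attention in general. This is an illustration of the technique, not the actual Quasar-V4 implementation: the card does not specify which feature map the model uses, so the sketch assumes the common ELU + 1 feature map.

```python
import numpy as np

def elu_feature_map(x):
    # ELU(x) + 1 keeps features strictly positive, a common kernel choice
    return np.where(x > 0, x + 1.0, np.exp(x) + 1.0)

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention.

    q, k: (n, d) queries and keys; v: (n, d_v) values.
    Cost is O(n * d * d_v) instead of the O(n^2 * d) of softmax attention,
    because (phi(K)^T V) is computed once and reused for every query.
    """
    qf = elu_feature_map(q)               # (n, d)
    kf = elu_feature_map(k)               # (n, d)
    kv = kf.T @ v                         # (d, d_v): shared key-value summary
    z = qf @ kf.sum(axis=0) + eps         # (n,): per-query normalizer
    return (qf @ kv) / z[:, None]         # (n, d_v)
```

Because the feature map is positive, the result is identical (up to the `eps` stabilizer) to forming the full `phi(Q) phi(K)^T` attention matrix, row-normalizing it, and applying it to `V`, while never materializing the quadratic matrix.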
The purpose of this checkpoint is **not** to generate high-quality or accurate outputs. It is intended only to validate that the Quasar-V4 architecture works end-to-end (pretraining → finetuning → inference).
## Training Details
- Base training tokens: ~1–2 billion tokens
- Post-training dataset: SmolTalk
- Batch size: Very small (experimental)
- Steps: Minimal, only for architecture testing
## Limitations
- Not suitable for production or research use.
- Outputs are likely to be low-quality or inconsistent.
- This checkpoint is primarily for internal debugging and architecture validation.
## Acknowledgements
Special thanks to gputrader.io for providing the compute resources that made this experiment possible.
## Future Work
We plan to scale up the architecture, pretrain on larger datasets, and benchmark the model for meaningful downstream tasks once the design is validated.
Stay tuned.
## Model Tree
- Base model: silx-ai/QuasarV4-Tiny