# Quasar-V4-Tiny (Post-trained)
**Model ID:** silx-ai/Quasar-V4-Tiny-Post
**Architecture:** Linear Attention with Kernel Feature Maps
**Developed by:** SILX AI
**Powered by:** gputrader.io
## Description
This is the post-trained version of Quasar-V4-Tiny, an experimental model built on Linear Attention with Kernel Feature Maps. The architecture is under active development as an exploration of efficient attention mechanisms that could serve as an alternative to standard softmax-attention transformers.
This checkpoint was post-trained on the SmolTalk dataset with a very small batch size and only a few optimization steps.
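To make the architecture claim concrete, here is a minimal sketch of kernelized linear attention in general. This is an illustration of the technique, not the actual Quasar-V4 implementation: the card does not specify which feature map the model uses, so the sketch assumes the common ELU + 1 feature map.

```python
import numpy as np

def elu_feature_map(x):
    # ELU(x) + 1 keeps features strictly positive, a common kernel choice
    return np.where(x > 0, x + 1.0, np.exp(x) + 1.0)

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention.

    q, k: (n, d) queries and keys; v: (n, d_v) values.
    Cost is O(n * d * d_v) instead of the O(n^2 * d) of softmax attention,
    because (phi(K)^T V) is computed once and reused for every query.
    """
    qf = elu_feature_map(q)               # (n, d)
    kf = elu_feature_map(k)               # (n, d)
    kv = kf.T @ v                         # (d, d_v): shared key-value summary
    z = qf @ kf.sum(axis=0) + eps         # (n,): per-query normalizer
    return (qf @ kv) / z[:, None]         # (n, d_v)
```

Because the feature map is positive, the result is identical (up to the `eps` stabilizer) to forming the full `phi(Q) phi(K)^T` attention matrix, row-normalizing it, and applying it to `V`, while never materializing the quadratic matrix.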
The purpose of this checkpoint is **not** to generate high-quality or accurate outputs. It is intended only to validate that the Quasar-V4 architecture works end-to-end (pretraining → finetuning → inference).
## Training Details
- Base training tokens: ~1–2 billion tokens
- Post-training dataset: SmolTalk
- Batch size: Very small (experimental)
- Steps: Minimal, only for architecture testing
## Limitations
- Not suitable for production or research use.
- Outputs are likely to be low-quality or inconsistent.
- This checkpoint is primarily for internal debugging and architecture validation.
## Acknowledgements
Special thanks to gputrader.io for providing the compute resources that made this experiment possible.
## Future Work
We plan to scale up the architecture, pretrain on larger datasets, and benchmark the model for meaningful downstream tasks once the design is validated.
Stay tuned.
## Model Tree
- Base model: silx-ai/QuasarV4-Tiny