DUMMY WEIGHTS ONLY! THIS IS NOT A REAL MODEL!

Specifications (these will carry over to the fully released main model)

Architecture: arlow (not yet supported by transformers, but a beta version is available -> here)

Exact location: here

Will feature:

  • GQA (grouped-query attention)
  • SiLU activation
  • Flash Attention VarLen with a manual QKV projection
  • Cross attention (untrained; included so a vision encoder can be incorporated easily)
  • The model is decoder only, but the cross-attention weights are still present.
  • Custom RoPE
  • and more!
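
To make a few of the listed building blocks concrete, here is a minimal pure-Python sketch of SiLU, the query-to-KV head mapping used in GQA, and a standard (interleaved-pair) RoPE rotation. It is not the Arlow implementation: the function names, the head counts (8 query heads, 2 KV heads), and the base of 10000 are illustrative assumptions, and the "Custom RoPE" in this model may differ from the standard form shown here.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x). Not hardened for very large negative x."""
    return x / (1.0 + math.exp(-x))


def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the key/value head that a query head shares under grouped-query attention."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size


def rope_rotate(vec: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Apply standard RoPE to one head vector, rotating pairs (2k, 2k+1).

    This is the textbook formulation, NOT necessarily Arlow's custom variant.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("head dimension must be even")
    out: list[float] = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # frequency for this pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


# Hypothetical configuration: 8 query heads sharing 2 KV heads.
mapping = [kv_head_for_query_head(h, 8, 2) for h in range(8)]
print(mapping)
```

With the hypothetical 8/2 configuration, query heads 0 to 3 read from KV head 0 and heads 4 to 7 from KV head 1, which is where GQA saves KV-cache memory relative to full multi-head attention.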

The Arlow architecture isn't officially supported by transformers, but my implementation will be there if you want to try it.

Model size: 368M params (Safetensors)
Tensor type: BF16
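
As a rough guide to what these figures imply on disk and in memory, the sketch below estimates the raw weight footprint from the parameter count and tensor type. The 368M figure is the rounded value shown above, so the result is approximate; the helper name and the dtype table are illustrative assumptions, and file metadata overhead is ignored.

```python
# Bytes per parameter for common tensor types (BF16 uses 16 bits = 2 bytes).
BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP32": 4}


def weight_footprint_bytes(num_params: int, dtype: str = "BF16") -> int:
    """Estimate raw weight storage, ignoring file headers and metadata."""
    return num_params * BYTES_PER_PARAM[dtype]


approx_params = 368_000_000  # rounded parameter count from the model card
size = weight_footprint_bytes(approx_params)
print(f"{size / 1e6:.0f} MB")  # prints "736 MB"
```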
