zhixuan-lin/fox-pro-760m-longcrawl64-48b
Text Generation • 0.8B params
Checkpoints for the main experiments in "Forgetting Transformer: Softmax Attention with a Forget Gate" (https://arxiv.org/abs/2503.02130).