zhixuan-lin's Collections
Forgetting Transformer Paper Checkpoints

updated Mar 12

Checkpoints for the main experiments in "Forgetting Transformer: Softmax Attention with a Forget Gate" (https://arxiv.org/abs/2503.02130).

  • zhixuan-lin/fox-pro-760m-longcrawl64-48b

    Text Generation • 0.8B • Updated Aug 11 • 79

  • zhixuan-lin/transformer-pro-760m-longcrawl64-48b

    Text Generation • 0.8B • Updated Aug 11 • 16

  • zhixuan-lin/fox-llama-760m-longcrawl64-48b

    Text Generation • 0.8B • Updated Aug 11 • 3

  • zhixuan-lin/transformer-llama-760m-longcrawl64-48b

    Text Generation • 0.8B • Updated Aug 11 • 4

  • zhixuan-lin/delta_net-760m-longcrawl64-48b

    Text Generation • 0.8B • Updated Aug 11 • 4

  • zhixuan-lin/mamba2-760m-longcrawl64-48b

    Text Generation • 0.9B • Updated Aug 11 • 3

  • zhixuan-lin/hgrn2-760m-longcrawl64-48b

    Text Generation • 0.8B • Updated Aug 11 • 4

  • zhixuan-lin/longcrawl64-json-gpt2-tokenizer

    Updated Mar 11