zhixuan-lin/fox-pro-760m-longcrawl64-48b
Text Generation · 0.8B params · 79 likes
Checkpoints for the main experiments in "Forgetting Transformer: Softmax Attention with a Forget Gate" (https://arxiv.org/abs/2503.02130).