RichardForests's Collections

Transformers & MoE
updated

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987 • Published • 41

Interfacing Foundation Models' Embeddings
Paper • 2312.07532 • Published • 15

Point Transformer V3: Simpler, Faster, Stronger
Paper • 2312.10035 • Published • 21

TheBloke/quantum-v0.01-GPTQ
Text Generation • 1B • Updated • 18 • 2

TheBloke/PiVoT-MoE-GPTQ
Text Generation • 5B • Updated • 1

mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ
Text Generation • Updated • 30 • 38

Denoising Vision Transformers
Paper • 2401.02957 • Published • 31

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 56

Buffer Overflow in Mixture of Experts
Paper • 2402.05526 • Published • 8

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Paper • 2405.08707 • Published • 34