nanoT5-base-65kBPE-v2
This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.
- SiLU/gated-SiLU activation
- 25% mask rate during pretraining (illustrated by the sketch below the list)
- 65k vocab size, adapted claude3 tokenizer
 
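The 25% mask rate refers to a T5-style span-corruption denoising objective. Below is a minimal, self-contained sketch of what that corruption looks like on a toy sentence; the sentinel names (`<extra_id_N>`), the span-length choice, and the sampling scheme are illustrative assumptions, not the exact pretraining recipe:

```python
import random

def span_corrupt(tokens, mask_rate=0.25, mean_span_len=3, seed=0):
    """Toy T5-style span corruption: drop ~mask_rate of the tokens in
    contiguous spans, replacing each span with a sentinel in the input;
    the target lists each sentinel followed by the tokens it replaced."""
    rng = random.Random(seed)
    budget = max(1, round(len(tokens) * mask_rate))
    masked = set()
    # grow random spans until roughly mask_rate of the tokens are covered
    while len(masked) < budget:
        start = rng.randrange(len(tokens))
        length = rng.randint(1, 2 * mean_span_len - 1)
        masked.update(range(start, min(start + length, len(tokens))))
    inputs, targets, sentinel = [], [], 0
    i = 0
    while i < len(tokens):
        if i in masked:
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            while i < len(tokens) and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return " ".join(inputs), " ".join(targets)

toks = "the quick brown fox jumps over the lazy dog by the river".split()
src, tgt = span_corrupt(toks)
print("input :", src)   # sentinels mark where spans were dropped
print("target:", tgt)   # target reconstructs the dropped spans
```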
training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer
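Since the checkpoint is meant to be fine-tuned, here is a minimal loading sketch with 🤗 Transformers; the repo id below is an assumption inferred from the card title, so adjust it to the actual hub path:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# hypothetical repo id based on the model card title; verify before use
model_name = "pszemraj/nanoT5-base-65kBPE-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# this is a raw pretrained checkpoint: expect useful generations only
# after fine-tuning on a downstream task (summarization, QA, etc.)
inputs = tokenizer("summarize: The quick brown fox ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```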
plots

Training curves for loss, gradients, and weights were plotted here; more details are under checkpoints/.