How did you do it?
2
#4 opened 6 months ago by ehartford
Compare to Qwen3-8B and Qwen3-14B
7
#3 opened 6 months ago by decem
Could the same distillation technology be used to create a draft model for DeepSeek R1 0528?
2
#2 opened 6 months ago by BernardH
Multilingual?
#1 opened 6 months ago by AaronFeng753