Yes! It has gone from 671B to 2.7T in slightly more than 2 weeks.
671B to 2.7T?!? I can barely run 235B models, and that's with Q3 or Q2 quantization!
Regardless, I suppose what matters is that it performs well and does the job.
But I wonder whether trimming the models, or optimizing them for a better size-versus-performance trade-off, shouldn't be a bigger push. Though I'm new to this scene, so I could just be ignorant of how this is all done.
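For a rough sense of the gap being discussed, here is a back-of-the-envelope sketch of weight memory alone (parameters × bits per weight ÷ 8). It assumes nominal 16-, 3- and 2-bit weights; real Q3/Q2 quant formats use somewhat more bits per weight, and KV cache, activations and runtime overhead are ignored, so actual requirements are higher.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1e9 bytes).

    Assumes a uniform, nominal bit width; ignores KV cache and overhead.
    """
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Parameter counts mentioned in the thread.
    models = [("235B", 235e9), ("671B", 671e9), ("2.7T", 2.7e12)]
    for name, params in models:
        for bits in (16, 3, 2):
            print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):,.0f} GB")
```

Under these assumptions a 235B model at 3 bits is already around 88 GB of weights, while a 2.7T model would still need roughly 675 GB even at 2 bits.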
"Ultra" is the keyword here. We are also going to release Pro and Mini models. And because of the DynaMoE architecture (publishing soon), you can run parts of the model on your own setup. The downside is that it isn't well supported by many apps yet, so everything will use our code until support improves.