Speculative decoding?
#6 opened by charlesvanhouten
You mention speculative decoding as a way to get a large inference speedup, but doesn't it require a smaller-parameter-count version of the same model to act as the draft model? Is a speculative decoding model available for public download? I didn't see one when I looked. I tried the base Qwen2 speculative decoding model, with no luck.
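For context, here is my understanding of why a second, smaller model is needed, written as a minimal toy sketch of greedy speculative decoding. Everything in it is an assumption for illustration: `target_model` and `draft_model` are hypothetical stand-ins built from simple arithmetic rules, not real LLMs, and the verification loop runs sequentially where a real implementation would score all drafted positions in one batched forward pass of the large model.

```python
from typing import Callable, List, Tuple

# A "model" here maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical "large" model: a toy deterministic rule, not a real LLM.
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except when the
    # prefix length is a multiple of 4, where it guesses wrong on purpose.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Plain autoregressive decoding: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0  # counts verification rounds of the large model
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks the proposal (one batched pass in practice).
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every drafted token matched: the target adds one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    out, passes = speculative_decode(target_model, draft_model, prompt, 20)
    print(out == greedy_decode(target_model, prompt, 20), passes)
```

In this greedy variant the output is identical to what the large model would produce on its own; the draft only reduces how many verification rounds the large model needs. That is why the draft has to share the target's vocabulary and agree with it often, which is the reason I am asking whether a matching small model exists.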