Speculative decoding?

#6
by charlesvanhouten - opened

You mention speculative decoding as a way to get a large inference speedup, but doesn't this require a smaller-parameter-count version of the same model to act as the draft model? Is a speculative decoding draft model available for public download? I didn't see one when I looked. I tried the base Qwen2 speculative decoding model, with no luck.
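
For context, this is roughly how I understand the technique, as a toy Python sketch of the greedy case only. The names `toy_target`, `toy_draft`, and `speculative_decode` are made up for illustration and are not part of any library; a real setup would use two actual language models that share a tokenizer, and the target would check all drafted tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new: int) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding sketch: the draft proposes, the target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        base = list(tokens)
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(base))):
            proposal.append(draft(base + proposal))
        # 2. The target checks each proposed position against its own choice.
        for i, guess in enumerate(proposal):
            expected = target(base + proposal[:i])
            if guess != expected:
                # First mismatch: keep the target's token, discard the rest.
                tokens.append(expected)
                break
            tokens.append(guess)
    return tokens


def toy_target(prefix: List[int]) -> int:
    # Hypothetical "large" model: counts upward modulo 10.
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except after token 4.
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [0], max_new=8))
```

The output should match plain greedy decoding with the target alone, and the speedup depends on how often the draft's guesses are accepted. As far as I can tell, that is why a small draft model that uses the same vocabulary as the large one is needed.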
