Ertugrul/Qwen2-VL-7B-Captioner-Relaxed · Release of 2B version or dataset?

I've searched extensively and found this model is the best fit for my goal of captioning images on a low-power device (a Nanopi-m6 with a CM3588 chip). It works great and gives me accurate captions at 7 W. However, it's too large to run entirely on the NPU of the RK3588, so I have to split the vision module off onto the CPU, which means a single image takes ~45 seconds to caption (even using a queue system with separate threads). I'd love a 2B version of this model, which would give me a much faster captioner. Any chance you'll train a smaller version (2B would be ideal), or alternatively share the dataset and training code you used so I can train my own?

Thanks!