Demo for multimodal understanding and generation
Evaluate open-ended outputs from AI models using MM-Vet