Generate embeddings of images
Transform text into a 768-dimensional vector
Generate embeddings from images