manycore-research/SpatialLM-Llama-1B · Why tagged as 'Text Generation'?

Hello, cool project! I just wondered why this model is tagged as 'Text Generation', because the description says its purpose is solely 3D/vision-based use cases.

If it doesn't output any plain text, I would recommend using a tag from the 'Computer Vision' category, perhaps 'Mask Generation', since the video demo appears to show some form of video segmentation and masking.

'Text Generation' implies that the model can output plain text, which the model card does not list as a feature.

Anyway, great work! I just wanted to understand the reasoning. I am new to this area of research, especially vision/multimodal models, so I may have misunderstood.

Many thanks,
James Clarke