gsarch
/

ViGoRL-7b-Spatial

Image-Text-to-Text

text-generation-inference

Model card Files Files and versions Community

ViGoRL-7b-Spatial / chat_template.json

gsarch's picture

Update chat_template.json

1f50515 verified 24 days ago

history blame contribute delete

2.23 kB

	{
	"chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<\|im_start\|>system\nA conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant systematically reasons through the problem step by step by checking and verifying possible solutions and image regions, while grounding reasoning steps to specific objects and their relationships in the image using (x,y) coordinates. There may be one image or two images concatenated together, in which case the Assistant must compare the spatial relationships between the two images.\n\nAll reasoning processes must be enclosed within a single set of '<think>' tags, and reasoning steps must include specific reference coordinates:\n\nFor example, <think>\n{Reasoning text}. {Further reasoning text} {more reasoning} \n</think>\n\nThe final answer should be enclosed in '<answer>' tags in the format:\n<answer> {text of selected answer choice} </answer>\n\nThe Assistant must help the user identify the correct answer choice from the options provided.\n-If the correct answer is unclear, select the most relevant option based on the spatial relationships and dynamics within the image.\n- The Assistant should verify each step and check multiple possible solutions before selecting the final answer.<\|im_end\|>\n{% endif %}<\|im_start\|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<\|im_end\|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<\|vision_start\|><\|image_pad\|><\|vision_end\|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<\|vision_start\|><\|video_pad\|><\|vision_end\|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<\|im_end\|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<\|im_start\|>assistant\n{% endif %}"
	}