# EgoNormia-Cosmos-Reason2-2B-v4-fullcot
Multi-task SFT fine-tune of nvidia/Cosmos-Reason2-2B on the EgoNormia social norm benchmark. This v4 run trains on action selection, justification selection, and sensibility identification, with full-length Gemini-distilled CoT traces added to the MCQ supervision.
## Training
| Parameter | Value |
|---|---|
| Base model | nvidia/Cosmos-Reason2-2B (Qwen3-VL-2B) |
| Tasks | Action + Justification + Sensibility (multi-task) |
| Train samples | 4959 (1651/1653 per task, 3 tasks total) |
| Training file | data/egonormia_llava_cot_train.json |
| CoT style | Full CoT, Gemini-distilled, text-description grounded |
| CoT length | median ~64 words (range 32-97) |
| Epochs | 3 |
| Global batch | 64 (8 replicas x 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Context length | 8192 |
| Video input | video_prev.mp4, 8 frames |
| Hardware | 8x A100-SXM4-80GB |
| Run dir | outputs/egonormia_sft/20260228065438/ |
| Best checkpoint | step_145 / 231 total steps |
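The reported step count follows from the batch configuration above. A quick sanity check, assuming the final partial batch of each epoch is dropped (standard `drop_last` behavior):

```python
train_samples = 4959
global_batch = 8 * 8   # 8 replicas x 8 per replica
epochs = 3

steps_per_epoch = train_samples // global_batch  # 77 (last partial batch dropped)
total_steps = steps_per_epoch * epochs
print(total_steps)  # 231, matching the table
```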
## Evaluation (200 verified test samples)
| Model | Action | Justification | Both | S-IoU |
|---|---|---|---|---|
| Zero-shot | 58.5% | 81.5% | 51.0% | 0.516 |
| v3 best (step_175) | 78.0% | 97.0% | 77.0% | 0.664 |
| v4 step_145 | 81.0% | 95.5% | 78.0% | 0.574 |
## Robustness (option shuffle)
| Checkpoint | Action | S-IoU | Both | Delta Action | Delta S-IoU |
|---|---|---|---|---|---|
| original step_145 | 81.0% | 0.574 | 78.0% | - | - |
| shuffled options | 63.0% | 0.477 | 60.0% | -18.0pt | -0.097 |
Paired sign test on action correctness:
- worse = 43
- better = 7
- tied = 150
- p (two-sided) = 2.1e-07
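The reported p-value can be reproduced with an exact two-sided sign test over the 50 non-tied pairs (ties are excluded, as is standard for the sign test). A minimal sketch in pure Python:

```python
from math import comb

# Non-tied pairs from the paired comparison: 43 worse, 7 better.
worse, better = 43, 7
n = worse + better        # 50 informative pairs; 150 ties dropped
k = min(worse, better)    # count of the rarer outcome

# Exact binomial tail under H0: P(worse) = P(better) = 0.5.
p_one_sided = sum(comb(n, i) for i in range(k + 1)) / 2**n
p_two_sided = 2 * p_one_sided
print(f"{p_two_sided:.1e}")  # 2.1e-07, matching the reported value
```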
## Notes
- v4 improves action accuracy and joint accuracy over v3, but S-IoU drops substantially (0.664 to 0.574) and the model fails the option-shuffle robustness check.
- The CoT traces are distilled from textual descriptions rather than directly grounded in the video, which likely contributes to shortcut learning.
- This checkpoint is useful as the "full CoT" ablation, but it is not the preferred deployment variant.
## Usage
```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

# Load the fine-tuned checkpoint (Qwen3-VL-2B architecture).
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v4-fullcot",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("robertzty/EgoNormia-Cosmos-Reason2-2B-v4-fullcot")
```