# EgoNormia-Cosmos-Reason2-2B-v3

A multi-task SFT fine-tune of nvidia/Cosmos-Reason2-2B on the EgoNormia social-norm benchmark, trained jointly on three subtasks: action selection, justification selection, and sensibility identification.
## Training
| Parameter | Value |
|---|---|
| Base model | nvidia/Cosmos-Reason2-2B (Qwen3-VL-2B) |
| Tasks | Action + Justification + Sensibility (multi-task) |
| Train samples | 4959 (1653 per task x 3) |
| Epochs | 3 |
| Global batch | 64 (8 replicas x 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Video input | video_prev.mp4, 8 frames |
| Hardware | 8x A100-SXM4-80GB |
| Best checkpoint | step_150 / 231 total steps |
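The "8 frames" video input above implies subsampling each `video_prev.mp4` clip down to a fixed frame count. As a minimal sketch (assuming simple uniform sampling; the actual pipeline's strategy is not documented in this card), the frame indices could be chosen like this:

```python
def sample_frame_indices(num_frames: int, num_samples: int = 8) -> list[int]:
    """Pick `num_samples` evenly spaced frame indices from a clip.

    Hypothetical helper for illustration: centers each sample inside
    its segment of the clip so coverage is uniform start to end.
    """
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


# e.g. a 240-frame clip yields 8 indices spread across the video
indices = sample_frame_indices(240)
```

The selected frames would then be decoded and passed to the processor as the video input.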
## Evaluation (200 verified test samples)
| Model | Action | Justification | Both | S-IoU | Parse(A/J/S) |
|---|---|---|---|---|---|
| Zero-shot | 58.5% | 81.5% | 51.0% | 0.516 | -/-/- |
| v2 best | 82.0% | 84.0% | 71.5% | 0.000* | 100/100/0% |
| v3 step_150 | 79.5% | 96.5% | 77.0% | 0.630 | 100/100/100% |
*v2 S-IoU = 0% because the model collapses on the sensibility output format.
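S-IoU here is read as the set intersection-over-union between the predicted and gold sets of "sensible" options (an assumption based on the metric name; the benchmark's exact definition may differ). A minimal sketch:

```python
def sensibility_iou(pred: set[str], gold: set[str]) -> float:
    """Intersection-over-union of predicted vs. gold sensible-option sets.

    Hypothetical implementation for illustration: two empty sets are
    treated as a perfect match (IoU = 1.0).
    """
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)


# e.g. predicting {A, B} against gold {B, C} overlaps on one of
# three distinct options, giving an IoU of 1/3
score = sensibility_iou({"A", "B"}, {"B", "C"})
```

Under this reading, v2's 0.000 means its sensibility outputs never parsed into a valid option set, so every prediction scored as an empty/invalid set against a non-empty gold set.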
## Notes
- v3 is the first version that fully solves the output-format collapse from v2.
- Relative to v2, it trades a small amount of peak action accuracy for large gains in justification quality, sensibility parsing, and overall benchmark completeness.
- Later versions mainly explore how to improve v3's action / robustness tradeoff without breaking formatting.
## Usage
```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v3",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("robertzty/EgoNormia-Cosmos-Reason2-2B-v3")
```