EgoNormia-Cosmos-Reason2-2B-v3

Multi-task SFT fine-tune of nvidia/Cosmos-Reason2-2B on the EgoNormia social norm benchmark. Trained jointly on action selection, justification selection, and sensibility identification.

Training

| Parameter | Value |
|---|---|
| Base model | nvidia/Cosmos-Reason2-2B (Qwen3-VL-2B) |
| Tasks | Action + Justification + Sensibility (multi-task) |
| Train samples | 4,959 (1,653 per task × 3) |
| Epochs | 3 |
| Global batch | 64 (8 replicas × 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Video input | video_prev.mp4, 8 frames |
| Hardware | 8× A100-SXM4-80GB |
| Best checkpoint | step_150 (of 231 total steps) |
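The 8-frame video input presumably uses uniform temporal sampling over the clip; the helper below is an illustrative sketch of index selection only (the actual sampling policy and decoding pipeline are assumptions, not taken from the training code):

```python
def sample_frame_indices(num_frames: int, num_samples: int = 8) -> list[int]:
    """Uniformly spaced frame indices covering the whole clip.

    Mirrors the 8-frame video input described above; the centered-uniform
    policy here is an assumption, not the verified training behavior.
    """
    if num_frames <= num_samples:
        return list(range(num_frames))
    step = num_frames / num_samples
    # Pick the midpoint of each of the num_samples equal temporal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]
```

For an 80-frame clip this yields one frame from the middle of each tenth of the video.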

Evaluation (200 verified test samples)

| Model | Action | Justification | Both | S-IoU | Parse (A/J/S) |
|---|---|---|---|---|---|
| Zero-shot | 58.5% | 81.5% | 51.0% | 0.516 | – / – / – |
| v2 best | 82.0% | 84.0% | 71.5% | 0.000* | 100% / 100% / 0% |
| v3 step_150 | 79.5% | 96.5% | 77.0% | 0.630 | 100% / 100% / 100% |

*v2's S-IoU is 0 because the model's sensibility output format collapses, so no sensibility answer can be parsed or scored.
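S-IoU here is presumably the set intersection-over-union between the predicted and ground-truth sets of sensible options; a minimal sketch under that assumption:

```python
def sensibility_iou(pred: set[str], gold: set[str]) -> float:
    """IoU between predicted and gold sensible-option sets (assumed metric).

    An empty or unparseable prediction scores 0, which is consistent with
    v2's S-IoU of 0 when its sensibility output format collapses.
    """
    if not pred or not gold:
        return 0.0
    return len(pred & gold) / len(pred | gold)
```

The benchmark score would then be this value averaged over test samples.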

Notes

  • v3 is the first version that fully solves the output-format collapse from v2.
  • Relative to v2, it trades a small amount of peak action accuracy for large gains in justification quality, sensibility parsing, and overall benchmark completeness.
  • Later versions mainly explore how to improve v3's action / robustness tradeoff without breaking formatting.

Usage

```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

# Load the fine-tuned checkpoint in bf16 (matching the training precision)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v3",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("robertzty/EgoNormia-Cosmos-Reason2-2B-v3")
```
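The Parse (A/J/S) rates in the evaluation depend on extracting three answers from each raw generation. A minimal extraction sketch, assuming a labeled output format like `Action: B ... Justification: C ... Sensible: A, D` (the field labels are an illustrative assumption, not the verified training template):

```python
import re

def parse_response(text: str):
    """Extract the action letter, justification letter, and sensibility set
    from a raw model generation.

    The field labels below are an assumed output format, not the exact
    template the model was trained on.
    """
    action = re.search(r"Action:\s*([A-E])", text)
    just = re.search(r"Justification:\s*([A-E])", text)
    sens = re.search(r"Sensible:\s*([A-E](?:\s*,\s*[A-E])*)", text)
    return (
        action.group(1) if action else None,
        just.group(1) if just else None,
        set(re.split(r"\s*,\s*", sens.group(1))) if sens else None,
    )
```

A `None` in any slot counts as a parse failure for that sub-task, which is what the v2 row's 0% sensibility parse rate reflects.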