# EgoNormia-Cosmos-Reason2-2B-v4-fullcot
Multi-task SFT fine-tune of nvidia/Cosmos-Reason2-2B on the EgoNormia social norm benchmark. This v4 run trains on action selection, justification selection, and sensibility identification, with full-length Gemini-distilled CoT traces added to the MCQ supervision.
## Training
| Parameter | Value |
|---|---|
| Base model | nvidia/Cosmos-Reason2-2B (Qwen3-VL-2B) |
| Tasks | Action + Justification + Sensibility (multi-task) |
| Train samples | 4959 (1651/1653 per task, 3 tasks total) |
| Training file | data/egonormia_llava_cot_train.json |
| CoT style | Full CoT, Gemini-distilled, text-description grounded |
| CoT length | median ~64 words (range 32-97) |
| Epochs | 3 |
| Global batch | 64 (8 replicas x 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Context length | 8192 |
| Video input | video_prev.mp4, 8 frames |
| Hardware | 8x A100-SXM4-80GB |
| Run dir | outputs/egonormia_sft/20260228065438/ |
| Best checkpoint | step_145 / 231 total steps |
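The reported step count follows from the batch configuration above. A quick sanity check, assuming the final partial batch of each epoch is dropped (standard `drop_last` behavior):

```python
train_samples = 4959
global_batch = 8 * 8   # 8 replicas x 8 per replica
epochs = 3

steps_per_epoch = train_samples // global_batch  # 77 (last partial batch dropped)
total_steps = steps_per_epoch * epochs
print(total_steps)  # 231, matching the table
```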
## Evaluation (200 verified test samples)
| Model | Action | Justification | Both | S-IoU |
|---|---|---|---|---|
| Zero-shot | 58.5% | 81.5% | 51.0% | 0.516 |
| v3 best (step_175) | 78.0% | 97.0% | 77.0% | 0.664 |
| v4 step_145 | 81.0% | 95.5% | 78.0% | 0.574 |
## Robustness (option shuffle)
| Checkpoint | Action | S-IoU | Both | Delta Action | Delta S-IoU |
|---|---|---|---|---|---|
| original step_145 | 81.0% | 0.574 | 78.0% | - | - |
| shuffled options | 63.0% | 0.477 | 60.0% | -18.0pt | -0.097 |
Paired sign test on action correctness:
- worse = 43
- better = 7
- tied = 150
- p (two-sided) = 2.1e-07
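The reported p-value can be reproduced with an exact two-sided sign test over the 50 non-tied pairs (ties are excluded, as is standard for the sign test). A minimal sketch in pure Python:

```python
from math import comb

# Non-tied pairs from the paired comparison: 43 worse, 7 better.
worse, better = 43, 7
n = worse + better        # 50 informative pairs; 150 ties dropped
k = min(worse, better)    # count of the rarer outcome

# Exact binomial tail under H0: P(worse) = P(better) = 0.5.
p_one_sided = sum(comb(n, i) for i in range(k + 1)) / 2**n
p_two_sided = 2 * p_one_sided
print(f"{p_two_sided:.1e}")  # 2.1e-07, matching the reported value
```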
## Notes
- v4 improves action accuracy and joint accuracy over v3, but S-IoU drops substantially (0.664 to 0.574) and the model fails the option-shuffle robustness check.
- The CoT traces are distilled from textual descriptions rather than directly grounded in the video, which likely contributes to shortcut learning.
- This checkpoint is useful as the "full CoT" ablation, but it is not the preferred deployment variant.
## Usage
```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

# Load the fine-tuned checkpoint (Qwen3-VL-2B architecture).
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v4-fullcot",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("robertzty/EgoNormia-Cosmos-Reason2-2B-v4-fullcot")
```