More detailed readme will come later, or maybe not.
This version and the next build on the original v0.5 dataset (Castollux-Long), which is unchanged apart from now being assigned to the <DETAILED_CAPTION> mode.
The changes try to add a shorter caption mode, using captions generated by gemini-2.5-flash on the same images, and an NSFW-capable caption mode, distilled from Minthy/ToriiGate-v0.4-7B using some of the same images plus new NSFW images that Gemini would refuse. ToriiGate was chosen over JoyCaption because JoyCaption tends to hallucinate text, along with a few other nitpicks.
Another hope for the dedicated NSFW captioning mode is that the model will learn NSFW concepts across ALL captioning modes while preserving the original v0.5 captioning style.
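The three caption modes can be selected at inference time by passing the matching task token as the prompt. The sketch below is a rough example rather than an official usage snippet: it follows the generic Florence-2 loading pattern from `transformers` with `trust_remote_code=True`, and it assumes this repository ships both model weights and processor files. The mode names (`short`, `long`, `nsfw`), the helper functions, and the generation settings are made up for illustration; the token-to-mode mapping is taken from the `dataset_config` in the training config.

```python
# Task token for each caption mode, mirroring dataset_config in the training config.
# The keys ("short", "long", "nsfw") are illustrative names, not part of the model.
MODE_TO_TASK = {
    "short": "<CAPTION>",               # Castollux-Short (gemini-2.5-flash captions)
    "long": "<DETAILED_CAPTION>",       # Castollux-Long (original v0.5 style)
    "nsfw": "<MORE_DETAILED_CAPTION>",  # ToriiGate-v0.4-7B distilled captions
}


def task_token_for(mode: str) -> str:
    """Return the Florence-2 task token for a caption mode."""
    try:
        return MODE_TO_TASK[mode]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_TO_TASK)}") from None


def caption_image(
    image_path: str,
    mode: str = "long",
    model_id: str = "PJMixers-Images/Florence-2-base-Castollux-v0.6",
    max_new_tokens: int = 512,  # assumption: adjust to taste
) -> str:
    """Caption one image. Heavy imports are kept local so the mapping above is usable alone."""
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=task_token_for(mode), images=image, return_tensors="pt"
    ).to(device, dtype)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
        do_sample=False,
        num_beams=3,
    )
    # For plain captioning tasks the decoded text is the caption itself.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

Calling `caption_image("example.jpg", mode="short")` would then request the shorter Gemini-style caption, while `mode="nsfw"` requests the ToriiGate-style one; `example.jpg` is a placeholder path.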
Florence-2ner Training Config:
model_name: microsoft/Florence-2-base
wandb_project_name: Florence-2-base
run_name: Florence-2-base-Castollux-v0.6-run40
epochs: 2
optimizer: CAME
learning_rate: 0.0000025
min_learning_rate: 0.00000025
lr_scheduler: REX
freeze_vision: false
freeze_language: false
freeze_other: false
train_batch_size: 4
eval_batch_size: 4
gradient_accumulation_steps: 32
gradient_checkpointing: true
clip_grad_norm: 0.5
weight_decay: 0.01
save_total_limit: 3
save_steps: 10
eval_steps: 10
warmup_steps: 20
eval_split: 256
seed: 42
filtering_processes: 128
attn_implementation: sdpa
dataset_config:
  <CAPTION>:
    - "/media/xzuyn/NVMe/Datasets/Images/Castollux-Short"
  <DETAILED_CAPTION>:
    - "/media/xzuyn/Toshiba1/Datasets/Images/Castollux-Long"
  <MORE_DETAILED_CAPTION>:
    - "/media/xzuyn/NVMe/Datasets/Images/ToriiGate-v0.4-7B-Captioned-Images"
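As a quick sanity check on the config above, the batch-related values can be combined into an effective batch size. This is a back-of-the-envelope sketch that assumes a single training device and that `save_steps`, `eval_steps`, and `warmup_steps` are counted in optimizer steps rather than micro-batches; neither assumption is stated in the config.

```python
# Values copied from the training config above.
train_batch_size = 4
gradient_accumulation_steps = 32
eval_steps = 10
save_steps = 10
warmup_steps = 20

# Assumption: a single device; multiply by the device count otherwise.
num_devices = 1

# Samples consumed per optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: the step-based settings count optimizer steps.
samples_between_evals = eval_steps * effective_batch_size
samples_between_saves = save_steps * effective_batch_size
warmup_samples = warmup_steps * effective_batch_size

print(f"effective batch size: {effective_batch_size}")      # 128
print(f"samples between evals: {samples_between_evals}")    # 1280
print(f"samples between saves: {samples_between_saves}")    # 1280
print(f"samples seen during warmup: {warmup_samples}")      # 2560
```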
Model tree for PJMixers-Images/Florence-2-base-Castollux-v0.6
Base model: microsoft/Florence-2-base