⚠️ IMPORTANT WARNING: Model Effectiveness

These bootstrap models are part of a degeneracy chain. The ICI-DC bootstrap S1 was trained on synthetic data generated by a model (S2-coeff1.5) that had itself underfit during training, and the bootstrap S2 was then fine-tuned on that degraded S1. Each generation of fine-tuning further degraded the base model's innate ability to discriminate mutations: base Omni-DNA-20M achieves 0.951 AUC on raw DNA, whereas these models achieve ~0.30 AUC.

The SAD coefficient reported for the bootstrap S2 (~12.03 LR-adjusted) is a mathematical artifact of the training configuration, not an indicator of genuine training convergence.

These models are preserved for historical and reproducibility purposes only.


Omni-DNA SAD Bootstrap Checkpoint

Omni-DNA-20M fine-tuned via the ICI-DC → SAD pipeline using bootstrap synthetic data.

Training Details

  • Base model: Nhoodie/omni-dna-ici-dc-bootstrap (ICI-DC pre-trained on bootstrap synthetic data)
  • Training data: 3,317 real mutation pairs (SAD attenuation)
  • Best checkpoint: Epoch 3.37, eval loss 0.308
  • Hyperparameters: LR=1e-5, epochs=5, batch_size=16, grad_accum=2, cosine schedule
  • LR-adjusted SAD coefficient: 12.03
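
The hyperparameters above imply a rough optimizer-step budget, sketched below. This is an illustration under stated assumptions, not the original training script: it assumes all 3,317 pairs were used for training on a single device, that one optimizer step consumes batch_size × grad_accum examples, and that the cosine schedule decays to zero with no warmup. The helper name `cosine_lr` is hypothetical.

```python
import math

# Values taken from the Training Details list above.
NUM_PAIRS = 3317
BATCH_SIZE = 16
GRAD_ACCUM = 2
EPOCHS = 5
PEAK_LR = 1e-5
BEST_EPOCH = 3.37

# Assumption: single device, and a partial final batch still counts as a step.
effective_batch = BATCH_SIZE * GRAD_ACCUM                  # 32 examples per update
steps_per_epoch = math.ceil(NUM_PAIRS / effective_batch)   # 104 updates per epoch
total_steps = steps_per_epoch * EPOCHS                     # 520 updates in total


def cosine_lr(step, total, peak=PEAK_LR):
    """Cosine decay from `peak` to 0 (assumes no warmup and a floor of 0)."""
    progress = min(max(step / total, 0.0), 1.0)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


# Approximate optimizer step of the best checkpoint (epoch 3.37).
best_step = round(BEST_EPOCH * steps_per_epoch)

print(effective_batch, steps_per_epoch, total_steps, best_step)
print(f"LR at best checkpoint under these assumptions: {cosine_lr(best_step, total_steps):.2e}")
```

Under these assumptions the best checkpoint falls about two thirds of the way through the schedule, where the learning rate has already decayed to a fraction of its 1e-5 peak. If the actual run used warmup or a non-zero minimum learning rate, the numbers would differ.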

Benchmark (826 test pairs)

Axis                 Metric       Score
Mutation Detection   F1           0.660
Embedding Distance   Seq AUC      0.358
Masked Prediction    Surprise Δ   −1.79
Discriminative       AUC          0.304
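
When reading the AUC rows, it helps to recall that 0.5 corresponds to chance and that values below 0.5 mean the scores rank the two classes in the inverted order. The sketch below computes AUC as a pairwise ranking probability and shows that negating the scores yields 1 − AUC. The labels and scores are made-up toy values, not benchmark data, and how the benchmark itself computes AUC is not documented here, so treat this as an aid to interpretation only.

```python
def roc_auc(labels, scores):
    """AUC as P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up values (NOT from the benchmark): positives tend to score lower.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.2, 0.4, 0.1, 0.5, 0.3, 0.9]

auc = roc_auc(labels, scores)
flipped = roc_auc(labels, [-s for s in scores])

print(f"AUC = {auc:.3f}, AUC with negated scores = {flipped:.3f}")
```

Applied to the reported Discriminative AUC of 0.304, the same identity gives 1 − 0.304 = 0.696 for the inverted ranking, which is still well below the 0.951 reported for base Omni-DNA-20M.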
