# Cofiber Detection
Object detection heads built on a cofiber decomposition of frozen EUPE-ViT-B features. The cofiber decomposition produces multi-scale representations with zero learned parameters, replacing the 11M-parameter FPN typically used in FCOS-style detectors. Heads range from a 105-parameter evolved circuit and 70K-parameter analytical constructions to multi-million-parameter trained networks, all evaluated on COCO val2017.
## The Cofiber Decomposition
Given spatial backbone features f : [768, H, W], the cofiber decomposition produces n scale bands via iterated subtraction of downsampled-then-upsampled content:
```python
import torch.nn.functional as F

residual = f.unsqueeze(0)                    # f: [768, H, W] backbone features
cofibers = []
for k in range(n - 1):
    omega_k = F.avg_pool2d(residual, 2)
    sigma_omega_k = F.interpolate(omega_k, size=residual.shape[-2:], mode="bilinear")
    cofibers.append(residual - sigma_omega_k)  # cofiber_k: detail lost by 2x pooling
    residual = omega_k
cofibers.append(residual)                    # cofiber_{n-1}: coarsest band
```
Each cofiber_k captures frequency content at a distinct scale with no cross-scale interference. The decomposition is a fixed, parameter-free operation (one pooling and one subtraction per scale), yet it provides the same multi-scale structure that an FPN synthesizes with 11M trained parameters.
The construction is machine-checked in Rocq/HoTT (CofiberDecomposition.v). The proof frames average pooling and bilinear upsampling as an adjoint pair whose counit gives a short exact sequence in a semi-additive category; the cofiber bands are the kernels of the projections, and the sum is exact by construction.
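Exactness is also easy to check numerically. A minimal NumPy sketch, with nearest-neighbor upsampling standing in for bilinear (the reconstruction is exact for any fixed upsampler, since the same map is reused when inverting):

```python
import numpy as np

def pool(x):  # 2x average pooling on [C, H, W]
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def up(x):  # nearest-neighbor 2x upsampling (stand-in for bilinear)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decompose(f, n):
    residual, bands = f, []
    for _ in range(n - 1):
        omega = pool(residual)
        bands.append(residual - up(omega))  # band = residual minus its coarse content
        residual = omega
    bands.append(residual)
    return bands

def reconstruct(bands):
    acc = bands[-1]
    for band in reversed(bands[:-1]):  # coarse-to-fine: upsample and add the band back
        acc = band + up(acc)
    return acc

f = np.random.randn(768, 40, 40)
bands = decompose(f, 4)
assert np.allclose(reconstruct(bands), f)  # zero cross-term residual
```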
## Best Results (COCO val2017)
| Variant | Params | mAP | mAP@0.50 | mAP@0.75 | Category |
|---|---|---|---|---|---|
| split_tower_5scale_160h_5std_4dw_ema_l14_16ep_768_cls_calib | 2,975,067 | 42.64 | 65.70 | 45.10 | trained |
| split_tower_5scale_160h_5std_4dw_ema_l14_16ep_768 (step 104k, pre-calib) | 2,975,067 | 42.49 | 65.57 | 44.89 | trained |
| split_tower_5scale_160h_5std_4dw_ema_l14_16ep (step 100k, 640px) | 2,975,067 | 41.15 | 63.99 | 43.83 | trained |
| split_tower_5scale_192h_5std_4dw_textaligned_640px | 4,164,699 | 25.95 | 39.13 | 28.92 | trained |
| split_tower_5scale_192h_5std_4dw | 4,068,954 | 24.6 | 37.1 | 27.0 | trained |
| split_tower_192h_5std_4dw | 4,016,441 | 20.7 | 28.5 | 22.8 | trained |
| split_tower_224h_3std_6dw | 3,849,657 | 20.3 | 28.1 | 22.3 | trained |
| conv_deep_p3_lateral | 4,269,785 | 19.9 | 28.4 | 22.0 | trained |
| conv_deep_p3 | 3,972,569 | 19.7 | 28.3 | 21.6 | trained |
| conv_deep_3.38M | 3,381,592 | 18.8 | 27.4 | 20.9 | trained |
| conv_deep_912k | 911,960 | 17.2 | 25.6 | 19.2 | trained |
| evolved_deep | 182,580 | 10.6 | 18.9 | 10.8 | trained |
| spatialreg_92k | 91,960 | 8.2 | 25.7 | 2.8 | trained |
| box32_92k | 91,640 | 5.9 | 21.4 | 1.3 | trained |
| box32 pruned R2 | ~62,000 nz | 5.9 | 20.4 | 1.5 | trained |
| dim20 | 22,076 | 3.9 | 14.8 | 0.9 | trained |
| analytical_70k | 69,976 | 1.6 | 6.0 | 0.4 | analytical |
| evolved K=100 person | 105 | 1.3 | 5.8 | 0.1 | circuit |
| Baseline FCOS (non-cofiber) | 16,138,074 | 41.0 | 64.8 | 43.2 | reference |
The current split-tower head reaches 42.64 mAP on COCO val2017 with 2,975,067 learnable parameters (42.71 under soft NMS), beating the 16.14M-parameter FCOS baseline (41.0 mAP) by +1.64 while using 18.4 percent of its head parameter budget. Small-object mAP is 22.3 (FCOS 19.4, +2.9). This is the head shipped in phanerozoic/argus.

The architecture consists of separate classification and regression convolutional towers, each built from five standard 3×3 convolutions followed by four depthwise residual blocks at a hidden dimension of 160 channels. Its input is a cofiber decomposition of the backbone patch features into four frequency-separated bands (corresponding to spatial strides of 16, 32, 64, and 128 pixels), plus an additional finer-resolution level (stride 8) synthesized from the stride-16 band by a single transposed convolution. Top-down lateral connections pass information from coarser bands into finer ones before the towers run. The five resulting prediction levels (strides 8, 16, 32, 64, and 128) match the scale coverage of the FCOS simple feature pyramid while obtaining their multi-scale structure from a zero-parameter decomposition rather than a learned feature pyramid network.
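A shape-level sketch of the five-level construction, assuming PyTorch; 64 channels stand in for the real 768 to keep it fast, the module name `p3_synth` is hypothetical, and the real head adds the lateral connections and towers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cofiber_bands(f, n=4):
    """Zero-parameter multi-scale bands from [N, C, H, W] features."""
    residual, bands = f, []
    for _ in range(n - 1):
        omega = F.avg_pool2d(residual, 2)
        up = F.interpolate(omega, size=residual.shape[-2:], mode="bilinear")
        bands.append(residual - up)
        residual = omega
    bands.append(residual)
    return bands

C = 64  # 768 in the real head; reduced here for speed
p3_synth = nn.ConvTranspose2d(C, C, kernel_size=2, stride=2)  # stride-8 from stride-16

f = torch.randn(1, C, 48, 48)               # 768px input -> 48x48 stride-16 grid
levels = cofiber_bands(f)                   # strides 16, 32, 64, 128
levels = [p3_synth(levels[0])] + levels     # strides 8, 16, 32, 64, 128
print([lvl.shape[-1] for lvl in levels])    # [96, 48, 24, 12, 6]
```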
The path from 24.6 mAP to 42.64 mAP was recipe and resolution, not architecture. Hidden width actually went down (192 → 160); the gains came from ATSS target assignment (replacing FCOS center sampling), horizontal-flip augmentation, a CLIP ViT-L/14 8-prompt-average text embedding in place of ViT-B/32 single-prompt, PC-initialized cls_project (first 80 columns from the SVD of the text embedding, remaining columns a random orthogonal complement), exponential moving average during training (decay 0.9998), a doubled 16-epoch schedule with late-training checkpoint selection, the training-resolution jump from 640-pixel to 768-pixel input (+1.34 mAP, mostly on small-object AP via the 48×48 backbone grid vs 40×40), and a 3-epoch partial fine-tune updating only the classification calibration layers (cls_project, cls_bias, logit_scale) at lr 1e-4 with towers and cofiber path frozen (+0.15 mAP). The final checkpoint is the end state of the partial fine-tune; the eval JSON sits next to the weights in the shipping directory.
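As an illustration of the PC initialization, a NumPy sketch; the shapes, the stand-in text-embedding matrix `T`, and the square init are assumptions, while the construction itself (SVD principal directions plus a random orthogonal complement) follows the description above:

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 512, 80                       # embedding dim, 80 COCO classes (assumed shapes)
T = rng.standard_normal((C, D))      # stand-in for the class text-embedding matrix

# First 80 columns: principal directions of the text embeddings.
_, _, Vt = np.linalg.svd(T, full_matrices=False)   # rows of Vt are orthonormal
pcs = Vt.T                                         # [D, C]

# Remaining columns: random orthogonal complement of the PC span.
rand = rng.standard_normal((D, D - C))
rand -= pcs @ (pcs.T @ rand)                       # project out the PC span
comp, _ = np.linalg.qr(rand)                       # orthonormalize the remainder

W = np.concatenate([pcs, comp], axis=1)            # [D, D] cls_project init
assert np.allclose(W.T @ W, np.eye(D), atol=1e-6)  # orthogonal by construction
```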
## Repository Structure
### `analytical/`

| Path | Description |
|---|---|
| `analytical_70k/` | Closed-form least-squares head. 70K params, 1.6 mAP, zero training |
| `analytical_h1/` | Sheaf cohomology (H^1) features. Experimental |
| `variants/` | Exotic feature experiments (quadratic, RFF, Fourier, fractal) with result JSONs |
| `scripts/` | `analytical_greedy_gpu.py`, `analytical_exotic_gpu.py`, `analytical_empbayes.py`, etc. |
### `trained/`

| Path | Params | mAP | Description |
|---|---|---|---|
| `split_tower_5scale/` | 4.07M | 24.6 | Five-scale split-tower head (P3-P7). Current best |
| `split_tower/` | 4.02M | 20.7 | Four-scale split-tower head (predecessor) |
| `conv_deep/` | 912K-4.27M | 17.2-19.9 | Depthwise residual stack variants (scaled, P3, lateral) |
| `evolved_deep/` | 182K | 10.6 | 10-layer MLP on 92 evolutionarily selected dims |
| `spatialreg_92k/` | 92K | 8.2 | 3×3 depthwise conv on regression output |
| `linear_70k/` | 70K | 5.2 | Trained linear classifier |
| `box32_92k/` | 92K | 5.9 | INT8 threshold logic circuit + pruned variants (46K-76K) |
| `box32_distilled/` | 92K | — | Self-distillation of box32 |
| `dim_sweeps/` | 9K-80K | 0.3-? | SVD-initialized fixed-dim heads (5, 10, 15, 20, 30, 80) |
| `sloe/` | — | 0.0 | Spectral Laplacian object emergence (failed experiment) |
| `person_specialist/` | 9K | — | Person-only detector |
| `waldo_specialist/` | 5K | — | Waldo-finding detector |
| `experimental_scaffolds/` | — | — | Untrained architectural scaffolds (5scale, adaptive, centernet, linear) |
### `circuit/`

| File | Description |
|---|---|
| `person_analytical.pth` | Person classifier at 93 parameters, 99.8% recall |
| `person_detector.sv`, `cofiber_detector.sv` | Verilog implementations |
| `rom/*.hex` | INT8 weight ROMs |
| `evolved_K100_person_eval.json` | Evolutionary search result, 105 params, 1.3 mAP |
| `tb_person.sv` | Testbench |
### `scripts/`

| Script | Target |
|---|---|
| `train_split_tower.py` | Split tower (best) |
| `train_conv_deep.py` | Conv deep family (912K-4.27M) |
| `train_evolved_deep.py` | Evolved deep on 92 dims |
| `eval_conv_deep_step.py` | Eval any conv_deep checkpoint |
| `eval_evolved_deep.py` | Eval evolved_deep checkpoint |
| `eval_coco_map.py` | Generic COCO mAP eval |
### `CofiberDecomposition.v`
Rocq/HoTT machine-checked proof that the cofiber decomposition is exact in a semi-additive category: every input decomposes uniquely as a sum of scale bands with zero cross-term residual.
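In symbols (our notation, not lifted from the Rocq source): write $\omega$ for average pooling and $\sigma$ for bilinear upsampling. In the idealized categorical model, the counit condition $\omega \circ \sigma = \mathrm{id}$ makes $p = \sigma \circ \omega$ an idempotent on each residual $R_k$, so every stage is the split short exact sequence

$$0 \longrightarrow C_k \longrightarrow R_k \xrightarrow{\;\omega\;} \Omega_k \longrightarrow 0, \qquad C_k = \ker \omega = \operatorname{im}(\mathrm{id} - p),$$

giving $R_k \cong C_k \oplus \Omega_k$. The band $C_k$ is exactly the `residual - sigma_omega_k` computed in the loop, and iterating the splitting yields the unique zero-residual sum decomposition the proof claims.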
## Scaling Curve
The relationship between head parameters and mAP is approximately logarithmic across four orders of magnitude, until the CLIP-text-aligned 160h recipe lands above FCOS:
```
105 params    →  1.3 mAP   (evolved circuit, person only)
70K params    →  1.6 mAP   (analytical closed-form)
92K params    →  8.2 mAP   (depthwise conv on regression)
182K params   → 10.6 mAP   (evolved dim selection + 10-layer MLP)
912K params   → 17.2 mAP   (depthwise conv stack)
3.97M params  → 19.7 mAP   (with stride-8 P3)
3.85M params  → 20.3 mAP   (split cls/reg towers, 3 std + 6 dw at 224 hidden, 4 scales)
4.02M params  → 20.7 mAP   (split cls/reg towers, 5 std + 4 dw at 192 hidden, 4 scales)
4.07M params  → 24.6 mAP   (same tower, 5 scales / P3-P7 coverage)
4.16M params  → 25.95 mAP  (+ CLIP ViT-B/32 text-aligned classifier, 8 ep, 640px)
2.98M params  → 41.15 mAP  (160h + ATSS + EMA + PC init + CLIP ViT-L/14, 16 ep, 640px)
2.98M params  → 42.49 mAP  (same recipe, 768px training input)
2.98M params  → 42.64 mAP  (+ 3-epoch cls_calib fine-tune, shipped in Argus)
16.14M params → 41.0 mAP   (FCOS baseline with FPN, 640px)
```
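The log-linear reading can be made concrete with a quick fit over the pre-recipe points; a sketch (the 160h recipe rows are excluded since they sit well above the trend):

```python
import numpy as np

# (params, mAP) pairs for the pre-recipe points in the scaling list.
params = np.array([105, 70e3, 92e3, 182e3, 912e3, 3.97e6, 4.07e6])
maps = np.array([1.3, 1.6, 8.2, 10.6, 17.2, 19.7, 24.6])

slope, intercept = np.polyfit(np.log10(params), maps, 1)
print(f"mAP ≈ {intercept:.1f} + {slope:.1f} · log10(params)")
```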
The bottom four rows are the point of the repository. The 25.95 → 42.64 ascent required no architectural capacity increase (hidden width dropped from 192 to 160) and no change to the cofiber decomposition itself. It came from the training recipe: ATSS assignment, flip augmentation, EMA, PC-initialized cls_project, CLIP ViT-L/14 multi-prompt text embeddings, a doubled schedule with late-training checkpoint selection, 640 → 768 resolution scaling, and a final 3-epoch classification-calibration fine-tune. At 18.4 percent of FCOS's head parameter budget, the 2.98M checkpoint beats FCOS by +1.64 mAP under hard NMS and +1.71 under soft NMS, with the small-object gap widening to +2.9.
## Broader Detection Work
Non-cofiber detection heads (FCOS baseline, untrained architectural variants, alternative formulations) are hosted in phanerozoic/detection-heads, which also includes the top-performing cofiber head (split_tower) for reference. This repository is the canonical host for cofiber-based detection research.
## License
Fair Research License. See LICENSE.