+---
+license: mit
+datasets:
+- jet-universe/jetclass2
+tags:
+- particle physics
+- jet tagging
+---
+# Model Card: SophonAK4
+<!-- Provide a quick summary of what the model is/does. -->
+The **SophonAK4** model is a *realistic* small-radius (*R* = 0.4) anti-*k*<sub>T</sub> jet tagger developed for [fast-simulation (Delphes) datasets under the JetClass-II configuration](https://github.com/jet-universe/jetclass2_generation?tab=readme-ov-file#delphes-step), designed to emulate the CMS detector conditions at the LHC.
+Here, *realistic* indicates that the model achieves tagging performance comparable to state-of-the-art jet taggers used in the ATLAS and CMS experiments.
+The model is constructed to cover a broad range of final states, including partons and leptons of various flavors and charges.
+## Model Details
+**SophonAK4** is trained using a multi-class classification approach based on di-*X* resonance processes, where the resonance *X* decays into multiple *two-prong* final states. Truth labelling is performed by associating reconstructed anti-*k*<sub>T</sub> jets with partons or leptons originating from these two-prong decays.
+A total of 23 jet labels are defined:
+- **Single-prong labels**: \\(b\\), \\(\bar{b}\\), \\(c\\), \\(\bar{c}\\), \\(s\\), \\(\bar{s}\\), \\(d\\), \\(\bar{d}\\), \\(u\\), \\(\bar{u}\\), \\(g\\), \\(e^-\\), \\(e^+\\), \\(\mu^-\\), \\(\mu^+\\), \\(\tau_{\rm h}^-\\), and \\(\tau_{\rm h}^+\\). These correspond to cases where a single truth particle (either a parton or a lepton) is matched to the jet within Δ*R*(jet, particle) < 0.4, while the other particle from the same resonance decay is not matched to the jet.
+- **Two-prong labels**: \\(b\bar{b}\\), \\(c\bar{c}\\), \\(s\bar{s}\\), \\(d\bar{d}\\), \\(u\bar{u}\\), and \\(gg\\). These labels are assigned when both particles from the same resonance decay are matched within the same jet.
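The labelling rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the label strings, the `(eta, phi)` tuple layout, and the function names are hypothetical, and the `"unmatched"` category for jets with no matched decay product is an assumption that the card does not describe.

```python
import math

# Hypothetical label names; the card defines 17 single-prong and 6 two-prong classes.
SINGLE_PRONG = ["b", "bbar", "c", "cbar", "s", "sbar", "d", "dbar", "u", "ubar", "g",
                "e-", "e+", "mu-", "mu+", "tauh-", "tauh+"]
TWO_PRONG = ["bb", "cc", "ss", "dd", "uu", "gg"]
LABELS = SINGLE_PRONG + TWO_PRONG


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def prong_category(jet, daughters, radius=0.4):
    """Classify a jet by how many of the two decay products lie within `radius`.

    `jet` and each entry of `daughters` are (eta, phi) tuples (assumed layout).
    """
    n_matched = sum(delta_r(jet[0], jet[1], d[0], d[1]) < radius for d in daughters)
    # "unmatched" is an assumption; the card only defines the 1- and 2-match cases.
    return {0: "unmatched", 1: "single-prong", 2: "two-prong"}[n_matched]
```

For instance, a jet at `(0.0, 0.0)` with decay products at `(0.1, 0.1)` and `(2.0, 3.0)` falls in the single-prong category, since only the first product satisfies Δ*R* < 0.4.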
+## Uses
+### Integrating SophonAK4/Sophon Models
+The **SophonAK4** model, together with the [**Sophon**](https://huggingface.co/jet-universe/sophon) model, provides a realistic benchmark for jet tagging on fast-simulation (Delphes) datasets, achieving performance comparable to state-of-the-art taggers used in the ATLAS and CMS experiments.
+- For an example of integrating them in C++ workflows to analyze Delphes files, see [this example](https://github.com/jet-universe/sophon?tab=readme-ov-file#using-sophon-model-pythonc) (note: SophonAK4 support is planned from April 2025).
+- For an example of how to integrate these models into the Delphes processing workflow, refer to the following GitHub repository: https://github.com/jet-universe/delphes/tree/jet-models (note: expected to be available from May 2025).
+## Evaluation
+The performance of SophonAK4 is evaluated using Standard Model \\(t\bar{t}\\) events to enable direct comparison with performance benchmarks from ATLAS and CMS.
+Details are provided in [Appendix B of the paper](https://arxiv.org/html/2503.00118#:~:text=B.2,Performance%20of%20SophonAK4) and are summarized below.
+For *b*- and *c*-tagging, genuine *b*, *c*, and light-flavor jets are selected via jet-parton matching as implemented in Delphes. Jets are required to satisfy *p*<sub>T</sub> > 30 GeV and |*η*| < 2.5, consistent with CMS configurations.
+The following *b*-tagging discriminant is constructed from the raw output scores of **SophonAK4** to evaluate *b* vs. light and *b* vs. *c* jet performance:
+ - \\(\text{discr (SophonAK4 $b$ tagging)} = g_{b} + g_{\bar{b}} + g_{b\bar{b}}.\\)
+The following *c*-tagging discriminants are defined for *c* vs. light and *c* vs. *b* jets, respectively:
+ - \\(\text{discr (SophonAK4 $c$ tagging)} = g_{c} + g_{\bar{c}} + g_{c\bar{c}},\\)
+ - \\(\text{discr (SophonAK4 $c$ vs. $b$ tagging)} = \frac{g_{c} + g_{\bar{c}} + g_{c\bar{c}}}{g_{c} + g_{\bar{c}} + g_{c\bar{c}} + g_{b} + g_{\bar{b}} + g_{b\bar{b}}}.\\)
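The three discriminants can be computed directly from the per-class scores. The sketch below assumes the scores for one jet are available as a dictionary keyed by hypothetical class names (`"b"`, `"bbar"`, `"bb"`, `"c"`, `"cbar"`, `"cc"`); the actual output layout of the model may differ. Returning 0.0 when the *c* vs. *b* denominator vanishes is also an assumption, added only to avoid a division by zero.

```python
def b_discriminant(g):
    """b-tagging discriminant: g_b + g_bbar + g_bb."""
    return g["b"] + g["bbar"] + g["bb"]


def c_discriminant(g):
    """c-tagging discriminant: g_c + g_cbar + g_cc."""
    return g["c"] + g["cbar"] + g["cc"]


def c_vs_b_discriminant(g):
    """c vs. b discriminant: c-sum divided by (c-sum + b-sum)."""
    c_sum = c_discriminant(g)
    b_sum = b_discriminant(g)
    denom = c_sum + b_sum
    # Assumption: fall back to 0.0 if all six scores are zero.
    return c_sum / denom if denom > 0.0 else 0.0


# Illustrative scores for a single jet (made-up numbers, not model output).
scores = {"b": 0.5, "bbar": 0.1, "bb": 0.1, "c": 0.1, "cbar": 0.05, "cc": 0.05}
```

With these illustrative scores, the *b* discriminant is the sum 0.5 + 0.1 + 0.1 and the *c* vs. *b* discriminant is the ratio of the *c* sum to the combined *b* and *c* sums, so it always lies between 0 and 1.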
+1. The ROC performance for *b* vs. light/*c* jets and *c* vs. light/*b* jets is shown below and can be compared to [CMS benchmarks](https://cds.cern.ch/record/2904702) (Figs. 1 and 3 for the *tt̅* process).
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6369850885fef3ca96e6dd63/H5FPFm4ZnwVbNshrPQDg_.png)
+> Conclusion
+>  - The *b* vs. light jet performance is slightly below that of the widely adopted DeepJet tagger in CMS.
+>  - The *b* vs. *c* and *c* vs. light/*b* jet performances fall between those of the DeepJet and ParticleNet taggers in CMS.
+>  - Similar trends are found when comparing with the widely adopted DL1r tagger in ATLAS; see [Appendix B of the paper](https://arxiv.org/html/2503.00118#:~:text=B.2,Performance%20of%20SophonAK4).
+2. Performance across different *p*<sub>T</sub> and |*η*| regions is benchmarked below and can be compared with [CMS benchmarks](https://cds.cern.ch/record/2904702) (Figs. 17, 19, 21, 23, 25, 27, 29, and 31).
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6369850885fef3ca96e6dd63/XTAce2zjvwzLhe-_OL29r.png)
+> Conclusion
+>  - Tagging performance degrades in the low-*p*<sub>T</sub> and high-|*η*| regions but reaches a plateau beyond the turn-on point, indicating that the **SophonAK4** tagger exhibits realistic flavor-tagging behavior across kinematic regimes.
+## Citation
+If you find the SophonAK4 model useful in your research, please cite:
+```
+@article{Zhao:2025rci,
+    author = "Zhao, Yuzhe and Li, Congqiao and Agapitos, Antonios and Fu, Dawei and Gao, Leyun and Mao, Yajun and Li, Qiang",
+    title = "{Novel $|V_{cb}|$ extraction method via boosted $bc$-tagging with in-situ calibration}",
+    eprint = "2503.00118",
+    archivePrefix = "arXiv",
+    primaryClass = "hep-ph",
+    month = "2",
+    year = "2025"
+}
+```