Fix code, update citation references.
README.md
CHANGED
|
@@ -14,7 +14,7 @@ tags:
|
|
| 14 |
## Winnow HeLa Single Shot Probability Calibrator
|
| 15 |
|
| 16 |
**Winnow** recalibrates confidence scores and provides FDR control for *de novo* peptide sequencing (DNS) workflows.
|
| 17 |
-
This repository contains the calibrator trained on HeLa Single Shot data as referenced in our paper:
|
| 18 |
|
| 19 |
- Intended inputs: spectrum input data and corresponding MS/MS PSM results produced by InstaNovo
|
| 20 |
- Outputs: calibrated per-PSM probabilities in `calibrated_confidence`.
|
|
@@ -38,9 +38,10 @@ from winnow.scripts.main import filter_dataset
|
|
| 38 |
from winnow.fdr.nonparametric import NonParametricFDRControl
|
| 39 |
|
| 40 |
# 1) Download model files
|
|
|
|
| 41 |
snapshot_download(
|
| 42 |
repo_id="InstaDeepAI/winnow-helaqc-model",
|
| 43 |
-
allow_patterns=["*.pkl"]
|
| 44 |
repo_type="model",
|
| 45 |
local_dir=helaqc_model,
|
| 46 |
)
|
|
@@ -50,8 +51,8 @@ calibrator = ProbabilityCalibrator.load(helaqc_model)
|
|
| 50 |
|
| 51 |
# 3) Load your dataset (InstaNovo-style config)
|
| 52 |
dataset = InstaNovoDatasetLoader().load(
|
| 53 |
-
"path_to_spectrum_data.parquet",
|
| 54 |
-
"path_to_instanovo_predictions.csv",
|
| 55 |
)
|
| 56 |
dataset = filter_dataset(dataset) # standard Winnow filtering
|
| 57 |
|
|
@@ -118,5 +119,40 @@ winnow predict \
|
|
| 118 |
|
| 119 |
## Citation
|
| 120 |
|
| 121 |
-
If you use
|
| 122 |
-
| 14 |
## Winnow HeLa Single Shot Probability Calibrator
|
| 15 |
|
| 16 |
**Winnow** recalibrates confidence scores and provides FDR control for *de novo* peptide sequencing (DNS) workflows.
|
| 17 |
+
This repository contains the calibrator trained on HeLa Single Shot data as referenced in our paper: [De novo peptide sequencing rescoring and FDR estimation with Winnow](https://arxiv.org/abs/2509.24952).
|
| 18 |
|
| 19 |
- Intended inputs: spectrum input data and corresponding MS/MS PSM results produced by InstaNovo
|
| 20 |
- Outputs: calibrated per-PSM probabilities in `calibrated_confidence`.
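
As an illustration of why calibrated per-PSM probabilities are useful for FDR control, here is a minimal, generic sketch. It is not Winnow's API or its estimator. It assumes each `calibrated_confidence` value can be read as the probability that the PSM is correct, so the expected FDR of an accepted set is the mean of `1 - p` over that set. The function name `select_at_fdr` and the example values are hypothetical.

```python
def select_at_fdr(confidences, fdr_threshold=0.05):
    """Pick the largest top-ranked set whose estimated FDR stays within the threshold.

    Assumption: each confidence is a calibrated probability of correctness,
    so the expected number of errors in a set is the sum of (1 - p).
    Returns (cutoff, n_accepted); cutoff is None when nothing can be accepted.
    """
    ranked = sorted(confidences, reverse=True)
    cutoff, n_accepted = None, 0
    expected_errors = 0.0
    for i, p in enumerate(ranked, start=1):
        expected_errors += 1.0 - p
        if expected_errors / i > fdr_threshold:
            # The running mean of (1 - p) only grows as p decreases, so stop here.
            break
        cutoff, n_accepted = p, i
    return cutoff, n_accepted


# Hypothetical calibrated confidences for five PSMs.
example = [0.99, 0.60, 0.95, 0.30, 0.98]
cutoff, n_accepted = select_at_fdr(example, fdr_threshold=0.05)
print(cutoff, n_accepted)
```

With these example values the three highest-scoring PSMs are kept, since adding the fourth would push the estimated FDR above 5%.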
|
|
|
|
| 38 |
from winnow.fdr.nonparametric import NonParametricFDRControl
|
| 39 |
|
| 40 |
# 1) Download model files
|
| 41 |
+
helaqc_model = Path("helaqc_model")
|
| 42 |
snapshot_download(
|
| 43 |
repo_id="InstaDeepAI/winnow-helaqc-model",
|
| 44 |
+
allow_patterns=["*.pkl"],
|
| 45 |
repo_type="model",
|
| 46 |
local_dir=helaqc_model,
|
| 47 |
)
|
|
|
|
| 51 |
|
| 52 |
# 3) Load your dataset (InstaNovo-style config)
|
| 53 |
dataset = InstaNovoDatasetLoader().load(
|
| 54 |
+
data_path="path_to_spectrum_data.parquet",
|
| 55 |
+
predictions_path="path_to_instanovo_predictions.csv",
|
| 56 |
)
|
| 57 |
dataset = filter_dataset(dataset) # standard Winnow filtering
|
| 58 |
|
|
|
|
| 119 |
|
| 120 |
## Citation
|
| 121 |
|
| 122 |
+
If you use `winnow` in your research, please cite our preprint: [De novo peptide sequencing rescoring and FDR estimation with Winnow](https://arxiv.org/abs/2509.24952)
|
| 123 |
+
|
| 124 |
+
```bibtex
|
| 125 |
+
@article{mabona2025novopeptidesequencingrescoring,
|
| 126 |
+
title={De novo peptide sequencing rescoring and FDR estimation with Winnow},
|
| 127 |
+
author={Amandla Mabona and Jemma Daniel and Henrik Servais Janssen Knudsen and Rachel Catzel
|
| 128 |
+
and Kevin Michael Eloff and Erwin M. Schoof and Nicolas Lopez Carranza and Timothy P. Jenkins
|
| 129 |
+
and Jeroen Van Goey and Konstantinos Kalogeropoulos},
|
| 130 |
+
year={2025},
|
| 131 |
+
eprint={2509.24952},
|
| 132 |
+
archivePrefix={arXiv},
|
| 133 |
+
primaryClass={q-bio.QM},
|
| 134 |
+
url={https://arxiv.org/abs/2509.24952},
|
| 135 |
+
}
|
| 136 |
+
```
|
| 137 |
+
|
| 138 |
+
If you use the `InstaNovo` model to generate predictions, please also cite: [InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments](https://doi.org/10.1038/s42256-025-01019-5)
|
| 139 |
+
|
| 140 |
+
```bibtex
|
| 141 |
+
@article{eloff_kalogeropoulos_2025_instanovo,
|
| 142 |
+
title = {InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale
|
| 143 |
+
proteomics experiments},
|
| 144 |
+
author = {Eloff, Kevin and Kalogeropoulos, Konstantinos and Mabona, Amandla and Morell,
|
| 145 |
+
Oliver and Catzel, Rachel and Rivera-de-Torre, Esperanza and Berg Jespersen,
|
| 146 |
+
Jakob and Williams, Wesley and van Beljouw, Sam P. B. and Skwark, Marcin J.
|
| 147 |
+
and Laustsen, Andreas Hougaard and Brouns, Stan J. J. and Ljungars,
|
| 148 |
+
Anne and Schoof, Erwin M. and Van Goey, Jeroen and auf dem Keller, Ulrich and
|
| 149 |
+
Beguir, Karim and Lopez Carranza, Nicolas and Jenkins, Timothy P.},
|
| 150 |
+
year = 2025,
|
| 151 |
+
month = {Mar},
|
| 152 |
+
day = 31,
|
| 153 |
+
journal = {Nature Machine Intelligence},
|
| 154 |
+
doi = {10.1038/s42256-025-01019-5},
|
| 155 |
+
issn = {2522-5839},
|
| 156 |
+
url = {https://doi.org/10.1038/s42256-025-01019-5}
|
| 157 |
+
}
|
| 158 |
+
```
|