Commit f6d4426 by BioGeek (verified) · 1 Parent(s): 6c6e6cc

Add instanovo-v1.0.0 model

  1. README.md +6 -191
README.md CHANGED
@@ -1,195 +1,10 @@
  ---
- license: cc-by-nc-sa-4.0
- library_name: pytorch
  tags:
- - proteomics
- - mass-spectrometry
- - peptide-sequencing
- - de-novo-sequencing
- - transformer
- - biology
- - computational-biology
- pipeline_tag: text-generation
- datasets:
- - InstaDeepAI/ms_ninespecies_benchmark
- - InstaDeepAI/ms_proteometools
  ---
 
- # InstaNovo: De novo Peptide Sequencing Model
- ## Model Description
-
- InstaNovo is a state-of-the-art transformer-based model for de novo peptide sequencing from mass spectrometry data. This model enables accurate, database-free peptide identification for large-scale proteomics experiments. InstaNovo uses a transformer architecture specifically designed for peptide sequencing from tandem mass spectrometry (MS/MS) data. The model predicts peptide sequences directly from MS/MS spectra without requiring a protein database, making it particularly valuable for discovering novel peptides, post-translational modifications, and sequences from organisms with incomplete genomic databases.
-
- ## Usage
-
- ```python
- import torch
- import numpy as np
- import pandas as pd
- from instanovo.transformer.model import InstaNovo
- from instanovo.utils import SpectrumDataFrame
- from instanovo.transformer.dataset import SpectrumDataset, collate_batch
- from torch.utils.data import DataLoader
- from instanovo.inference import ScoredSequence
- from instanovo.inference import BeamSearchDecoder
- from instanovo.utils.metrics import Metrics
- from tqdm.notebook import tqdm
-
- # Load the model from the Hugging Face Hub
- model, config = InstaNovo.from_pretrained("InstaDeepAI/instanovo-v1.0.0")
-
- # Move the model to the GPU if available
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = model.to(device).eval()
-
- # Update the residue set with custom modifications
- model.residue_set.update_remapping(
-     {
-         "M(ox)": "M[UNIMOD:35]",
-         "M(+15.99)": "M[UNIMOD:35]",
-         "S(p)": "S[UNIMOD:21]", # Phosphorylation
-         "T(p)": "T[UNIMOD:21]",
-         "Y(p)": "Y[UNIMOD:21]",
-         "S(+79.97)": "S[UNIMOD:21]",
-         "T(+79.97)": "T[UNIMOD:21]",
-         "Y(+79.97)": "Y[UNIMOD:21]",
-         "Q(+0.98)": "Q[UNIMOD:7]", # Deamidation
-         "N(+0.98)": "N[UNIMOD:7]",
-         "Q(+.98)": "Q[UNIMOD:7]",
-         "N(+.98)": "N[UNIMOD:7]",
-         "C(+57.02)": "C[UNIMOD:4]", # Carboxyamidomethylation
-         "(+42.01)": "[UNIMOD:1]", # Acetylation
-         "(+43.01)": "[UNIMOD:5]", # Carbamylation
-         "(-17.03)": "[UNIMOD:385]",
-     }
- )
-
- # Load the test data
- sdf = SpectrumDataFrame.from_huggingface(
-     "InstaDeepAI/ms_ninespecies_benchmark",
-     is_annotated=True,
-     shuffle=False,
-     split="test[:10%]", # Let's only use a subset of the test data for faster inference
- )
-
- # Create the dataset
- ds = SpectrumDataset(
-     sdf,
-     model.residue_set,
-     config.get("n_peaks", 200),
-     return_str=True,
-     annotated=True,
- )
-
- # Create the data loader
- dl = DataLoader(ds, batch_size=64, shuffle=False, num_workers=0, collate_fn=collate_batch)
-
- # Create the decoder
- decoder = BeamSearchDecoder(model=model)
-
- # Initialize lists to store predictions and targets
- preds = []
- targs = []
- probs = []
-
- # Iterate over the data loader
- for _, batch in tqdm(enumerate(dl), total=len(dl)):
-     spectra, precursors, _, peptides, _ = batch
-     spectra = spectra.to(device)
-     precursors = precursors.to(device)
-
-     # Perform inference
-     with torch.no_grad():
-         p = decoder.decode(
-             spectra=spectra,
-             precursors=precursors,
-             beam_size=config["n_beams"],
-             max_length=config["max_length"],
-         )
-
-
-     preds += [x.sequence if isinstance(x, ScoredSequence) else [] for x in p]
-     probs += [
-         x.sequence_log_probability if isinstance(x, ScoredSequence) else -float("inf") for x in p
-     ]
-     targs += list(peptides)
-
- # Initialize metrics
- metrics = Metrics(model.residue_set, config["isotope_error_range"])
-
-
- # Compute precision and recall
- aa_precision, aa_recall, peptide_recall, peptide_precision = metrics.compute_precision_recall(
-     peptides, preds
- )
-
- # Compute amino acid error rate and AUC
- aa_error_rate = metrics.compute_aa_er(targs, preds)
- auc = metrics.calc_auc(targs, preds, np.exp(pd.Series(probs)))
-
- print(f"amino acid error rate: {aa_error_rate:.5f}")
- print(f"amino acid precision: {aa_precision:.5f}")
- print(f"amino acid recall: {aa_recall:.5f}")
- print(f"peptide precision: {peptide_precision:.5f}")
- print(f"peptide recall: {peptide_recall:.5f}")
- print(f"area under the PR curve: {auc:.5f}")
- ```
-
- For more explanation, see the [Getting Started notebook](https://github.com/instadeepai/InstaNovo/blob/main/notebooks/getting_started_with_instanovo.ipynb) in the repository.
-
-
- ## Citation
-
- If you use InstaNovo in your research, please cite:
-
- ```bibtex
- @article{eloff_kalogeropoulos_2025_instanovo,
-   title = {InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale
-            proteomics experiments},
-   author = {Eloff, Kevin and Kalogeropoulos, Konstantinos and Mabona, Amandla and Morell,
-             Oliver and Catzel, Rachel and Rivera-de-Torre, Esperanza and Berg Jespersen,
-             Jakob and Williams, Wesley and van Beljouw, Sam P. B. and Skwark, Marcin J.
-             and Laustsen, Andreas Hougaard and Brouns, Stan J. J. and Ljungars,
-             Anne and Schoof, Erwin M. and Van Goey, Jeroen and auf dem Keller, Ulrich and
-             Beguir, Karim and Lopez Carranza, Nicolas and Jenkins, Timothy P.},
-   year = {2025},
-   month = {Mar},
-   day = {31},
-   journal = {Nature Machine Intelligence},
-   doi = {10.1038/s42256-025-01019-5},
-   issn = {2522-5839},
-   url = {https://doi.org/10.1038/s42256-025-01019-5}
- }
- ```
-
- ## Resources
-
- - **Code Repository**: [https://github.com/instadeepai/InstaNovo](https://github.com/instadeepai/InstaNovo)
- - **Documentation**: [https://instadeepai.github.io/InstaNovo/](https://instadeepai.github.io/InstaNovo/)
- - **Publication**: [https://www.nature.com/articles/s42256-025-01019-5](https://www.nature.com/articles/s42256-025-01019-5)
-
- ## License
-
- - **Code**: Licensed under Apache License 2.0
- - **Model Checkpoints**: Licensed under Creative Commons Non-Commercial (CC BY-NC-SA 4.0)
-
- ## Installation
-
- ```bash
- pip install instanovo
- ```
-
- For GPU support, install with CUDA dependencies:
- ```bash
- pip install instanovo[cu126]
- ```
-
- ## Requirements
-
- - Python >= 3.10, < 3.13
- - PyTorch >= 1.13.0
- - CUDA (optional, for GPU acceleration)
-
- ## Support
-
- For questions, issues, or contributions, please visit the [GitHub repository](https://github.com/instadeepai/InstaNovo) or check the [documentation](https://instadeepai.github.io/InstaNovo/).
 
  ---
  tags:
+ - model_hub_mixin
+ - pytorch_model_hub_mixin
  ---

+ This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+ - Code: [More Information Needed]
+ - Paper: [More Information Needed]
+ - Docs: [More Information Needed]