InstaDeepAI/instanovo-v1.0.0

@@ -1,195 +1,10 @@
 ---
-license: cc-by-nc-sa-4.0
-library_name: pytorch
 tags:
-- proteomics
-- mass-spectrometry
-- peptide-sequencing
-- de-novo-sequencing
-- transformer
-- biology
-- computational-biology
-pipeline_tag: text-generation
-datasets:
-- InstaDeepAI/ms_ninespecies_benchmark
-- InstaDeepAI/ms_proteometools
 ---
-# InstaNovo: De novo Peptide Sequencing Model
-## Model Description
-InstaNovo is a state-of-the-art transformer-based model for de novo peptide sequencing from mass spectrometry data. This model enables accurate, database-free peptide identification for large-scale proteomics experiments. InstaNovo uses a transformer architecture specifically designed for peptide sequencing from tandem mass spectrometry (MS/MS) data. The model predicts peptide sequences directly from MS/MS spectra without requiring a protein database, making it particularly valuable for discovering novel peptides, post-translational modifications, and sequences from organisms with incomplete genomic databases.
-## Usage
-```python
-import torch
-import numpy as np
-import pandas as pd
-from instanovo.transformer.model import InstaNovo
-from instanovo.utils import SpectrumDataFrame
-from instanovo.transformer.dataset import SpectrumDataset, collate_batch
-from torch.utils.data import DataLoader
-from instanovo.inference import ScoredSequence
-from instanovo.inference import BeamSearchDecoder
-from instanovo.utils.metrics import Metrics
-from tqdm.notebook import tqdm
-# Load the model from the Hugging Face Hub
-model, config = InstaNovo.from_pretrained("InstaDeepAI/instanovo-v1.0.0")
-# Move the model to the GPU if available
-device = "cuda" if torch.cuda.is_available() else "cpu"
-model = model.to(device).eval()
-# Update the residue set with custom modifications
-model.residue_set.update_remapping(
-    {
-        "M(ox)": "M[UNIMOD:35]",
-        "M(+15.99)": "M[UNIMOD:35]",
-        "S(p)": "S[UNIMOD:21]",  # Phosphorylation
-        "T(p)": "T[UNIMOD:21]",
-        "Y(p)": "Y[UNIMOD:21]",
-        "S(+79.97)": "S[UNIMOD:21]",
-        "T(+79.97)": "T[UNIMOD:21]",
-        "Y(+79.97)": "Y[UNIMOD:21]",
-        "Q(+0.98)": "Q[UNIMOD:7]",  # Deamidation
-        "N(+0.98)": "N[UNIMOD:7]",
-        "Q(+.98)": "Q[UNIMOD:7]",
-        "N(+.98)": "N[UNIMOD:7]",
-        "C(+57.02)": "C[UNIMOD:4]",  # Carboxyamidomethylation
-        "(+42.01)": "[UNIMOD:1]",  # Acetylation
-        "(+43.01)": "[UNIMOD:5]",  # Carbamylation
-        "(-17.03)": "[UNIMOD:385]",
-    }
-)
-# Load the test data
-sdf = SpectrumDataFrame.from_huggingface(
-    "InstaDeepAI/ms_ninespecies_benchmark",
-    is_annotated=True,
-    shuffle=False,
-    split="test[:10%]",  # Let's only use a subset of the test data for faster inference
-)
-# Create the dataset
-ds = SpectrumDataset(
-    sdf,
-    model.residue_set,
-    config.get("n_peaks", 200),
-    return_str=True,
-    annotated=True,
-)
-# Create the data loader
-dl = DataLoader(ds, batch_size=64, shuffle=False, num_workers=0, collate_fn=collate_batch)
-# Create the decoder
-decoder = BeamSearchDecoder(model=model)
-# Initialize lists to store predictions and targets
-preds = []
-targs = []
-probs = []
-# Iterate over the data loader
-for _, batch in tqdm(enumerate(dl), total=len(dl)):
-    spectra, precursors, _, peptides, _ = batch
-    spectra = spectra.to(device)
-    precursors = precursors.to(device)
-    # Perform inference
-    with torch.no_grad():
-        p = decoder.decode(
-            spectra=spectra,
-            precursors=precursors,
-            beam_size=config["n_beams"],
-            max_length=config["max_length"],
-        )
-    preds += [x.sequence if isinstance(x, ScoredSequence) else [] for x in p]
-    probs += [
-        x.sequence_log_probability if isinstance(x, ScoredSequence) else -float("inf") for x in p
-    ]
-    targs += list(peptides)
-# Initialize metrics
-metrics = Metrics(model.residue_set, config["isotope_error_range"])
-# Compute precision and recall
-aa_precision, aa_recall, peptide_recall, peptide_precision = metrics.compute_precision_recall(
-    peptides, preds
-)
-# Compute amino acid error rate and AUC
-aa_error_rate = metrics.compute_aa_er(targs, preds)
-auc = metrics.calc_auc(targs, preds, np.exp(pd.Series(probs)))
-print(f"amino acid error rate:    {aa_error_rate:.5f}")
-print(f"amino acid precision:     {aa_precision:.5f}")
-print(f"amino acid recall:        {aa_recall:.5f}")
-print(f"peptide precision:        {peptide_precision:.5f}")
-print(f"peptide recall:           {peptide_recall:.5f}")
-print(f"area under the PR curve:  {auc:.5f}")
-```
-For more explanation, see the [Getting Started notebook](https://github.com/instadeepai/InstaNovo/blob/main/notebooks/getting_started_with_instanovo.ipynb) in the repository.
-## Citation
-If you use InstaNovo in your research, please cite:
-```bibtex
-@article{eloff_kalogeropoulos_2025_instanovo,
-        title        = {InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale
-                        proteomics experiments},
-        author       = {Eloff, Kevin and Kalogeropoulos, Konstantinos and Mabona, Amandla and Morell,
-                        Oliver and Catzel, Rachel and Rivera-de-Torre, Esperanza and Berg Jespersen,
-                        Jakob and Williams, Wesley and van Beljouw, Sam P. B. and Skwark, Marcin J.
-                        and Laustsen, Andreas Hougaard and Brouns, Stan J. J. and Ljungars,
-                        Anne and Schoof, Erwin M. and Van Goey, Jeroen and auf dem Keller, Ulrich and
-                        Beguir, Karim and Lopez Carranza, Nicolas and Jenkins, Timothy P.},
-        year         = {2025},
-        month        = {Mar},
-        day          = {31},
-        journal      = {Nature Machine Intelligence},
-        doi          = {10.1038/s42256-025-01019-5},
-        issn         = {2522-5839},
-        url          = {https://doi.org/10.1038/s42256-025-01019-5}
-}
-```
-## Resources
-- **Code Repository**: [https://github.com/instadeepai/InstaNovo](https://github.com/instadeepai/InstaNovo)
-- **Documentation**: [https://instadeepai.github.io/InstaNovo/](https://instadeepai.github.io/InstaNovo/)
-- **Publication**: [https://www.nature.com/articles/s42256-025-01019-5](https://www.nature.com/articles/s42256-025-01019-5)
-## License
-- **Code**: Licensed under Apache License 2.0
-- **Model Checkpoints**: Licensed under Creative Commons Non-Commercial (CC BY-NC-SA 4.0)
-## Installation
-```bash
-pip install instanovo
-```
-For GPU support, install with CUDA dependencies:
-```bash
-pip install instanovo[cu126]
-```
-## Requirements
-- Python >= 3.10, < 3.13
-- PyTorch >= 1.13.0
-- CUDA (optional, for GPU acceleration)
-## Support
-For questions, issues, or contributions, please visit the [GitHub repository](https://github.com/instadeepai/InstaNovo) or check the [documentation](https://instadeepai.github.io/InstaNovo/).

 ---
 tags:
+- model_hub_mixin
+- pytorch_model_hub_mixin
 ---
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Code: [More Information Needed]
+- Paper: [More Information Needed]
+- Docs: [More Information Needed]