	---
	license: cc-by-nc-sa-4.0
	library_name: pytorch
	tags:
	- proteomics
	- mass-spectrometry
	- peptide-sequencing
	- de-novo-sequencing
	- diffusion
	- multinomial-diffusion
	- biology
	- computational-biology
	pipeline_tag: text-generation
	datasets:
	- InstaDeepAI/ms_ninespecies_benchmark
	- InstaDeepAI/ms_proteometools
	---

	# InstaNovoPlus: Diffusion-Powered De novo Peptide Sequencing Model



	## Model Description

	InstaNovoPlus is a diffusion-based model for de novo peptide sequencing from mass spectrometry data. It uses multinomial diffusion to identify peptides without a protein database, which makes it suitable for large-scale proteomics experiments.
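
To give a feel for what "multinomial diffusion" means, the sketch below corrupts a toy token sequence with the standard categorical forward process, in which each position keeps its token with weight `alpha_bar` and is otherwise resampled uniformly over `K` classes. This is a generic NumPy illustration under stated assumptions, not InstaNovoPlus's implementation: `noise_tokens`, the vocabulary size of 8, and the token values are all hypothetical.

```python
import numpy as np


def noise_tokens(x0, alpha_bar, num_classes, rng):
    """Sample x_t ~ Cat(alpha_bar * onehot(x0) + (1 - alpha_bar) / K)."""
    one_hot = np.eye(num_classes)[x0]  # shape (L, K)
    probs = alpha_bar * one_hot + (1.0 - alpha_bar) / num_classes
    # Draw one category per position from that position's distribution.
    return np.array([rng.choice(num_classes, p=p) for p in probs])


rng = np.random.default_rng(0)
tokens = np.array([3, 1, 4, 1, 5])  # toy residue indices, not a real vocabulary

# alpha_bar = 1.0 leaves the sequence untouched (no noise yet).
clean = noise_tokens(tokens, alpha_bar=1.0, num_classes=8, rng=rng)
# alpha_bar = 0.0 replaces every position with a uniform draw (pure noise).
noisy = noise_tokens(tokens, alpha_bar=0.0, num_classes=8, rng=rng)
print(clean, noisy)
```

A diffusion model of this kind is trained to reverse such corruption step by step, here conditioned on the spectrum, so that decoding can start from a noisy or approximate sequence and refine it.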


	## Usage

	```python
	import torch
	import numpy as np
	import pandas as pd
	from instanovo.diffusion.multinomial_diffusion import InstaNovoPlus
	from instanovo.utils import SpectrumDataFrame
	from instanovo.transformer.dataset import SpectrumDataset, collate_batch
	from torch.utils.data import DataLoader
	from instanovo.inference import ScoredSequence
	from instanovo.inference.diffusion import DiffusionDecoder
	from instanovo.utils.metrics import Metrics
	from tqdm.auto import tqdm  # works in notebooks and in plain scripts

	# Load the model from the Hugging Face Hub
	model, config = InstaNovoPlus.from_pretrained("InstaDeepAI/instanovoplus-v1.1.0")

	# Move the model to the GPU if available
	device = "cuda" if torch.cuda.is_available() else "cpu"
	model = model.to(device).eval()

	# Update the residue set with custom modifications
	model.residue_set.update_remapping(
	    {
	        "M(ox)": "M[UNIMOD:35]",  # Oxidation
	        "M(+15.99)": "M[UNIMOD:35]",
	        "S(p)": "S[UNIMOD:21]",  # Phosphorylation
	        "T(p)": "T[UNIMOD:21]",
	        "Y(p)": "Y[UNIMOD:21]",
	        "S(+79.97)": "S[UNIMOD:21]",
	        "T(+79.97)": "T[UNIMOD:21]",
	        "Y(+79.97)": "Y[UNIMOD:21]",
	        "Q(+0.98)": "Q[UNIMOD:7]",  # Deamidation
	        "N(+0.98)": "N[UNIMOD:7]",
	        "Q(+.98)": "Q[UNIMOD:7]",
	        "N(+.98)": "N[UNIMOD:7]",
	        "C(+57.02)": "C[UNIMOD:4]",  # Carboxyamidomethylation
	        "(+42.01)": "[UNIMOD:1]",  # Acetylation
	        "(+43.01)": "[UNIMOD:5]",  # Carbamylation
	        "(-17.03)": "[UNIMOD:385]",  # Ammonia loss
	    }
	)

	# Load the test data
	sdf = SpectrumDataFrame.from_huggingface(
	    "InstaDeepAI/ms_ninespecies_benchmark",
	    is_annotated=True,
	    shuffle=False,
	    split="test[:10%]",  # Use only a subset of the test data for faster inference
	)

	# Create the dataset
	ds = SpectrumDataset(
	    sdf,
	    model.residue_set,
	    config.get("n_peaks", 200),
	    return_str=False,
	    annotated=True,
	    peptide_pad_length=model.config.get("max_length", 30),
	    reverse_peptide=False,  # we do not reverse peptide for diffusion
	    add_eos=False,
	    tokenize_peptide=True,
	)

	# Create the data loader
	dl = DataLoader(
	    ds,
	    batch_size=64,
	    num_workers=0,  # sdf requirement, handled internally
	    shuffle=False,  # sdf requirement, handled internally
	    collate_fn=collate_batch,
	)

	# Create the decoder
	diffusion_decoder = DiffusionDecoder(model=model)

	predictions = []
	log_probs = []
	targets = []  # ground-truth peptides, kept aligned with `predictions`

	# Iterate over the data loader
	for batch in tqdm(dl, total=len(dl)):
	    spectra, precursors, spectra_padding_mask, peptides, _ = batch
	    spectra = spectra.to(device)
	    precursors = precursors.to(device)
	    spectra_padding_mask = spectra_padding_mask.to(device)
	    peptides = peptides.to(device)

	    # Perform inference
	    with torch.no_grad():
	        batch_predictions, batch_log_probs = diffusion_decoder.decode(
	            spectra=spectra,
	            spectra_padding_mask=spectra_padding_mask,
	            precursors=precursors,
	            initial_sequence=peptides,
	        )
	    predictions.extend(batch_predictions)
	    log_probs.extend(batch_log_probs)
	    # Assumption: `ResidueSet.decode` maps token indices back to residues;
	    # adjust this line if your InstaNovo version exposes a different helper.
	    targets.extend(model.residue_set.decode(p) for p in peptides)

	# Initialize metrics
	metrics = Metrics(model.residue_set, config["isotope_error_range"])

	# Compute precision and recall over all batches (targets first, then predictions)
	aa_precision, aa_recall, peptide_recall, peptide_precision = metrics.compute_precision_recall(
	    targets, predictions
	)

	# Compute amino acid error rate and AUC
	aa_error_rate = metrics.compute_aa_er(targets, predictions)
	auc = metrics.calc_auc(targets, predictions, np.exp(pd.Series(log_probs)))

	print(f"amino acid error rate: {aa_error_rate:.5f}")
	print(f"amino acid precision: {aa_precision:.5f}")
	print(f"amino acid recall: {aa_recall:.5f}")
	print(f"peptide precision: {peptide_precision:.5f}")
	print(f"peptide recall: {peptide_recall:.5f}")
	print(f"area under the PR curve: {auc:.5f}")
	```

	For more explanation, see the [Getting Started notebook](https://github.com/instadeepai/InstaNovo/blob/main/notebooks/getting_started_with_instanovo.ipynb) in the repository.
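
The usage example exponentiates the log-probabilities with `np.exp` before computing the AUC, which turns each score into a confidence between 0 and 1. The same conversion can be used to keep only high-confidence predictions. The sketch below is a hypothetical helper on invented toy values; `filter_by_confidence`, the 0.9 threshold, and the peptide strings are assumptions, and the real decoder output may be lists of residues rather than plain strings.

```python
import numpy as np


def filter_by_confidence(predictions, log_probs, threshold=0.9):
    """Keep predictions whose confidence exp(log_prob) is at least `threshold`."""
    confidences = np.exp(np.asarray(log_probs, dtype=float))
    return [
        (pred, float(conf))
        for pred, conf in zip(predictions, confidences)
        if conf >= threshold
    ]


# Toy values, invented for illustration only
toy_predictions = ["PEPTIDEK", "ACDEFGHK", "LLLLK"]
toy_log_probs = [-0.01, -1.2, -0.08]

kept = filter_by_confidence(toy_predictions, toy_log_probs, threshold=0.9)
for peptide, confidence in kept:
    print(f"{peptide}: {confidence:.3f}")
```

Raising the threshold trades recall for precision, which is the trade-off the area under the PR curve summarizes.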


	## Citation

	If you use InstaNovoPlus in your research, please cite:

	```bibtex
	@article{eloff_kalogeropoulos_2025_instanovo,
	title = {InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale
	proteomics experiments},
	author = {Eloff, Kevin and Kalogeropoulos, Konstantinos and Mabona, Amandla and Morell,
	Oliver and Catzel, Rachel and Rivera-de-Torre, Esperanza and Berg Jespersen,
	Jakob and Williams, Wesley and van Beljouw, Sam P. B. and Skwark, Marcin J.
	and Laustsen, Andreas Hougaard and Brouns, Stan J. J. and Ljungars,
	Anne and Schoof, Erwin M. and Van Goey, Jeroen and auf dem Keller, Ulrich and
	Beguir, Karim and Lopez Carranza, Nicolas and Jenkins, Timothy P.},
	year = {2025},
	month = {Mar},
	day = {31},
	journal = {Nature Machine Intelligence},
	doi = {10.1038/s42256-025-01019-5},
	issn = {2522-5839},
	url = {https://doi.org/10.1038/s42256-025-01019-5}
	}
	```


	## Resources

	- Code Repository: [https://github.com/instadeepai/InstaNovo](https://github.com/instadeepai/InstaNovo)
	- Documentation: [https://instadeepai.github.io/InstaNovo/](https://instadeepai.github.io/InstaNovo/)
	- Publication: [https://www.nature.com/articles/s42256-025-01019-5](https://www.nature.com/articles/s42256-025-01019-5)

	## License

	- Code: Licensed under Apache License 2.0
	- Model Checkpoints: Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

	## Installation

	```bash
	pip install instanovo
	```

	For GPU support, install with CUDA dependencies:

	```bash
	pip install "instanovo[cu126]"
	```

	## Requirements

	- Python >= 3.10, < 3.13
	- PyTorch >= 1.13.0
	- CUDA (optional, for GPU acceleration)
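
A quick way to check an environment against these requirements is sketched below using only the standard library. The helper names `python_supported` and `installed_version` are hypothetical, and the sketch assumes the PyPI distribution names are `instanovo` and `torch`; it reports what is installed and does not validate the PyTorch version bound.

```python
import sys
from importlib import metadata


def python_supported(version=None):
    """Return True if the interpreter satisfies Python >= 3.10, < 3.13."""
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 10) <= major_minor < (3, 13)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_supported())
for name in ("instanovo", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Once PyTorch is installed, `torch.cuda.is_available()` (as used in the usage example) indicates whether GPU acceleration will be picked up.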


	## Support

	For questions, issues, or contributions, please visit the [GitHub repository](https://github.com/instadeepai/InstaNovo) or check the [documentation](https://instadeepai.github.io/InstaNovo/).