PatentMap-V0-SecPair-BackgroundDrawing
PatentMap-V0-SecPair-BackgroundDrawing is a patent embedding model trained on abstract + background + drawing sections with section-pair augmentation. It is part of the PatentMap V0 model collection.
Model Details
- Base Model: anferico/bert-for-patents
- Training Objective: Contrastive learning (InfoNCE loss)
- Architecture: BERT-large (340M parameters)
- Embedding Dimension: 1024
- Max Sequence Length: 512 tokens
- Vocabulary Size: 39860
- Training Data: USPTO patent applications (2010-2018) from the HUPD corpus
Training Configuration
- Patent Sections Used: abstract + background + drawing
- Data Augmentation: dropout + section_pair
- Batch Size: 512
- Learning Rate: 1e-5
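As a rough illustration of the contrastive objective named above, the following is a minimal sketch of an InfoNCE loss with in-batch negatives. It is not the authors' training code: the function name `info_nce_loss`, the temperature of 0.05, the one-directional form of the loss, and the reading of section-pair augmentation as "two views of the same patent form a positive pair" are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with in-batch negatives (hypothetical helper).

    Row i of `anchors` and row i of `positives` are treated as a positive
    pair; every other row in the batch acts as a negative.
    The temperature value is an assumption, not taken from the model card.
    """
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature                      # (batch, batch) scaled cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # the matching pair sits on the diagonal
    return F.cross_entropy(logits, targets)


# Stand-in embeddings: view_b is a slightly perturbed copy of view_a.
torch.manual_seed(0)
view_a = torch.randn(8, 1024)
view_b = view_a + 0.1 * torch.randn(8, 1024)
print(info_nce_loss(view_a, view_b).item())
```

With a batch size of 512, each anchor in this formulation would see 511 in-batch negatives, which is one reason contrastive training tends to favor large batches.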
Special Tokens
This model includes additional patent-specific special tokens:
[drawing]
Usage
Input Format
This model expects patent text formatted with special tokens:
- For the abstract: Title [SEP] [abstract] Abstract text
- For other sections: [section] Section text (no title prefix)
Example:
# Abstract with title
text = "Smart thermostat system [SEP] [abstract] A thermostat system comprising..."
# Claim without title
text = "[claim] A method comprising: step 1, step 2..."
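The formatting rule can be wrapped in a small helper so that inputs are built consistently. This is a sketch only: `format_patent_text` is a hypothetical name, not part of the released code, and it assumes that the section tag is simply the section name in square brackets, as in the examples above.

```python
from typing import Optional


def format_patent_text(section: str, text: str, title: Optional[str] = None) -> str:
    """Build an input string in the format described above (hypothetical helper)."""
    tagged = f"[{section}] {text}"
    # Only the abstract carries the title prefix; other sections are tagged alone.
    if section == "abstract":
        if title is None:
            raise ValueError("an abstract input needs a title")
        return f"{title} [SEP] {tagged}"
    return tagged


print(format_patent_text("abstract", "A thermostat system comprising...",
                         title="Smart thermostat system"))
print(format_patent_text("drawing", "FIG. 1 is a schematic view of the thermostat."))
```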
Code Example
from transformers import AutoTokenizer, AutoModel
import torch
# Load model and tokenizer
model_name = "ZoeYou/PatentMap-V0-SecPair-BackgroundDrawing"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
# Format patent text
title = "Smart thermostat system"
abstract = "A thermostat system comprising a temperature sensor..."
patent_text = f"{title} [SEP] [abstract] {abstract}"
# Encode and get embeddings
inputs = tokenizer(patent_text, return_tensors="pt", padding=True, truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
# Use the hidden state of the first ([CLS]) token as the document embedding
embeddings = outputs.last_hidden_state[:, 0, :]
print(embeddings.shape)  # torch.Size([1, 1024])
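Once several patents have been embedded this way, they can be compared by cosine similarity. The sketch below ranks a small corpus against a query; `top_k_similar` is a hypothetical helper, and random tensors of the model's embedding size stand in for real CLS embeddings so that the snippet runs without downloading the model. Cosine similarity is assumed here as the comparison metric; the model card does not state which metric the authors use.

```python
import torch
import torch.nn.functional as F


def top_k_similar(query: torch.Tensor, corpus: torch.Tensor, k: int = 3):
    """Return (scores, indices) of the k corpus rows most cosine-similar to `query`."""
    q = F.normalize(query, dim=-1)       # (1, dim)
    c = F.normalize(corpus, dim=-1)      # (n, dim)
    scores = (q @ c.T).squeeze(0)        # (n,) cosine similarities
    return torch.topk(scores, k=min(k, corpus.size(0)))


# Stand-in tensors; replace them with embeddings produced by the model.
torch.manual_seed(0)
corpus_embeddings = torch.randn(5, 1024)
# The query is a near-duplicate of corpus row 2, so that row should rank first.
query_embedding = corpus_embeddings[2:3] + 0.01 * torch.randn(1, 1024)

scores, indices = top_k_similar(query_embedding, corpus_embeddings)
print(indices.tolist(), scores.tolist())
```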
Evaluation
This model has been evaluated on multiple patent-specific tasks:
- IPC Classification (linear probe and KNN)
- Prior Art Search (recall@k, nDCG@k)
- Embedding Quality Metrics (uniformity, alignment, topology)
For detailed evaluation results, see the PatentMap paper.
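For readers unfamiliar with the retrieval metrics listed above, here is a minimal sketch of recall@k and nDCG@k with binary relevance. It is not the evaluation code from the paper: the function names and the toy document IDs are made up for illustration, and the ranked list is assumed to contain no duplicate IDs.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """nDCG@k with binary gains: each hit at 0-based rank r adds 1 / log2(r + 2)."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, doc_id in enumerate(ranked_ids[:k])
              if doc_id in relevant_ids)
    # Ideal ranking: all relevant documents placed at the top of the list.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: two relevant prior-art documents, one retrieved in the top 3.
ranked = ["p3", "p1", "p7", "p2"]
relevant = {"p1", "p2"}
print(recall_at_k(ranked, relevant, k=3))  # 0.5
print(round(ndcg_at_k(ranked, relevant, k=3), 4))
```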
Intended Use
This model is designed for:
- Patent document retrieval
- Patent similarity search
- Prior art discovery
- IPC classification
- Patent landscape analysis
Citation
If you use this model, please cite:
@article{zuo2025patent,
title={Patent Representation Learning via Self-supervision},
author={Zuo, You and Gerdes, Kim and de La Clergerie, Eric Villemonte and Sagot, Beno{\^i}t},
journal={arXiv preprint arXiv:2511.10657},
year={2025}
}
Model Collection
This model is part of the PatentMap V0 collection. For an overview of all models, see PatentMap-V0.
License
This model is released under the CC BY-NC 4.0 license (non-commercial use only).
Contact
For questions or issues, please open an issue on the GitHub repository or contact the authors.