Introduction
KorPatELECTRA is a pre-trained language model based on Google's ELECTRA architecture, trained on a corpus of 4.6 million Korean patent documents and 0.5 billion sentences. It achieves strong performance on patent domain-specific tasks, including Named Entity Recognition (NER), Machine Reading Comprehension (MRC), and Patent Classification.
| Model | Vocab Size | NER | Classification | MRC |
|---|---|---|---|---|
| Google BERT | 110,000 | 87.98 | 72.33 | 87.79 |
| KorPatBERT | 21,400 | 87.89 | 76.32 | 85.61 |
| KoELECTRA | 35,000 | 87.47 | 72.98 | 88.09 |
| KorPatELECTRA | 35,000 | 91.01 | 73.90 | 89.85 |
For more details, please refer to our GitHub.
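The scores in the table come from fine-tuning the encoder on each downstream task. As a rough illustration only (this sketch is not part of the original card; the label count and token value are placeholders), a classification head can be attached through the Transformers Auto classes:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "KIPI-ai/KorPatElectra"
access_token = "hf_paste_your_copied_token_here"  # placeholder: your personal access token

tokenizer = AutoTokenizer.from_pretrained(model_name, token=access_token)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=8,  # hypothetical number of patent classes
    token=access_token,
)
# The resulting model can then be fine-tuned with the Trainer API or a plain PyTorch loop.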
How to use
from transformers import BertTokenizer, BertModel
import torch

# Load model and tokenizer (gated repository: pass your personal access token)
model_name = "KIPI-ai/KorPatElectra"
access_token = "hf_paste_your_copied_token_here"  # replace with your Hugging Face token

model = BertModel.from_pretrained(model_name, token=access_token)
tokenizer = BertTokenizer.from_pretrained(model_name, token=access_token)
model.eval()

# Sample sentence (a Korean utility-model description of a sealed detergent sachet)
sentence_org = "본 고안은 주로 일회용 합성세제액을 집어넣어 밀봉하는 세제액포의 내부를 원호상으로 열중착하되 세제액이 배출되는 절단부 쪽으로 내벽을 협소하게 형성하여서 내부에 들어있는 세제액을 잘짜질 수 있도록 하는 합성세제 액포에 관한 것이다."

# Tokenization
inputs = tokenizer(sentence_org, return_tensors="pt")

# Forward pass (no gradients needed for feature extraction)
with torch.no_grad():
    outputs = model(**inputs)

# Extract the last hidden states and take the [CLS] token as a sentence vector
last_hidden_states = outputs.last_hidden_state
cls_vector = last_hidden_states[:, 0, :]  # (batch_size, hidden_size)

print(f"1. Length of vocab : {tokenizer.vocab_size}")
print(f"2. Input example : {sentence_org}")
print(f"3. Tokenized example : {inputs}")
print(f"4. vector shape : {cls_vector.shape}")
# Output
1. Length of vocab : 35000
2. Input example : 본 고안은 주로 일회용 합성세제액을 집어넣어 밀봉하는 세제액포의 내부를 원호상으로 열중착하되 세제액이 배출되는 절단부 쪽으로 내벽을 협소하게 형성하여서 내부에 들어있는 세제액을 잘짜질 수 있도록 하는 합성세제 액포에 관한 것이다.
3. Tokenized example : {'input_ids': tensor([[ 2, 2326, 7419, 5091, 9295, 13078, 7590, 26872, 4885, 5216,
31417, 4749, 8328, 4706, 4805, 11492, 4885, 5042, 5076, 7300,
5383, 12265, 4802, 27067, 3045, 4737, 4818, 4706, 4999, 11492,
4885, 4732, 7487, 4999, 4805, 16407, 3500, 27067, 10196, 5216,
18219, 4706, 5036, 7220, 15064, 4739, 7300, 4963, 9062, 5708,
4805, 11492, 4885, 5216, 3257, 5534, 4967, 2748, 3244, 4839,
4848, 4343, 4805, 7590, 26872, 2965, 5042, 4963, 7372, 867,
4732, 5101, 216, 3]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1]])}
4. vector shape : torch.Size([1, 768])
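Continuing the snippet above, the [CLS] vector can serve as a simple sentence embedding. A minimal sketch (the second sentence is invented for illustration and not from the original card) compares two patent sentences by cosine similarity:

import torch.nn.functional as F

# Hypothetical second sentence: "This invention concerns the inner-wall structure of a detergent sachet."
sentence_cmp = "본 고안은 세제액포의 내벽 구조에 관한 것이다."
inputs_cmp = tokenizer(sentence_cmp, return_tensors="pt")
with torch.no_grad():
    cls_cmp = model(**inputs_cmp).last_hidden_state[:, 0, :]

similarity = F.cosine_similarity(cls_vector, cls_cmp)  # shape: (1,)
print(f"Cosine similarity: {similarity.item():.4f}")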
📌 Note: This model requires access approval. Please log in to Hugging Face, request access to the model, and use your personal access token (issued under Settings → Access Tokens).
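One way to authenticate, sketched below with huggingface_hub (a standard alternative to passing token= explicitly), is to log in once per environment:

from huggingface_hub import login

# After logging in, from_pretrained can resolve the gated repo without an explicit token argument.
login(token="hf_paste_your_copied_token_here")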
License
Any commercial exploitation of this model requires a separate commercial license agreement with the Licensor.
Contact
KIPI AI Support: [email protected]