
Introduction

KorPatELECTRA is a pre-trained language model based on the Google ELECTRA architecture, trained on a corpus of 4.6 million Korean patent documents comprising roughly 0.5 billion sentences. It achieves strong performance on patent domain-specific tasks, including Named Entity Recognition (NER), Machine Reading Comprehension (MRC), and patent classification.

| Model         | Vocab Size | NER   | Classification | MRC   |
|---------------|-----------:|------:|---------------:|------:|
| Google BERT   | 110,000    | 87.98 | 72.33          | 87.79 |
| KorPatBERT    | 21,400     | 87.89 | 76.32          | 85.61 |
| KoELECTRA     | 35,000     | 87.47 | 72.98          | 88.09 |
| KorPatELECTRA | 35,000     | 91.01 | 73.90          | 89.85 |

For more details, please refer to our GitHub.
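
As a rough illustration of how a downstream task such as the classification benchmark above can be approached, the sketch below fine-tunes the checkpoint with a sequence-classification head. It mirrors the Bert* classes used in the usage example below; num_labels, the learning rate, and the placeholder texts and labels are illustrative assumptions, not values from the original evaluation.

import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification, BertTokenizer

model_name = "KIPI-ai/KorPatElectra"  # gated model: authenticate first (see the note below)
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=8)  # num_labels is an assumption

# Placeholder mini-batch; real training would iterate over a labeled patent dataset
texts = ["example patent abstract one", "example patent abstract two"]
labels = torch.tensor([0, 3])  # made-up class ids

batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy over the num_labels classes
loss.backward()
optimizer.step()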

How to use

from transformers import BertTokenizer, BertModel
import torch

# Load model and tokenizer 
model_name = "KIPI-ai/KorPatElectra"

# Access token (replace with your actual token from Hugging Face)
access_token = "hf_YOUR_TOKEN_HERE"  # paste your copied token here

# `token` replaces the deprecated `use_auth_token` argument
model = BertModel.from_pretrained(model_name, token=access_token)
tokenizer = BertTokenizer.from_pretrained(model_name, token=access_token)

# Sample sentence
sentence_org = "본 고안은 주로 일회용 합성세제액을 집어넣어 밀봉하는 세제액포의 내부를 원호상으로 열중착하되 세제액이 배출되는 절단부 쪽으로 내벽을 협소하게 형성하여서 내부에 들어있는 세제액을 잘짜질 수 있도록 하는 합성세제 액포에 관한 것이다."
# ("This design relates to a single-use synthetic-detergent sachet that is heat-sealed
#  in an arc, with the inner wall narrowing toward the cut-open outlet so that the
#  detergent inside can be squeezed out easily.")

# Tokenization
inputs = tokenizer(sentence_org, return_tensors="pt")

# Model input
outputs = model(**inputs)

# Extract the last hidden states
last_hidden_states = outputs.last_hidden_state
cls_vector = last_hidden_states[:, 0, :]  # (batch_size, hidden_size)

print(f"1. Length of vocab : {tokenizer.vocab_size}")
print(f"2. Input example : {sentence_org}")
print(f"3. Tokenized example : {inputs}")
print(f"4. vector shape : {cls_vector.shape}")

# Output
1. Length of vocab : 35000
2. Input example : 본 고안은 주로 일회용 합성세제액을 집어넣어 밀봉하는 세제액포의 내부를 원호상으로 열중착하되 세제액이 배출되는 절단부 쪽으로 내벽을 협소하게 형성하여서 내부에 들어있는 세제액을 잘짜질 수 있도록 하는 합성세제 액포에 관한 것이다.
3. Tokenized example : {'input_ids': tensor([[    2,  2326,  7419,  5091,  9295, 13078,  7590, 26872,  4885,  5216,
         31417,  4749,  8328,  4706,  4805, 11492,  4885,  5042,  5076,  7300,
          5383, 12265,  4802, 27067,  3045,  4737,  4818,  4706,  4999, 11492,
          4885,  4732,  7487,  4999,  4805, 16407,  3500, 27067, 10196,  5216,
         18219,  4706,  5036,  7220, 15064,  4739,  7300,  4963,  9062,  5708,
          4805, 11492,  4885,  5216,  3257,  5534,  4967,  2748,  3244,  4839,
          4848,  4343,  4805,  7590, 26872,  2965,  5042,  4963,  7372,   867,
          4732,  5101,   216,     3]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1]])}
4. vector shape : torch.Size([1, 768])
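
The [CLS] vector is only one way to summarize the sentence. A common alternative is masked mean pooling over all token states; the sketch below is a generic technique, not something the model card prescribes, and reuses inputs and last_hidden_states from the script above.

# Masked mean pooling: average the token vectors, ignoring padding positions
mask = inputs["attention_mask"].unsqueeze(-1)    # (1, seq_len, 1)
summed = (last_hidden_states * mask).sum(dim=1)  # (1, 768)
counts = mask.sum(dim=1).clamp(min=1)            # (1, 1); guard against division by zero
mean_vector = summed / counts
print(f"5. mean-pooled vector shape : {mean_vector.shape}")  # torch.Size([1, 768])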

πŸ” Note: This model requires access approval. Please log in to Hugging Face, request access to the model, and use your personal access token (get yours here).

License

Any commercial exploitation of this model requires a separate commercial license agreement with the Licensor.

Contact

KIPI AI Support : [email protected]
