BERT Base Indonesian Named Entity Recognition
This is a BERT-based model fine-tuned for Named Entity Recognition (NER) tasks in Indonesian.
The model is trained to identify and classify named entities such as persons, organizations, locations, and other relevant entities in Indonesian text.
Model Details
- Model Type: BERT (Bidirectional Encoder Representations from Transformers)
- Language: Indonesian (id)
- Task: Token Classification / Named Entity Recognition
- Base Model: cahya/bert-base-indonesian-1.5G
- License: MIT
Base Model Reference
The base model, BERT Base Indonesian (uncased), was pre-trained on:
- ~522MB Indonesian Wikipedia
- ~1GB Indonesian newspaper text
using a masked language modeling (MLM) objective with a 32,000 WordPiece vocabulary.
Full details are available on its model card.
Intended Use
This fine-tuned model is intended for:
- Named Entity Recognition in Indonesian text
- Information extraction from Indonesian documents
- Text analysis and processing applications
How to Use
Using with Transformers
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "nahiar/BERT-NER"  # replace with your Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
inputs = tokenizer(text, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label for every token position
predictions = torch.argmax(outputs.logits, dim=2)

# The batch holds a single sentence, so take row 0 for both tokens and labels
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
labels = [model.config.id2label[label_id] for label_id in predictions[0].tolist()]

print("Tokens:", tokens)
print("Labels:", labels)
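The per-token output is awkward to read because WordPiece splits words into pieces (continuations start with `##`) and the special tokens `[CLS]` and `[SEP]` receive labels too. The sketch below shows one possible way to merge a flat token list and a flat label list for a single sentence into entity spans. It assumes the model uses BIO-style tags such as `B-PER` and `I-PER`, which this card does not confirm, so check `model.config.id2label` first. The helper `group_entities` and the sample tokens and labels are hypothetical illustrations, not actual model output.

```python
from typing import List, Tuple

SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge WordPiece tokens and BIO labels into (entity_text, entity_type) pairs.

    Assumption: labels look like "B-XXX", "I-XXX", or "O".
    """
    entities: List[Tuple[str, str]] = []
    words: List[str] = []   # words of the entity currently being built
    ent_type = None         # type of the entity currently being built

    def flush() -> None:
        # Close the current entity, if any, and reset the buffer.
        nonlocal words, ent_type
        if words:
            entities.append((" ".join(words), ent_type))
        words, ent_type = [], None

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            flush()
            continue
        if token.startswith("##"):
            # Continuation piece: glue it onto the previous word when that
            # word belongs to an open entity; otherwise ignore it.
            if words:
                words[-1] += token[2:]
            continue
        if label.startswith("B-"):
            flush()
            words, ent_type = [token], label[2:]
        elif label.startswith("I-") and ent_type == label[2:]:
            words.append(token)
        elif label.startswith("I-"):
            # "I-" tag with no matching open entity: start a new one.
            flush()
            words, ent_type = [token], label[2:]
        else:
            flush()
    flush()
    return entities


# Hand-written example input (not real model output).
example_tokens = ["[CLS]", "presiden", "joko", "wi", "##dodo",
                  "berkunjung", "ke", "jakarta", "[SEP]"]
example_labels = ["O", "O", "B-PER", "I-PER", "I-PER",
                  "O", "O", "B-LOC", "O"]
print(group_entities(example_tokens, example_labels))
```

A continuation piece inherits the decision made for the first piece of its word, which is a common convention but only one of several ways to resolve subword labels.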