BERT Base Indonesian Named Entity Recognition

This is a BERT-based model fine-tuned for Named Entity Recognition (NER) in Indonesian.
It identifies and classifies named entities in Indonesian text, such as persons, organizations, and locations.

Model Details

Model Type: BERT (Bidirectional Encoder Representations from Transformers)
Language: Indonesian (id)
Task: Token Classification / Named Entity Recognition
Base Model: cahya/bert-base-indonesian-1.5G
License: MIT

Base Model Reference

The base model, BERT Base Indonesian (uncased), was pre-trained with a masked language modeling (MLM) objective and a 32,000-token WordPiece vocabulary on:

~522MB Indonesian Wikipedia
~1GB Indonesian newspaper text

Full details are available on its model card.

Intended Use

This fine-tuned model is intended for:

Named Entity Recognition in Indonesian text
Information extraction from Indonesian documents
Text analysis and processing applications

How to Use

Using with Transformers

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "nahiar/BERT-NER"  # replace with your Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=2)

# Use the first (and only) sequence in the batch so tokens and labels align one-to-one.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[label_id] for label_id in predictions[0].tolist()]

print("Tokens:", tokens)
print("Labels:", labels)
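
Grouping predictions into entities

The example above prints one label per WordPiece token, so a name split into several pieces (for example "wido" and "##do") appears as separate entries. The sketch below shows one possible way to merge those pieces into whole entities. It assumes the model uses BIO-style labels such as B-PER and I-PER; this model card does not list the label set, so check model.config.id2label first. The function name group_entities and the sample tokens and labels are hypothetical and for illustration only.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge WordPiece tokens and BIO labels into (entity_text, entity_type) pairs.

    Assumes BIO tagging (B-XXX / I-XXX / O); adjust if the model's labels differ.
    """
    entities: List[Tuple[str, str]] = []
    words: List[str] = []                # words of the entity being built
    current_type: Optional[str] = None   # type of the entity being built

    def flush() -> None:
        # Store the entity collected so far (if any) and reset the state.
        nonlocal words, current_type
        if words and current_type is not None:
            entities.append((" ".join(words), current_type))
        words, current_type = [], None

    for token, label in zip(tokens, labels):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue  # special tokens never belong to an entity
        if token.startswith("##"):
            # Continuation piece: attach it to the previous word when inside
            # an entity, ignoring the label predicted for the piece itself.
            if words:
                words[-1] += token[2:]
            continue
        if label.startswith("B-"):
            flush()
            words, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            words.append(token)
        else:
            flush()  # "O" or an inconsistent I- tag ends the current entity
    flush()
    return entities


# Hypothetical tokens and labels, standing in for real model output.
sample_tokens = ["[CLS]", "presiden", "joko", "wido", "##do",
                 "berkunjung", "ke", "jakarta", "[SEP]"]
sample_labels = ["O", "O", "B-PER", "I-PER", "I-PER",
                 "O", "O", "B-LOC", "O"]

print(group_entities(sample_tokens, sample_labels))
# [('joko widodo', 'PER'), ('jakarta', 'LOC')]
```

With real model output, the function takes the flat token list and label list for a single sequence. Using only the label of the first piece of each word is a common simplification; other strategies, such as majority voting over pieces, are equally valid.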
