Merchant Name Extraction Model
This model extracts merchant names from transaction descriptions using Named Entity Recognition (NER).
Model Details
- Model Type: DistilBERT for Token Classification
- Task: Merchant Name Extraction
- Language: English
- Framework: PyTorch + Transformers
Usage
```python
from transformers import DistilBertTokenizerFast, DistilBertForTokenClassification
import torch

# Load model and tokenizer
model = DistilBertForTokenClassification.from_pretrained("GalalEwida/SIA-MerchentName")
tokenizer = DistilBertTokenizerFast.from_pretrained("GalalEwida/SIA-MerchentName")

# Label mapping used by this model
id2label = {0: 'O', 1: 'B-MERCHANT', 2: 'I-MERCHANT'}

# Prediction function
def extract_merchant(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    # Highest-scoring label index for every token, shape (batch, seq_len)
    predictions = torch.argmax(outputs.logits, dim=2)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    word_ids = inputs.word_ids(0)  # None for special tokens such as [CLS] and [SEP]
    predicted_labels = [id2label[pred.item()] for pred in predictions[0]]

    merchant_tokens = []
    for token, word_id, label in zip(tokens, word_ids, predicted_labels):
        if word_id is None:
            continue  # never include special tokens in the extracted name
        if label in ['B-MERCHANT', 'I-MERCHANT']:
            if token.startswith('##'):
                # WordPiece continuation: glue it onto the previous piece
                if merchant_tokens:
                    merchant_tokens[-1] += token[2:]
            else:
                merchant_tokens.append(token)
    return ' '.join(merchant_tokens)

# Example usage
text = "WALMART SUPERCENTER #1234 ANYTOWN US"
merchant = extract_merchant(text)
print(f"Extracted: {merchant}")
```
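`extract_merchant` rebuilds the name from WordPiece tokens, so its output follows the tokenizer's normalization: if the checkpoint is uncased the result is lowercased, and punctuation such as the dot in `AMAZON.COM` comes back as separate space-joined tokens. When you need the exact substring of the original description instead, a fast tokenizer can return character offsets via `return_offsets_mapping=True`. The sketch below assumes you already have those `(start, end)` offsets and one BIO label per token; `merchant_substring` is a hypothetical helper, not part of this model or of Transformers, and it keeps only the first merchant entity.

```python
from typing import List, Optional, Sequence


def merchant_substring(
    text: str,
    offsets: Sequence[Sequence[int]],
    labels: List[str],
) -> Optional[str]:
    """Return the original-text span covering the first merchant entity (hypothetical helper).

    offsets: (start, end) character offsets per token, as produced by a fast
        tokenizer with return_offsets_mapping=True; special tokens have (0, 0).
    labels: one BIO label per token ('O', 'B-MERCHANT', 'I-MERCHANT').
    """
    start = end = None
    for (tok_start, tok_end), label in zip(offsets, labels):
        if tok_start == tok_end:
            continue  # special tokens ([CLS], [SEP]) have empty offsets
        if label == 'B-MERCHANT' and start is not None:
            break  # a second entity begins; keep only the first one
        if label in ('B-MERCHANT', 'I-MERCHANT'):
            if start is None:
                start = tok_start
            end = tok_end  # extend the span to the end of this token
        elif start is not None:
            break  # the first 'O' after the entity closes it
    return text[start:end] if start is not None else None


# Illustrative offsets and labels (made up for the example, not real model output)
text = "AMAZON.COM AMZN.COM/BILL WA"
offsets = [(0, 0), (0, 6), (6, 7), (7, 10), (11, 14), (14, 15), (0, 0)]
labels = ['O', 'B-MERCHANT', 'I-MERCHANT', 'I-MERCHANT', 'O', 'O', 'O']
print(merchant_substring(text, offsets, labels))
```

With the model loaded as above, the offsets would come from `tokenizer(text, return_offsets_mapping=True, return_tensors="pt")`; pop the `offset_mapping` entry (for example `inputs.pop("offset_mapping")[0].tolist()`) before calling `model(**inputs)`, because the model's forward pass does not accept that key.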
Labels
- O: Outside (not part of a merchant name)
- B-MERCHANT: Beginning of a merchant name
- I-MERCHANT: Inside a merchant name
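Because the scheme separates B- from I-, consecutive merchant tokens can be grouped into distinct entities rather than joined into a single string as `extract_merchant` does. The sketch below is a hypothetical helper, `group_bio_spans`, that works on the token and label lists produced inside `extract_merchant`. Two conventions in it are assumptions, not documented behavior of this model: a `##` continuation piece follows the label of its word's first piece, and an `I-MERCHANT` with no open entity starts a new one.

```python
from typing import List


def group_bio_spans(tokens: List[str], labels: List[str]) -> List[str]:
    """Group WordPiece tokens into merchant entities using BIO labels (hypothetical helper)."""
    spans: List[List[str]] = []
    inside = False  # True while an entity is being extended
    for token, label in zip(tokens, labels):
        if token in ('[CLS]', '[SEP]', '[PAD]'):
            inside = False  # special tokens never belong to an entity
            continue
        if token.startswith('##'):
            # Subword continuation: attach to the previous word only if it is in an entity
            if inside:
                spans[-1][-1] += token[2:]
            continue
        if label == 'B-MERCHANT' or (label == 'I-MERCHANT' and not inside):
            spans.append([token])  # start a new entity
            inside = True
        elif label == 'I-MERCHANT':
            spans[-1].append(token)  # continue the current entity
        else:
            inside = False  # 'O' closes any open entity
    return [' '.join(words) for words in spans]


# Illustrative tokens and labels (made up, not real model output)
tokens = ['[CLS]', 'star', '##bucks', 'store', '#', '01', '##23', 'pay', '##pal', '[SEP]']
labels = ['O', 'B-MERCHANT', 'I-MERCHANT', 'O', 'O', 'O', 'O', 'B-MERCHANT', 'I-MERCHANT', 'O']
print(group_bio_spans(tokens, labels))
```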
Example Predictions
| Input | Extracted Merchant |
|---|---|
| WALMART SUPERCENTER #1234 ANYTOWN US | WALMART |
| AMAZON.COM AMZN.COM/BILL WA | AMAZON |
| STARBUCKS STORE #0123 NEW YORK NY | STARBUCKS |