Books Named Entity Recognition Model
This model specialises in recognising book titles and author names in short, user‑typed queries. It achieves > 92 % F1 on a held‑out evaluation set.
1 Provenance
This model is a fine-tune of gliner-community/gliner_large-v2.5, trained on synthetic query data derived from Project Gutenberg's public catalogue.
2 Use‑case
This model is a drop‑in replacement for generic GLiNER when your text stream revolves around bibliographic requests such as:
“Looking for Dune from Frank Herbert.”
“Any recommendations by Mary Shelley?”
Typical applications:
- Query understanding in library / e‑book search engines
- Post-processing LLM output to structure reading lists (see the sketch at the end of this section)
- Digital humanities pipelines that need lightweight title/author extraction
Not suitable for: recognising publishers, ISBNs, or long bibliography-style references (only short queries were used for training).
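For the query-understanding and reading-list applications above, the raw entity list usually needs to be grouped by label before it can drive a search backend. Below is a minimal sketch, assuming the entity dictionaries returned by predict_entities (see the quick start below); the build_search_filter helper and its output shape are illustrative, not part of the model:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("empathyai/gliner_large-v2.5-books")

def build_search_filter(query: str, threshold: float = 0.2) -> dict:
    """Group extracted entities into a simple search filter (illustrative only)."""
    entities = model.predict_entities(query, ["title", "author"], threshold=threshold)
    return {
        "titles": [e["text"] for e in entities if e["label"] == "title"],
        "authors": [e["text"] for e in entities if e["label"] == "author"],
    }

print(build_search_filter("Any recommendations by Mary Shelley?"))
# Expected shape: {'titles': [], 'authors': ['Mary Shelley']}
```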
3 Performance
| Metric | Overall | title | author |
|---|---|---|---|
| Precision | 0.9999 | 0.9999 | 0.9999 |
| Recall | 0.8583 | 0.7661 | 0.9287 |
| F1‑score | 0.9237 | 0.8675 | 0.9630 |
| Support | 69 880 | 30 290 | 39 590 |
Evaluation dataset: 43 493 queries (English‑only) held out from the training corpus. Prediction threshold = 0.2.
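The evaluation script itself is not published here. The sketch below shows one common way such entity-level precision, recall, and F1 numbers are computed, using exact matching on (text, label) pairs per query; this matching criterion is an assumption, not a statement of the actual protocol.

```python
# Hypothetical entity-level scoring with exact (text, label) matching.
# gold and pred each hold one set of (entity_text, label) pairs per query.
def prf(gold: list[set], pred: list[set]) -> tuple[float, float, float]:
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [{("Dune", "title"), ("Frank Herbert", "author")}]
pred = [{("Dune", "title")}]
print(prf(gold, pred))  # (1.0, 0.5, 0.666...)
```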
4 Quick start
```python
from gliner import GLiNER

model = GLiNER.from_pretrained("empathyai/gliner_large-v2.5-books")

text = "Looking for The Man in the High Castle by Philip K. Dick."
entities = model.predict_entities(text, ["title", "author"], threshold=0.2)
print(entities)
# [{'text': 'The Man in the High Castle', 'label': 'title', 'score': 0.99},
#  {'text': 'Philip K. Dick', 'label': 'author', 'score': 0.99}]
```
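The same model object can be reused across many queries; only the text changes per call. A minimal sketch (the example queries are taken from the use-case section, and the 0.2 threshold matches the evaluation setting):

```python
# Sketch: run several user queries through the model loaded above.
# Lowering the threshold favours recall; raising it favours precision.
queries = [
    "Looking for Dune from Frank Herbert.",
    "Any recommendations by Mary Shelley?",
]
for query in queries:
    ents = model.predict_entities(query, ["title", "author"], threshold=0.2)
    print(query, "->", [(e["text"], e["label"]) for e in ents])
```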
5 Training details
- Base model: gliner_large-v2.5 (≈ 459 M parameters, Creative Commons Zero v1.0 Universal)
- Dataset: empathyai/books-ner-dataset (435 k synthetic English queries, titles + authors only)
- Splits: 391 432 train / 43 493 eval (duplicates removed)
- Script highlights (see the sketch after this list):
  - Learning rate 5 × 10⁻⁶, linear schedule, 10 % warm-up
  - Batch size 32, gradient accumulation 2, focal loss α = 0.75 / γ = 2
  - 1 epoch
  - Gradient checkpointing + BF16 for memory efficiency
- Trained on a single L40S GPU; total wall time ≈ 40 min
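For reference, the listed hyperparameters could be expressed roughly as transformers.TrainingArguments, as in the sketch below; the actual fine-tuning script is not published, and the focal-loss settings (α 0.75 / γ 2) are assumed to be configured through GLiNER's own training code rather than through these arguments.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters only; output_dir and the
# surrounding GLiNER training loop are assumptions, not the released script.
args = TrainingArguments(
    output_dir="gliner_large-v2.5-books",
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    bf16=True,
    gradient_checkpointing=True,
)
```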
6 Limitations & bias
- The vocabulary of titles/authors comes from Project Gutenberg (public‑domain heavy; modern best‑sellers may be unseen).
- Only short, informal English queries were simulated. Long paragraphs or non‑English text may degrade accuracy.
- Does not tag publishers, dates, ISBNs, or other bibliographic fields.
7 Acknowledgements
Thanks to the GLiNER authors and maintainers, Hugging Face for hosting, and the Project Gutenberg volunteers for the freely available catalogue metadata.
