📌 Model Card: LEGIT-BART Series

🏛️ Model Overview

The LEGIT-BART models are a family of pre-trained transformer-based models for Italian legal text processing. They build upon BART-IT (morenolq/bart-it) and are further pre-trained on Italian legal corpora.

💡 Key features:

Extended context length with Local-Sparse-Global (LSG) Attention (up to 16,384 tokens) 📜
Trained on legal documents such as statutes, case law, and contracts 📑
Not fine-tuned for specific tasks (requires further adaptation)

⚠️ This specific model (BART-IT-LSG-4096) is pre-trained on general-purpose Italian text only, with no legal adaptation. Choose the model that best fits your use case from the table below.

📂 Available Models

Model	Description	Link
LEGIT-BART	Continued pre-training of `morenolq/bart-it` on Italian legal texts	🔗 Link
LEGIT-BART-LSG-4096	Continued pre-training of `morenolq/bart-it`, supporting 4,096 tokens	🔗 Link
LEGIT-BART-LSG-16384	Continued pre-training of `morenolq/bart-it`, supporting 16,384 tokens	🔗 Link
LEGIT-SCRATCH-BART	Trained from scratch on Italian legal texts	🔗 Link
LEGIT-SCRATCH-BART-LSG-4096	Trained from scratch with LSG attention, supporting 4,096 tokens	🔗 Link
LEGIT-SCRATCH-BART-LSG-16384	Trained from scratch with LSG attention, supporting 16,384 tokens	🔗 Link
BART-IT-LSG-4096	`morenolq/bart-it` with LSG attention, supporting 4,096 tokens (⚠️ no legal adaptation)	🔗 Link
BART-IT-LSG-16384	`morenolq/bart-it` with LSG attention, supporting 16,384 tokens (⚠️ no legal adaptation)	🔗 Link
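
As a rough illustration of how the table can guide the choice, the sketch below picks the smallest-context variant that still fits a document of a given token length. The helper `pick_variant` is hypothetical and not part of any library; the 1,024-token limit for the non-LSG `LEGIT-BART` is an assumption based on the usual BART base configuration, and the from-scratch variants are left out for brevity.

```python
from typing import Optional

# (model name from the table, maximum input tokens, legally adapted?)
# The 1,024 limit for the non-LSG model is an assumption, not stated in the table.
VARIANTS = [
    ("LEGIT-BART", 1024, True),
    ("LEGIT-BART-LSG-4096", 4096, True),
    ("LEGIT-BART-LSG-16384", 16384, True),
    ("BART-IT-LSG-4096", 4096, False),
    ("BART-IT-LSG-16384", 16384, False),
]


def pick_variant(n_tokens: int, legal: bool = True) -> Optional[str]:
    """Return the smallest-context variant that fits n_tokens, or None if none does."""
    candidates = [
        (limit, name)
        for name, limit, is_legal in VARIANTS
        if is_legal == legal and limit >= n_tokens
    ]
    # min() on (limit, name) tuples selects the shortest sufficient context.
    return min(candidates)[1] if candidates else None


print(pick_variant(3000))               # legally adapted variant for a 3,000-token input
print(pick_variant(3000, legal=False))  # general-purpose variant for the same length
```

Preferring the shortest sufficient context keeps memory and compute lower, since longer-context variants cost more per input.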

🛠️ Model Details

🔹 Architecture

Base Model: morenolq/bart-it
Transformer Encoder-Decoder
LSG Attention for long documents
Dedicated tokenizers for the models trained from scratch (in our experiments, these models underperformed the continually pre-trained ones).
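
To give a feel for why a sparse attention pattern helps with long documents, the following back-of-the-envelope sketch compares the number of query-key pairs under full attention with a simplified pattern in which each token attends only to a fixed window plus a few global tokens. This is not the LSG implementation: the sparse component is omitted, and the `window` and `n_global` values are illustrative assumptions rather than the models' actual settings.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Every token attends to every token: quadratic in sequence length."""
    return n_tokens * n_tokens


def windowed_attention_pairs(n_tokens: int, window: int = 256, n_global: int = 1) -> int:
    """Each token attends to at most `window` local tokens plus `n_global` global ones.

    Simplified model: linear in sequence length once n_tokens exceeds the window.
    """
    per_query = min(n_tokens, window + n_global)
    return n_tokens * per_query


for n in (1024, 4096, 16384):
    full = full_attention_pairs(n)
    sparse = windowed_attention_pairs(n)
    print(f"{n:>6} tokens: full={full:>12,}  windowed={sparse:>10,}  ratio={full / sparse:.1f}x")
```

Under these assumed settings the gap widens as inputs get longer, which is the motivation for using such patterns at 4,096 and 16,384 tokens.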

🔹 Training Data

Dataset: joelniklaus/Multi_Legal_Pile
Types of legal texts used:
- Legislation (laws, codes, amendments)
- Case law (judicial decisions)
- Contracts (public legal agreements)
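
For readers who want to inspect the training data, here is a minimal sketch of streaming one Italian subset with the `datasets` library. The configuration naming scheme (`<language>_<type>`, e.g. `it_legislation`) is an assumption based on the dataset's published layout and should be checked against the dataset card; depending on the installed `datasets` version, loading may need additional arguments. The helper names are hypothetical.

```python
# Text types listed above, spelled the way the dataset is assumed to name them.
LEGAL_TEXT_TYPES = ("legislation", "caselaw", "contracts")


def config_name(language: str = "it", text_type: str = "legislation") -> str:
    """Build a configuration name such as 'it_legislation' (assumed naming scheme)."""
    if text_type not in LEGAL_TEXT_TYPES:
        raise ValueError(f"unknown text type: {text_type!r}")
    return f"{language}_{text_type}"


def stream_italian_legal_texts(text_type: str = "legislation"):
    """Return a streaming dataset for one Italian subset (requires network access)."""
    from datasets import load_dataset  # imported lazily so the helpers above work offline

    return load_dataset(
        "joelniklaus/Multi_Legal_Pile",
        config_name("it", text_type),
        split="train",
        streaming=True,  # avoids downloading the whole corpus up front
    )


print([config_name("it", t) for t in LEGAL_TEXT_TYPES])
```

Calling `stream_italian_legal_texts("caselaw")` would then yield examples lazily, assuming the configuration exists under that name.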

🚀 How to Use

from transformers import BartForConditionalGeneration, AutoTokenizer

# Load tokenizer and model
model_name = "morenolq/BART-IT-LSG-4096"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Example input containing a <mask> token for the model to fill in
input_text = "<mask> 1234: Il contratto si intende concluso quando..."
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)

# Generate output. The model is not fine-tuned, so this is a denoising
# reconstruction of the input rather than a real summary.
output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,  # pass the mask explicitly
    max_length=150,
    num_beams=4,
    early_stopping=True,
)
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("📝 Output:", output)
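
Because the snippet above truncates at 4,096 tokens, anything beyond that limit is silently dropped. One possible workaround, sketched here under stated assumptions, is to split the token ids into overlapping windows and process each window separately. The helper `split_into_windows` and the overlap size are hypothetical choices, not part of the model or of `transformers`, and the sketch works on plain id lists without handling special tokens.

```python
from typing import List


def split_into_windows(
    token_ids: List[int], max_length: int = 4096, overlap: int = 256
) -> List[List[int]]:
    """Split token ids into windows of at most max_length, sharing `overlap` ids."""
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be non-negative and smaller than max_length")
    step = max_length - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # the last window already reaches the end of the input
    return windows


# Toy example with small numbers so the result is easy to inspect.
print(split_into_windows(list(range(10)), max_length=4, overlap=1))
```

The overlap gives each window some context from the previous one; how to merge per-window outputs depends on the downstream task.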

⚠️ Limitations & Ethical Considerations

Not fine-tuned for specific tasks: The models are pre-trained on legal texts and may require further adaptation for specific legal NLP tasks (e.g., summarization, question-answering).
Bias and fairness: Legal texts may contain biases present in the legal system. Care should be taken to ensure fairness and ethical use of the models.
Legal advice: The models are not a substitute for professional legal advice. Always consult a qualified legal professional for legal matters.

📚 Reference

The LEGIT-BART models are presented in the following paper:

@article{benedetto2025legitbart,
    title        = {LegItBART: a summarization model for Italian legal documents},
    author       = {Benedetto, Irene and La Quatra, Moreno and Cagliero, Luca},
    year         = 2025,
    journal      = {Artificial Intelligence and Law},
    publisher    = {Springer},
    pages        = {1--31},
    doi          = {10.1007/s10506-025-09436-y},
    url          = {https://doi.org/10.1007/s10506-025-09436-y}
}
