TermsConditioned - RoBERTa-large LEDGAR + LoRA
A RoBERTa-large encoder, fine-tuned with LoRA on the LEDGAR subset of LexGLUE to classify contract paragraphs into 100 clause families, with an explicit risk bucket and slice-level governance analysis.
This repo only contains the adapter weights + tokenizer, not the full base model.
To use it, you must load roberta-large from Hugging Face and then apply these LoRA adapters.
You cannot load it directly with AutoModelForSequenceClassification.from_pretrained("akshan-main/…").
1. What this model does
- Input: a single contract paragraph (e.g., ToS, MSA, clickwrap clause).
- Output: one of 100 LEDGAR clause families (e.g., Arbitration, Governing Laws, Indemnity, Limitation Of Liability, Amendments).
2. Intended use
2.1. Primary use case
This model is designed to be part of a Terms & Conditions / contract intake triage tool that:
- Splits a document into paragraphs.
- Runs this classifier on each paragraph.
- Applies a policy over the probabilities:
  - high-confidence risky clause → "Flag as risky"
  - high-confidence non-risky clause → "Green-light"
  - low-confidence → "Needs review" (abstain)
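The policy step above can be sketched as a small function. This is a minimal illustration, not the policy shipped with the tool: the risky-family set, the 0.80 threshold, and the name `triage` are assumptions, since this card does not publish them.

```python
from typing import Dict, Set

# Hypothetical values for illustration; the model card does not publish them.
RISKY_FAMILIES: Set[str] = {"Arbitration", "Indemnity", "Limitation Of Liability"}
CONFIDENCE_THRESHOLD = 0.80  # assumed cut-off between "confident" and "abstain"


def triage(probs: Dict[str, float],
           risky: Set[str] = RISKY_FAMILIES,
           threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Map one paragraph's (calibrated) class probabilities to a triage decision."""
    label, confidence = max(probs.items(), key=lambda item: item[1])
    if confidence < threshold:
        return "Needs review"  # abstain when the top class is not confident enough
    return "Flag as risky" if label in risky else "Green-light"


print(triage({"Arbitration": 0.93, "Amendments": 0.07}))  # Flag as risky
```

In practice the threshold would be chosen on held-out data so that the rate of risky clauses that get green-lit stays under whatever cap you set.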
2.2. Non-goals
- Not legal advice.
- Not guaranteed fair / non-biased for every jurisdiction or contract type.
- Not designed to replace full contract review or negotiation tools.
3. Training data
- Dataset: coastalcph/lex_glue (LEDGAR subset)
- Train / Validation / Test:
  - train: 60,000 paragraphs
  - validation: 10,000 paragraphs
  - test: 10,000 paragraphs
- Labels: 100 clause families as defined in LEDGAR.
Each example is a single paragraph of a contract, labeled with exactly one family.
4. Model architecture & fine-tuning
4.1 Base model
roberta-large from Hugging Face (transformers).
4.2 LoRA setup
We apply LoRA to a subset of the encoder:
- Target modules: query, key, value, intermediate.dense, output.dense
- LoRA config:
  - r = 16
  - lora_alpha = 32
  - lora_dropout = 0.05
- Frozen: All other base model weights.
- Saved extra modules: classifier head kept and saved along with adapters.
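For readers who want to reproduce a comparable setup, the settings above can be written as keyword arguments for `peft.LoraConfig`. This is a sketch under assumptions: the use of `modules_to_save` for the classifier head and the `"SEQ_CLS"` task type are our guess at how the listed settings map onto PEFT, and `build_lora_config` is a hypothetical helper name.

```python
# Keyword arguments mirroring the LoRA settings listed in this section.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],  # assumption: how the head is kept and saved
    "task_type": "SEQ_CLS",             # assumption: sequence-classification task
}


def build_lora_config():
    """Build a peft.LoraConfig from the settings above (requires `peft`)."""
    from peft import LoraConfig  # imported lazily so the dict is usable without peft
    return LoraConfig(**LORA_KWARGS)
```

With r = 16 and lora_alpha = 32, the LoRA update is scaled by lora_alpha / r = 2.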
4.3 Optimization & training
- Objective: weighted cross-entropy with class weights to counter label imbalance.
- Optimizer: AdamW (8-bit or standard), weight decay 0.1
- Scheduler: cosine LR with warmup
- Batch size (effective): 32 (per-device × grad_accumulation)
- Epochs: 5
- Max seq length: 384 tokens
- Hardware: single GPU (tested on A100)
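The class weights for the weighted cross-entropy objective can be derived from label counts. The sketch below uses the common inverse-frequency ("balanced") heuristic; that choice and the name `inverse_frequency_weights` are assumptions, since this card does not state how the weights were computed.

```python
from collections import Counter
from typing import List


def inverse_frequency_weights(labels: List[int], num_classes: int) -> List[float]:
    """Weight each class by total / (num_classes * count)."""
    counts = Counter(labels)
    total = len(labels)
    # Classes that never occur get weight 0.0 so they do not affect the loss.
    return [
        total / (num_classes * counts[c]) if counts[c] > 0 else 0.0
        for c in range(num_classes)
    ]


# Toy example: class 0 is four times as frequent as class 1.
weights = inverse_frequency_weights([0, 0, 0, 0, 1], num_classes=2)
print(weights)  # [0.625, 2.5]
```

A list like this could then be passed as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`, so that mistakes on rare clause families cost more than mistakes on frequent ones.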
Reproducibility knobs:
- Fixed random seed (42) for Python / NumPy / PyTorch.
- Deterministic behavior is not fully guaranteed but training is stable.
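A minimal sketch of the seeding step, assuming a helper named `seed_everything` (hypothetical; not taken from the training code). NumPy and PyTorch are seeded only if they are installed, and GPU kernels can still introduce run-to-run variation.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed Python, and NumPy / PyTorch when available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


seed_everything(42)
```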
5. Evaluation
All numbers below are on the validation split (10,000 paragraphs) with the LoRA adapters applied.
5.1 Standard metrics
- Accuracy: ~0.869
- Macro F1: ~0.790
This is a multi-class setting with 100 labels and notable class imbalance.
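The gap between accuracy and macro F1 is what you would expect under class imbalance: macro F1 averages per-class F1 with equal weight, so rare families that are often missed pull it down. The sketch below shows the computation on toy data; the function name and the handling of empty classes (scored as 0) are assumptions, not the evaluation code behind the numbers above.

```python
from typing import List, Tuple


def accuracy_and_macro_f1(y_true: List[int], y_pred: List[int],
                          num_classes: int) -> Tuple[float, float]:
    """Return (accuracy, macro F1); a class with no TP, FP, or FN scores 0."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / num_classes


# Toy imbalanced case: the rare class 1 is always missed.
acc, macro_f1 = accuracy_and_macro_f1([0, 0, 0, 1], [0, 0, 0, 0], num_classes=2)
print(acc, round(macro_f1, 3))
```

Here accuracy stays at 0.75 while macro F1 drops to about 0.43, because the missed class contributes an F1 of 0 to the average.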
5.2 Calibration
On top of logits, we apply temperature scaling:
- Search over a grid of temperatures.
- Best temperature on validation: T* ≈ 0.8
- Expected Calibration Error (ECE) before / after scaling:
  - ECE_raw ≈ 0.115
  - ECE_cal ≈ 0.022
These calibrated probabilities are what we use for governance policies (false-green caps, abstain band, etc.).
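The calibration step can be sketched in plain Python. Two assumptions are made here: the grid search selects the temperature with the lowest validation negative log-likelihood (this card does not state the selection criterion), and ECE uses 15 equal-width bins over the top-1 probability. All function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits, labels, temperature: float) -> float:
    """Mean negative log-likelihood at a given temperature."""
    total = 0.0
    for z, y in zip(logits, labels):
        scaled = [v / temperature for v in z]
        m = max(scaled)
        log_norm = m + math.log(sum(math.exp(v - m) for v in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)


def fit_temperature(logits, labels, grid) -> float:
    """Pick the grid temperature with the lowest NLL (assumed criterion)."""
    return min(grid, key=lambda t: nll(logits, labels, t))


def expected_calibration_error(logits, labels, temperature=1.0, n_bins=15) -> float:
    """ECE: bin-weighted average of |accuracy - confidence|."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum conf, sum correct
    for z, y in zip(logits, labels):
        probs = softmax(z, temperature)
        conf = max(probs)
        pred = probs.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += float(pred == y)
    return sum(abs(c - a) for n, c, a in bins if n) / len(labels)


# Toy validation set: confident logits, but only 3 of 4 predictions are right.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 0, 1, 1]
grid = [0.5 + 0.1 * i for i in range(46)]  # 0.5 .. 5.0
t_star = fit_temperature(val_logits, val_labels, grid)
ece_raw = expected_calibration_error(val_logits, val_labels)
ece_cal = expected_calibration_error(val_logits, val_labels, temperature=t_star)
print(t_star, ece_raw, ece_cal)
```

The toy model is overconfident, so the search lands on a temperature above 1 that softens the probabilities. The T* ≈ 0.8 reported above is below 1, which sharpens the probabilities instead.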
6. Inference: using the model
Load base + adapters
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
BASE = "roberta-large"
ADAPTER_REPO = "akshan-main/termsconditioned-roberta-large-ledgar-lora"
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForSequenceClassification.from_pretrained(
    BASE,
    num_labels=100,
)
model = PeftModel.from_pretrained(base_model, ADAPTER_REPO)
model.eval()
You can test the model on synthetic or real ToS paragraphs (for example, arbitration clauses, limitation of liability caps, or indemnity language):
# Must run the above cell first
# This cell is a sample use case for the model
import torch

text = "Any dispute arising out of or relating to this Agreement shall be finally settled by binding arbitration..."
# Tokenize with the same 384-token limit used in training
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=384)
with torch.no_grad():
    outputs = model(**inputs)
# Convert logits to probabilities and keep the five most likely classes
probs = outputs.logits.softmax(dim=-1)[0]
topk = torch.topk(probs, k=5)
for idx, score in zip(topk.indices.tolist(), topk.values.tolist()):
    print(idx, float(score))
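The sample above prints raw class indices. To show clause family names instead, the id2label table can be read from a local copy of this repo's config.json. The sketch assumes the usual Hugging Face layout, where `id2label` is a JSON object with string keys; the helper names are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def parse_id2label(config: dict) -> Dict[int, str]:
    """Convert the string-keyed id2label table into an int-keyed dict."""
    return {int(idx): name for idx, name in config["id2label"].items()}


def load_id2label(config_path: str) -> Dict[int, str]:
    """Read id2label from a config.json file on disk."""
    with open(config_path, encoding="utf-8") as f:
        return parse_id2label(json.load(f))


def name_predictions(indices: List[int], scores: List[float],
                     id2label: Dict[int, str]) -> List[Tuple[str, float]]:
    """Pair each predicted class id with its label name and score."""
    # Fall back to a generic name if an id is missing from the table.
    return [(id2label.get(i, f"LABEL_{i}"), s) for i, s in zip(indices, scores)]
```

With the earlier cell, this would be called as `name_predictions(topk.indices.tolist(), topk.values.tolist(), load_id2label("config.json"))`, where the path is whatever location you downloaded the file to.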
7. Limitations and warnings
- Domain
The model is trained on LEDGAR (public contract clauses). Behavior on consumer terms of service, privacy policies, employment agreements, or narrow industry contracts may differ. You should re-check performance on your own corpus.
- Single label per paragraph
The dataset assumes one dominant clause family per paragraph. Real-world paragraphs can mix multiple concerns (for example, arbitration plus waiver of class actions). Treat the prediction as the "primary" family, not an exhaustive tagging of everything risky in the text.
- Language
Training data is English-only; performance on other languages is not characterized.
- Legal risk
This model is not legal advice. Any production use should keep a human in the loop.
8. How to cite or reference
If you use this model in a writeup, you can cite it as:
@misc{akshan_krithick_2025,
    author = { Akshan Krithick },
    title = { termsconditioned-roberta-large-ledgar-lora (Revision 1605a22) },
    year = 2025,
    url = { https://huggingface.co/akshan-main/termsconditioned-roberta-large-ledgar-lora },
    doi = { 10.57967/hf/7109 },
    publisher = { Hugging Face }
}
9. Files in this repo
- adapter_model.safetensors: LoRA adapter weights for the classifier head and selected encoder modules
- adapter_config.json: PEFT / LoRA configuration
- config.json: model configuration (num_labels, id2label, label2id, etc.)
- tokenizer.json, vocab.json, merges.txt, tokenizer_config.json: tokenizer assets compatible with roberta-large
- special_tokens_map.json: tokenizer special token mapping
- training checkpoints
- README.md
The base roberta-large weights are not duplicated here; at inference time they are loaded from the main Hugging Face model hub.