TermsConditioned – RoBERTa-large LEDGAR + LoRA

A RoBERTa-large encoder, fine-tuned with LoRA on the LEDGAR subset of LexGLUE to classify contract paragraphs into 100 clause families, with an explicit risk bucket and slice-level governance analysis.

This repo contains only the adapter weights and tokenizer, not the full base model.
To use it, load roberta-large from Hugging Face and then apply these LoRA adapters.

You cannot load it directly with AutoModelForSequenceClassification.from_pretrained("akshan-main/…"); load the base model first, as shown in section 6.

1. What this model does

Input: a single contract paragraph (e.g., ToS, MSA, clickwrap clause).
Output: one of 100 LEDGAR clause families (e.g., Arbitration, Governing Laws, Indemnity, Limitation Of Liability, Amendments, etc.).

2. Intended use

2.1. Primary use case

This model is designed to be part of a Terms & Conditions / contract intake triage tool that:

Splits a document into paragraphs.
Runs this classifier on each paragraph.
Applies a policy over the probabilities:
- high-confidence risky clause → “Flag as risky”
- high-confidence non-risky clause → “Green-light”
- low-confidence → “Needs review” (abstain)
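
The policy step above can be sketched in a few lines. This is a minimal illustration, not the tool's actual policy: the set of risky families, the 0.80 threshold, and the function name `triage` are all hypothetical, and the input is assumed to be a dictionary of calibrated probabilities keyed by clause family.

```python
from typing import Dict, Set

# Hypothetical values for illustration only; neither comes from this model card.
RISKY_FAMILIES: Set[str] = {"Arbitration", "Indemnity", "Limitation Of Liability"}
CONFIDENCE_THRESHOLD = 0.80


def triage(
    probs: Dict[str, float],
    risky: Set[str] = RISKY_FAMILIES,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> str:
    """Map per-family probabilities for one paragraph to a triage decision."""
    # Pick the most probable clause family and its probability.
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return "Needs review"  # abstain when the model is not confident
    return "Flag as risky" if label in risky else "Green-light"


examples = [
    {"Arbitration": 0.93, "Governing Laws": 0.05, "Amendments": 0.02},
    {"Amendments": 0.88, "Arbitration": 0.07, "Governing Laws": 0.05},
    {"Arbitration": 0.45, "Governing Laws": 0.40, "Amendments": 0.15},
]
decisions = [triage(p) for p in examples]
for decision in decisions:
    print(decision)
```

A single threshold is the simplest choice; a real deployment would likely tune separate thresholds for the risky and non-risky branches on held-out data.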

2.2. Non-goals

Not legal advice.
Not guaranteed fair / non-biased for every jurisdiction or contract type.
Not designed to replace full contract review or negotiation tools.

3. Training data

Dataset: coastalcph/lex_glue (LEDGAR subset)
Train / Validation / Test:
- train: 60,000 paragraphs
- validation: 10,000 paragraphs
- test: 10,000 paragraphs
Labels: 100 clause families as defined in LEDGAR.

Each example is a single paragraph of a contract, labeled with exactly one family.

4. Model architecture & fine-tuning

4.1 Base model

roberta-large from Hugging Face (transformers).

4.2 LoRA setup

We apply LoRA to a subset of the encoder:

Target modules: query, key, value, intermediate.dense, output.dense
LoRA config:
- r = 16
- lora_alpha = 32
- lora_dropout = 0.05
Frozen: All other base model weights.
Saved extra modules: classifier head kept and saved along with adapters.
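
For reference, the settings above roughly correspond to the following PEFT configuration. This is a sketch reconstructed from the list, not the exact training script: `task_type` and the head name `classifier` (the name of the classification head in RoBERTa sequence-classification models) are assumptions. PEFT matches entries in `target_modules` by module-name suffix, so `output.dense` should cover both the attention output projection and the feed-forward output projection.

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # assumption: sequence-classification task type
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query", "key", "value", "intermediate.dense", "output.dense"],
    modules_to_save=["classifier"],  # assumption: head is trained fully and saved with the adapters
)
```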

4.3 Optimization & training

Objective: weighted cross-entropy with class weights to counter label imbalance.
Optimizer: AdamW (8-bit or standard), weight decay 0.1
Scheduler: cosine LR with warmup
Batch size (effective): 32 (per-device × grad_accumulation)
Epochs: 5
Max seq length: 384 tokens
Hardware: single GPU (tested on A100)
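
The exact class-weighting formula is not stated here. One common choice, shown below as an assumption, is inverse-frequency ("balanced") weighting, where each class weight is N / (K × count). The function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import List


def inverse_frequency_weights(labels: List[int], num_classes: int) -> List[float]:
    """Return weight_c = N / (K * count_c); classes absent from `labels` get 0.0."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


# Toy label distribution: class 0 is common, class 2 is rare.
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_frequency_weights(toy_labels, num_classes=3)
print(weights)  # rarer classes receive larger weights
```

In PyTorch, such a list can be passed to the loss as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`.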

Reproducibility knobs:

Fixed random seed (42) for Python / NumPy / PyTorch.
Deterministic behavior is not fully guaranteed but training is stable.
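
A seeding helper along these lines covers the three libraries mentioned above. It is a sketch, not the training script's own code: the name `set_seed` is hypothetical, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch random number generators."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


set_seed(42)
```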

5. Evaluation

All numbers below are on the validation split (10,000 paragraphs) with the LoRA adapters applied.

5.1 Standard metrics

Accuracy: ~0.869
Macro F1: ~0.790

This is a multi-class setting with 100 labels and notable class imbalance.
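
The gap between accuracy and macro F1 is typical under class imbalance, because macro F1 averages per-class F1 with equal weight for rare and frequent classes. The toy example below illustrates the effect with two classes; the data is made up, and scoring a class with no true or predicted examples as 0.0 is a convention chosen here.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Eight examples of a frequent class, two of a rare class; one rare example is missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred, num_classes=2)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

A single error on the rare class barely moves accuracy but pulls that class's F1 down to 2/3, which the macro average then reflects.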

5.2 Calibration

On top of logits, we apply temperature scaling:

Search over a grid of temperatures.
Best temperature on validation: T* ≈ 0.8
Expected Calibration Error (ECE) before / after scaling:
- ECE_raw ≈ 0.115
- ECE_cal ≈ 0.022

These calibrated probabilities are what we use for governance policies (false-green caps, abstain band, etc.).
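
Temperature scaling divides the logits by a scalar T before the softmax. The sketch below shows a grid search and an ECE computation on toy data. Several details are assumptions, not taken from this card: the grid values, selecting T by minimum negative log-likelihood, and ECE with 10 equal-width confidence bins.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logits, labels, temperature: float) -> float:
    """Average negative log-likelihood of the true labels at a given temperature."""
    return -sum(
        math.log(softmax(row, temperature)[y]) for row, y in zip(logits, labels)
    ) / len(labels)


def ece(logits, labels, temperature: float = 1.0, n_bins: int = 10) -> float:
    """Expected Calibration Error over equal-width confidence bins."""
    conf_sums = [0.0] * n_bins
    hits = [0] * n_bins
    for row, y in zip(logits, labels):
        probs = softmax(row, temperature)
        conf = max(probs)
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sums[b] += conf
        hits[b] += int(probs.index(conf) == y)
    # Sum over bins of |accuracy - mean confidence| * (bin size / N).
    return sum(abs(h - c) for h, c in zip(hits, conf_sums)) / len(labels)


# Toy, overconfident validation set: large margins but only 3 of 4 correct.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]

grid = [0.5, 1.0, 2.0, 4.0]  # hypothetical grid
best_t = min(grid, key=lambda t: mean_nll(val_logits, val_labels, t))
ece_raw = ece(val_logits, val_labels, temperature=1.0)
ece_cal = ece(val_logits, val_labels, temperature=best_t)
print(best_t, round(ece_raw, 3), round(ece_cal, 3))
```

In this toy case the model is overconfident, so the search picks T > 1. The value reported above (T* ≈ 0.8) is below 1, which sharpens the probabilities instead.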

6. Inference: using the model

Load base + adapters


from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

BASE = "roberta-large"
ADAPTER_REPO = "akshan-main/termsconditioned-roberta-large-ledgar-lora"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForSequenceClassification.from_pretrained(
    BASE,
    num_labels=100,
)
model = PeftModel.from_pretrained(base_model, ADAPTER_REPO)

model.eval()

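You can test the model on synthetic or real ToS paragraphs (for example, arbitration clauses, limitation of liability caps, or indemnity language). The printed indices are class ids; the id2label mapping in config.json translates them to clause family names.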

# Must run the above cell first
# This cell is a sample use case for the model
import torch

text = "Any dispute arising out of or relating to this Agreement shall be finally settled by binding arbitration..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=384)

with torch.no_grad():
    outputs = model(**inputs)
    probs = outputs.logits.softmax(dim=-1)[0]

topk = torch.topk(probs, k=5)
for idx, score in zip(topk.indices.tolist(), topk.values.tolist()):
    print(idx, float(score))

7. Limitations and warnings

Domain

The model is trained on LEDGAR (public contract clauses). Behavior on consumer terms of service, privacy policies, employment agreements, or narrow industry contracts may differ. You should re-check performance on your own corpus.

Single label per paragraph

The dataset assumes one dominant clause family per paragraph. Real-world paragraphs can mix multiple concerns (for example, arbitration plus waiver of class actions). Treat the prediction as the "primary" family, not an exhaustive tagging of everything risky in the text.

Language

Training data is English-only; performance on other languages is not characterized.

Legal risk

This model is not legal advice. Any production use should keep a human in the loop.

8. How to cite or reference

If you use this model in a writeup, you can cite it as:

@misc{akshan_krithick_2025,
    author       = { Akshan Krithick },
    title        = { termsconditioned-roberta-large-ledgar-lora (Revision 1605a22) },
    year         = 2025,
    url          = { https://huggingface.co/akshan-main/termsconditioned-roberta-large-ledgar-lora },
    doi          = { 10.57967/hf/7109 },
    publisher    = { Hugging Face }
}

9. Files in this repo

adapter_model.safetensors – LoRA adapter weights for the classifier head and selected encoder modules
adapter_config.json – PEFT / LoRA configuration
config.json – model configuration (num_labels, id2label, label2id, etc.)
tokenizer.json, vocab.json, merges.txt, tokenizer_config.json – tokenizer assets compatible with roberta-large
special_tokens_map.json – tokenizer special token mapping
training checkpoints
README.md

The base roberta-large weights are not duplicated here; at inference time they are loaded from the main Hugging Face model hub.
