Privacy Clause Classifier (DistilBERT - OPP-115)

This is a DistilBERT model fine-tuned to classify privacy policy clauses into one of ten privacy practice categories from the OPP-115 dataset. The categories and their class IDs are:

ID  Category
0   Data Retention
1   Data Security
2   Do Not Track
3   First Party Collection/Use
4   International and Specific Audiences
5   Other
6   Policy Change
7   Third Party Sharing/Collection
8   User Access, Edit and Deletion
9   User Choice/Control
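
The integer IDs above can be mapped back to category names in code. The sketch below assumes the model's output indices follow the order of the table; the names ID_TO_LABEL and label_for are illustrative and are not part of the model or the transformers API.

```python
# Illustrative mapping from class ID to category name, copied from the table above.
# Assumption: the model's output index order matches this table.
ID_TO_LABEL = {
    0: "Data Retention",
    1: "Data Security",
    2: "Do Not Track",
    3: "First Party Collection/Use",
    4: "International and Specific Audiences",
    5: "Other",
    6: "Policy Change",
    7: "Third Party Sharing/Collection",
    8: "User Access, Edit and Deletion",
    9: "User Choice/Control",
}


def label_for(class_id: int) -> str:
    """Return the category name for a class ID (raises KeyError if unknown)."""
    return ID_TO_LABEL[class_id]


print(label_for(3))  # First Party Collection/Use
```

If the published model's configuration already carries an id2label mapping, that mapping should take precedence over this hand-written one.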

Model Details

  • Architecture: DistilBERT (pretrained)
  • Fine-tuning Dataset: OPP-115 Dataset
  • Input Format: Text snippets from privacy policies
  • Output Format: Predicted class label with probabilities
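
The "class label with probabilities" output can be illustrated without loading the model. This is a minimal sketch, assuming the model emits one raw score (logit) per category and that probabilities are obtained with a softmax over those scores, as is usual for single-label classifiers. The helper softmax and the example logits are made up for illustration and are not real model output.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the ten categories (not real model output).
example_logits = [0.1, -1.2, -2.0, 3.4, -0.5, 0.0, -1.1, 1.9, -0.7, 0.3]

probs = softmax(example_logits)
# The predicted class is the index with the highest probability.
predicted_id = max(range(len(probs)), key=probs.__getitem__)

print(f"Predicted ID: {predicted_id}, probability: {probs[predicted_id]:.3f}")
```

Because softmax preserves ordering, the predicted ID is the same whether the argmax is taken over the logits or over the probabilities.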

Intended Uses

  • Automatic privacy policy clause classification
  • Regulatory technology (RegTech) tools
  • Privacy policy summarization and simplification
  • Risk analysis for data sharing and collection practices
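
As one way the summarization and risk-analysis uses might look in practice, the sketch below tallies predicted class IDs across the clauses of a single policy and reports each category's share. The function name category_shares and the list of predictions are hypothetical; in a real pipeline the IDs would come from running the classifier on each clause.

```python
from collections import Counter


def category_shares(predicted_ids):
    """Return the fraction of clauses assigned to each predicted class ID."""
    total = len(predicted_ids)
    if total == 0:
        return {}
    counts = Counter(predicted_ids)
    return {class_id: n / total for class_id, n in sorted(counts.items())}


# Hypothetical per-clause predictions for one policy (not real model output).
predicted_ids = [3, 3, 7, 1, 3, 7, 9, 5]

shares = category_shares(predicted_ids)
for class_id, share in shares.items():
    print(f"class {class_id}: {share:.1%} of clauses")
```

A high share for class 7 (Third Party Sharing/Collection), for example, could be one signal worth surfacing in a risk review, though what counts as "high" is a judgment left to the application.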

How to Use

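from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

# Load the tokenizer and model (replace the placeholder with the actual repository ID)
model_id = "your-hf-username/your-model-name"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_id)
model = DistilBertForSequenceClassification.from_pretrained(model_id)
model.eval()  # make sure dropout is disabled for inference

# Predict
text = "We may collect your location data to provide customized services."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # gradients are not needed at inference time
    outputs = model(**inputs)

# Convert logits to probabilities and pick the most likely class ID
probabilities = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()

print(f"Predicted Category: {predicted_class}")
print(f"Probability: {probabilities[0, predicted_class].item():.3f}")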