Privacy Clause Classifier (DistilBERT - OPP-115)
This is a DistilBERT model fine-tuned on the OPP-115 dataset to classify privacy policy clauses into one of ten privacy-practice categories:
ID | Category |
---|---|
0 | Data Retention |
1 | Data Security |
2 | Do Not Track |
3 | First Party Collection/Use |
4 | International and Specific Audiences |
5 | Other |
6 | Policy Change |
7 | Third Party Sharing/Collection |
8 | User Access, Edit and Deletion |
9 | User Choice/Control |
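For convenience, the table above can be written as a lookup dictionary so that an integer prediction can be turned into a readable category name. This is only a sketch: the names `ID2LABEL`, `LABEL2ID`, and `category_name` are hypothetical helpers, and it assumes the model's output indices follow the order of the table.

```python
# Category IDs copied from the table above.
# Assumption: the model's output indices follow this order.
ID2LABEL = {
    0: "Data Retention",
    1: "Data Security",
    2: "Do Not Track",
    3: "First Party Collection/Use",
    4: "International and Specific Audiences",
    5: "Other",
    6: "Policy Change",
    7: "Third Party Sharing/Collection",
    8: "User Access, Edit and Deletion",
    9: "User Choice/Control",
}

# Reverse mapping, handy when preparing labelled data.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def category_name(class_id: int) -> str:
    """Return the category name for a predicted class ID (hypothetical helper)."""
    return ID2LABEL[class_id]


print(category_name(3))  # First Party Collection/Use
```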
Model Details
- Architecture: DistilBERT (pretrained)
- Fine-tuning Dataset: OPP-115 Dataset
- Input Format: Text snippets from privacy policies
- Output Format: Predicted class label with probabilities
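The card does not say how the probabilities are obtained. A common approach for sequence classification heads, assumed here, is to apply a softmax to the raw logits and take the highest-scoring index as the predicted class. The sketch below uses plain Python and made-up logits; `softmax` and `predict_from_logits` are hypothetical helper names, and with PyTorch the equivalent step would be `torch.softmax(outputs.logits, dim=-1)`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_logits(logits):
    """Return (predicted class ID, per-class probabilities)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Made-up logits for illustration only, one value per category (10 classes).
example_logits = [0.1, -1.2, 0.0, 3.4, 0.3, -0.5, 0.2, 1.1, -0.7, 0.4]
class_id, probs = predict_from_logits(example_logits)
print(f"Predicted class ID: {class_id}, probability: {probs[class_id]:.3f}")
```

Reporting the full probability vector, rather than only the top class, lets downstream tools flag low-confidence clauses for manual review.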
Intended Uses
- Automatic privacy policy clause classification
- Regulatory technology (RegTech) tools
- Privacy policy summarization and simplification
- Risk analysis for data sharing and collection practices
How to Use
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch
# Load the tokenizer and model (replace the placeholder with the actual repository ID)
tokenizer = DistilBertTokenizerFast.from_pretrained("your-hf-username/your-model-name")
model = DistilBertForSequenceClassification.from_pretrained("your-hf-username/your-model-name")
# Predict the category of a single clause
text = "We may collect your location data to provide customized services."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
# The predicted class is the integer ID listed in the category table
predicted_class = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted Category: {predicted_class}")