Intent Classifier - MiniLM
A fine-tuned intent classification model based on MiniLM, optimized for fast inference with multiple ONNX quantization variants.
Model Description
This model is designed for intent classification tasks and has been converted to ONNX format for efficient deployment in various environments, including web browsers using Transformers.js.
Model Variants
This repository contains multiple ONNX model variants optimized for different use cases:
Model File | Description | Use Case |
---|---|---|
model.onnx | Original ONNX model | Best accuracy, larger size |
model_fp16.onnx | 16-bit floating point | Good balance of accuracy and speed |
model_int8.onnx | 8-bit integer quantized | Faster inference, smaller size |
model_q4.onnx | 4-bit quantized | Very fast, very small |
model_q4f16.onnx | 4-bit with FP16 | Optimized for specific hardware |
model_quantized.onnx | Standard quantized | General purpose optimization |
model_uint8.onnx | Unsigned 8-bit | Mobile/edge deployment |
model_bnb4.onnx | BitsAndBytes 4-bit | Advanced quantization |
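As a rough illustration of how the variants might be selected in code, the sketch below maps a short precision key to the file names listed above. The `onnx/` folder prefix follows the path used in the ONNX Runtime example in this card; the `VARIANTS` keys and the `variant_path` helper are hypothetical names introduced here and are not part of the repository.

```python
# Hypothetical helper: map a short variant key to a file name from the table.
VARIANTS = {
    "fp32": "model.onnx",
    "fp16": "model_fp16.onnx",
    "int8": "model_int8.onnx",
    "uint8": "model_uint8.onnx",
    "q4": "model_q4.onnx",
    "q4f16": "model_q4f16.onnx",
    "quantized": "model_quantized.onnx",
    "bnb4": "model_bnb4.onnx",
}


def variant_path(variant: str, folder: str = "onnx") -> str:
    """Return the relative path of a variant (assumes the files live in `folder`)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; choose from {sorted(VARIANTS)}")
    return f"{folder}/{VARIANTS[variant]}"
```

For example, `variant_path("int8")` yields the same relative path that the ONNX Runtime example passes to `ort.InferenceSession`.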
Quick Start
Using with Transformers.js (Browser)
import { pipeline } from '@xenova/transformers';
// Load the model
const classifier = await pipeline('text-classification', 'kousik-2310/intent-classifier-minilm');
// Classify text
const result = await classifier('I want to book a flight to New York');
console.log(result);
Using with Python/Transformers
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("kousik-2310/intent-classifier-minilm")
model = AutoModelForSequenceClassification.from_pretrained("kousik-2310/intent-classifier-minilm")
# Create pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Classify text
result = classifier("I want to book a flight to New York")
print(result)
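In a chatbot it is often useful to fall back to a default action when the classifier is unsure. The sketch below assumes the pipeline returns a list of dictionaries with `label` and `score` keys, which is the usual shape for `text-classification` pipelines. The `pick_intent` helper, the `0.5` threshold, the `"fallback"` label, and the example label are illustrative assumptions, not values from this model.

```python
def pick_intent(result, threshold=0.5, fallback="fallback"):
    """Return the top label if its score clears the threshold, else `fallback`."""
    if not result:
        return fallback
    top = max(result, key=lambda item: item["score"])
    return top["label"] if top["score"] >= threshold else fallback


# Made-up pipeline output, for illustration only
example = [{"label": "book_flight", "score": 0.93}]
intent = pick_intent(example)
```

A suitable threshold depends on the label set and on how costly a wrong intent is, so it would need tuning on held-out data.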
Using ONNX Runtime
import onnxruntime as ort
from transformers import AutoTokenizer
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("kousik-2310/intent-classifier-minilm")
# Load ONNX model (path to a local copy of the chosen variant file)
session = ort.InferenceSession("onnx/model_int8.onnx")
# Tokenize input
text = "I want to book a flight to New York"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)
# Run inference
outputs = session.run(None, {
"input_ids": inputs["input_ids"],
"attention_mask": inputs["attention_mask"]
})
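# Process results: for a sequence classification export, the first output
# is expected to hold the raw logits, one row per input text
predictions = outputs[0]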
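The ONNX Runtime example stops at the raw model output. A minimal sketch of turning one row of scores into a label and a probability follows, assuming that `outputs[0]` holds logits of shape (batch, num_labels). The helper names and the three label names are placeholders; the real label names would come from the model's configuration, which this card does not list.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, labels):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Illustrative values only: real logits could come from outputs[0][0].tolist()
label, prob = top_prediction([2.0, 0.5, -1.0], ["intent_a", "intent_b", "intent_c"])
```

The helpers use plain Python lists so they run without extra dependencies; with NumPy arrays the same steps can be written with `np.exp` and `np.argmax`.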
Model Architecture
- Base Model: MiniLM architecture
- Task: Text Classification (Intent Recognition)
- Framework: PyTorch → ONNX
- Quantization: Multiple variants available
Performance
The model provides different performance characteristics based on the variant used:
- Accuracy: Best with model.onnx, good with quantized versions
- Speed: Fastest with model_q4.onnx and model_int8.onnx
- Size: Smallest with quantized variants (4-bit, 8-bit)
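Speed differences between variants depend on hardware and runtime, so it can help to measure them directly. The following is a minimal timing sketch, not a rigorous benchmark; the `benchmark` helper and its default run counts are assumptions made for illustration, and the lambda is a stand-in workload rather than a model call.

```python
import statistics
import time


def benchmark(fn, n_runs=50, n_warmup=5):
    """Return the median wall-clock latency of `fn()` in milliseconds."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Stand-in workload; with ONNX Runtime this could instead be
# lambda: session.run(None, feeds), with `session` and `feeds` prepared beforehand.
latency_ms = benchmark(lambda: sum(range(1000)), n_runs=10, n_warmup=2)
```

Running the same helper once per variant on the target device gives a like-for-like comparison; the median is used because it is less sensitive to occasional slow runs than the mean.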
Intended Use
This model is intended for:
- Intent classification in chatbots and virtual assistants
- Text classification tasks
- Real-time inference in web applications
- Edge deployment scenarios
Training Details
The model has been fine-tuned for intent classification and converted to multiple ONNX formats for optimal deployment flexibility.
Limitations and Bias
- Model performance depends on how closely your use case matches the training data
- Quantized models may have slightly reduced accuracy compared to the full precision model
- Performance may vary based on the deployment environment
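One way to check how much a quantized variant deviates from the full-precision model is to compare their predicted classes on the same inputs. The sketch below assumes each model's logits have already been collected as lists of per-example scores; `agreement_rate` is a hypothetical helper and the numbers are made up for illustration.

```python
def argmax(values):
    """Index of the largest value in a flat list."""
    return max(range(len(values)), key=values.__getitem__)


def agreement_rate(reference_logits, candidate_logits):
    """Fraction of examples where both models pick the same class."""
    if len(reference_logits) != len(candidate_logits):
        raise ValueError("both inputs must cover the same examples")
    if not reference_logits:
        raise ValueError("need at least one example")
    matches = sum(
        argmax(ref) == argmax(cand)
        for ref, cand in zip(reference_logits, candidate_logits)
    )
    return matches / len(reference_logits)


# Made-up logits for three examples and two classes
full = [[2.0, 0.1], [0.3, 1.2], [1.5, 1.4]]
quant = [[1.8, 0.2], [0.4, 1.0], [1.3, 1.6]]
rate = agreement_rate(full, quant)
```

Agreement with the full-precision model is only a proxy; measuring accuracy against labeled examples from your own domain is more informative when such data is available.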
How to Cite
@misc{intent-classifier-minilm,
title={Intent Classifier MiniLM},
author={kousik-2310},
year={2024},
url={https://huggingface.co/kousik-2310/intent-classifier-minilm}
}
License
This model is released under the Apache 2.0 License.
Model tree for kousik-2310/intent-classifier-minilm
Base model
microsoft/DialoGPT-medium