Intent Classifier - MiniLM

A fine-tuned intent classification model based on MiniLM, optimized for fast inference with multiple ONNX quantization variants.

Model Description

This model is designed for intent classification tasks and has been converted to ONNX format for efficient deployment in various environments, including web browsers using Transformers.js.

Model Variants

This repository contains multiple ONNX model variants optimized for different use cases:

Model File            Description              Use Case
model.onnx            Original ONNX model      Best accuracy, larger size
model_fp16.onnx       16-bit floating point    Good balance of accuracy and speed
model_int8.onnx       8-bit integer quantized  Faster inference, smaller size
model_q4.onnx         4-bit quantized          Very fast, very small
model_q4f16.onnx      4-bit with FP16          Optimized for specific hardware
model_quantized.onnx  Standard quantized       General purpose optimization
model_uint8.onnx      Unsigned 8-bit           Mobile/edge deployment
model_bnb4.onnx       BitsAndBytes 4-bit       Advanced quantization
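
As a rough illustration of how the table above might be used in deployment code, the sketch below maps a deployment priority to one of the listed files. The file names come from the table; the priority keys, the `pick_variant` helper, and the `onnx` subfolder default are hypothetical and not part of this repository.

```python
# Hypothetical lookup table: file names are taken from the variants table,
# the priority keys are illustrative assumptions.
VARIANT_BY_PRIORITY = {
    "accuracy": "model.onnx",
    "balanced": "model_fp16.onnx",
    "speed": "model_int8.onnx",
    "size": "model_q4.onnx",
    "edge": "model_uint8.onnx",
}


def pick_variant(priority: str, subfolder: str = "onnx") -> str:
    """Return the relative path of the ONNX file for a deployment priority."""
    try:
        filename = VARIANT_BY_PRIORITY[priority]
    except KeyError:
        known = ", ".join(sorted(VARIANT_BY_PRIORITY))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {known}") from None
    return f"{subfolder}/{filename}"


print(pick_variant("speed"))
```

Keeping the mapping in one place makes it easy to swap variants after measuring accuracy and latency on your own data.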

Quick Start

Using with Transformers.js (Browser)

import { pipeline } from '@xenova/transformers';

// Load the model
const classifier = await pipeline('text-classification', 'kousik-2310/intent-classifier-minilm');

// Classify text
const result = await classifier('I want to book a flight to New York');
console.log(result);

Using with Python/Transformers

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("kousik-2310/intent-classifier-minilm")
model = AutoModelForSequenceClassification.from_pretrained("kousik-2310/intent-classifier-minilm")

# Create pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Classify text
result = classifier("I want to book a flight to New York")
print(result)

Using ONNX Runtime

import onnxruntime as ort
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("kousik-2310/intent-classifier-minilm")

# Load ONNX model (path to a locally downloaded copy of the file)
session = ort.InferenceSession("onnx/model_int8.onnx")

# Tokenize input
text = "I want to book a flight to New York"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

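# Run inference, feeding only the inputs this ONNX graph declares
# (some exports also expect token_type_ids)
input_names = {node.name for node in session.get_inputs()}
outputs = session.run(None, {
    name: array for name, array in inputs.items() if name in input_names
})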

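# Process results: outputs[0] holds the raw logits, shape (batch, num_labels)
logits = outputs[0]
predicted_class_id = int(logits.argmax(axis=-1)[0])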
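
The array returned as `outputs[0]` contains raw logits rather than probabilities or label names. A minimal sketch of the remaining post-processing, assuming one row of logits per input, is shown below. The `ID2LABEL` mapping is made up for illustration; the real label names would come from the model's configuration.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical label map: the actual intents are defined by the model config.
ID2LABEL = {0: "book_flight", 1: "cancel_booking", 2: "check_status"}


def decode(logits_row, id2label):
    """Return the most likely label and its probability for one input."""
    probs = softmax(logits_row)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Illustrative logits, not real model output.
example = decode([2.0, 0.5, -1.0], ID2LABEL)
print(example["label"])
```

With real ONNX Runtime output you would pass each row of the logits array (converted with `.tolist()`) to `decode`.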

Model Architecture

  • Base Model: MiniLM architecture
  • Task: Text Classification (Intent Recognition)
  • Framework: PyTorch → ONNX
  • Quantization: Multiple variants available

Performance

Accuracy, speed, and size trade off against each other depending on the variant:

  • Accuracy: Best with the full-precision model.onnx; quantized variants may lose a small amount
  • Speed: Fastest with model_q4.onnx and model_int8.onnx
  • Size: Smallest with the 4-bit and 8-bit quantized variants
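
To make the size trade-off concrete, here is a back-of-the-envelope estimate of weight storage per variant. It assumes roughly 22.7M parameters (the figure reported on the model page) and that every weight is stored at the nominal bit width, which is a simplification: real quantized files usually keep some tensors at higher precision and carry graph metadata, so actual files will be somewhat larger.

```python
# Assumption: parameter count as reported on the model page.
PARAM_COUNT = 22_700_000

# Assumption: nominal bits per weight for each variant.
BITS_PER_WEIGHT = {
    "model.onnx": 32,
    "model_fp16.onnx": 16,
    "model_int8.onnx": 8,
    "model_uint8.onnx": 8,
    "model_q4.onnx": 4,
}


def estimated_size_mb(param_count: int, bits_per_weight: int) -> float:
    """Lower-bound weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return param_count * bits_per_weight / 8 / 1_000_000


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: roughly {estimated_size_mb(PARAM_COUNT, bits):.1f} MB of weights")
```

Treat these numbers as a lower bound and check the actual file sizes in the repository before planning a deployment budget.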

Intended Use

This model is intended for:

  • Intent classification in chatbots and virtual assistants
  • Text classification tasks
  • Real-time inference in web applications
  • Edge deployment scenarios
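
In a chatbot or virtual assistant, the classifier output is typically used to route the message to a handler. The sketch below shows one possible routing step with a confidence threshold. It assumes results shaped like the list of `{"label", "score"}` dictionaries returned by a text-classification pipeline; the `route_intent` helper, the 0.6 threshold, and the label names are hypothetical.

```python
def route_intent(results, threshold=0.6, fallback="fallback"):
    """Pick the top-scoring intent, or a fallback when confidence is low.

    results: list of {"label": str, "score": float} dictionaries.
    """
    if not results:
        return fallback
    top = max(results, key=lambda item: item["score"])
    return top["label"] if top["score"] >= threshold else fallback


# Illustrative inputs, not real model output.
confident = [{"label": "book_flight", "score": 0.93}]
uncertain = [{"label": "book_flight", "score": 0.41}]

print(route_intent(confident))
print(route_intent(uncertain))
```

The threshold should be tuned on held-out data from your own application rather than taken from this example.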

Training Details

The model was fine-tuned for intent classification and then exported to ONNX in several precision and quantization variants, so it can be deployed on targets ranging from servers to web browsers and edge devices.

Limitations and Bias

  • Model performance depends on how closely your use case matches the training data
  • Quantized models may have slightly reduced accuracy compared to the full precision model
  • Performance may vary based on the deployment environment
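
One way to check how much a quantized variant differs from the full-precision model is to run both on the same evaluation texts and measure how often they predict the same class. The sketch below assumes you have already collected logits from both models; the `agreement_rate` helper and the sample logits are illustrative only.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def agreement_rate(reference_logits, candidate_logits):
    """Fraction of inputs where both models predict the same class."""
    if len(reference_logits) != len(candidate_logits):
        raise ValueError("both models must be evaluated on the same inputs")
    if not reference_logits:
        raise ValueError("at least one example is required")
    matches = sum(
        argmax(ref) == argmax(cand)
        for ref, cand in zip(reference_logits, candidate_logits)
    )
    return matches / len(reference_logits)


# Made-up logits for three inputs, not real model output.
full_precision = [[2.1, 0.3, -1.0], [0.2, 1.9, 0.1], [0.4, 0.5, 0.45]]
quantized = [[2.0, 0.4, -0.9], [0.3, 1.7, 0.2], [0.5, 0.45, 0.4]]

rate = agreement_rate(full_precision, quantized)
print(f"agreement: {rate:.2f}")
```

Disagreements tend to cluster on inputs where the top two logits are close, so inspecting those cases is usually more informative than the aggregate rate alone.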

How to Cite

@misc{intent-classifier-minilm,
  title={Intent Classifier MiniLM},
  author={kousik-2310},
  year={2024},
  url={https://huggingface.co/kousik-2310/intent-classifier-minilm}
}

License

This model is released under the Apache 2.0 License.

Model size: 22.7M params (Safetensors, tensor type F32)