---
license: apache-2.0
language: en
library_name: optimum
tags:
- onnx
- quantized
- text-classification
- nvidia
- nemotron
pipeline_tag: text-classification
---
# Quantized ONNX Model for botirk/tiny-prompt-task-complexity-classifier
This repository contains the quantized ONNX version of the [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) model.
## Model Description
This is a multi-headed model that classifies English text prompts across task types and complexity dimensions. It has been quantized to INT8 using dynamic quantization with the 🤗 Optimum library, resulting in a smaller footprint and faster CPU inference.
For more details on the model architecture, tasks, and complexity dimensions, please refer to the original model card.
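For reference, the snippet below is a minimal sketch of how an INT8 dynamic quantization like this one can be produced with Optimum's `ORTQuantizer`. It is illustrative, not the exact recipe used for this repository: the quantization configuration (`avx512_vnni` here), the output directory, and whether the custom base architecture exports cleanly as a sequence-classification graph are all assumptions.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the original PyTorch model to ONNX (assumption: the custom
# multi-headed architecture can be exported via this class; check the
# base model card for any trust_remote_code requirements)
model = ORTModelForSequenceClassification.from_pretrained(
    "nvidia/prompt-task-and-complexity-classifier", export=True
)

# Dynamic INT8 quantization: weights are quantized offline, while
# activations are quantized on the fly at inference time
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="quantized_model", quantization_config=qconfig)
```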
## How to Use
You can use this model directly with `optimum.onnxruntime` for accelerated inference.
First, install the required libraries:

```bash
pip install optimum[onnxruntime] transformers
```
Then, you can use the model in a pipeline:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

repo_id = "botirk/tiny-prompt-task-complexity-classifier"

model = ORTModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Note: the "text-classification" pipeline task is a simplification.
# For the full multi-headed output, process the logits manually (see below).
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

prompt = "Write a mystery set in a small town where an everyday object goes missing."
results = classifier(prompt)
print(results)
```
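The pipeline above only surfaces a single flattened classification result. To get at the full multi-headed output, run the model directly and post-process `outputs.logits`. The sketch below is illustrative: how the per-head logits (task type and the various complexity dimensions) are laid out in that tensor, and what the labels are named, depends on how the model was exported, so inspect `model.config` (e.g. `id2label`) for the actual mapping.

```python
import torch
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

repo_id = "botirk/tiny-prompt-task-complexity-classifier"
model = ORTModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "Write a mystery set in a small town where an everyday object goes missing."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model(**inputs)

# `outputs.logits` holds the raw scores for all heads. The per-head
# slicing is an assumption tied to the export; here we simply apply a
# softmax and print whatever labels the config declares.
probs = torch.softmax(outputs.logits, dim=-1)
for idx, p in enumerate(probs[0].tolist()):
    label = model.config.id2label.get(idx, f"LABEL_{idx}")
    print(f"{label}: {p:.3f}")
```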