---
license: apache-2.0
language: en
library_name: optimum
tags:
  - onnx
  - quantized
  - text-classification
  - nvidia
  - nemotron
pipeline_tag: text-classification
---

# Quantized ONNX model for botirk/tiny-prompt-task-complexity-classifier

This repository contains the quantized ONNX version of the [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) model.

## Model Description

This is a multi-headed model that classifies English text prompts across task types and complexity dimensions. This version has been quantized to INT8 using dynamic quantization with the 🤗 Optimum library, resulting in a smaller footprint and faster CPU inference.
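The quantization step itself is not published with this card; the sketch below shows how dynamic INT8 quantization is typically done with Optimum. The AVX2 config and the paths are assumptions for illustration, not the exact settings used for this model.

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic INT8 quantization config targeting AVX2 CPUs
# (is_static=False selects dynamic quantization, so no calibration data is needed)
qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)

# Load an already-exported ONNX model directory (path is illustrative)
quantizer = ORTQuantizer.from_pretrained("path/to/exported-onnx-model")

# Writes model_quantized.onnx into the save directory
quantizer.quantize(save_dir="quantized-model", quantization_config=qconfig)
```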

For more details on the model architecture, tasks, and complexity dimensions, please refer to the [original model card](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier).

## How to Use

You can use this model directly with `optimum.onnxruntime` for accelerated inference.

First, install the required libraries:

```bash
pip install "optimum[onnxruntime]" transformers
```

Then, you can use the model in a pipeline:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

repo_id = "botirk/tiny-prompt-task-complexity-classifier"
model = ORTModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Note: the "text-classification" pipeline task is a simplification.
# For the full multi-headed output, process the logits manually
# (see the sketch after this block).
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

prompt = "Write a mystery set in a small town where an everyday object goes missing."
results = classifier(prompt)
print(results)
```
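As noted in the comments above, the pipeline flattens the multi-headed output. A minimal sketch of working with the raw logits directly is shown below; the exact output names and head layout depend on the export, so the tensor handling here is illustrative and the original model card is the authority on how the heads are laid out.

```python
import torch

# Run the model directly instead of through the pipeline
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model(**inputs)

# outputs.logits holds the raw scores; how they split into the task-type
# and complexity heads is defined by the original model — consult its
# model card for the head layout before slicing.
logits = outputs.logits
print(logits.shape)

# Softmax over the last axis (illustrative; apply per head in practice)
print(torch.softmax(logits, dim=-1))
```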