Optimization
🤗 Optimum provides an optimum.onnxruntime package that enables you to apply graph optimization to many models hosted on the 🤗 Hub using the ONNX Runtime model optimization tool.
Creating an ORTOptimizer
The ORTOptimizer class is used to optimize your ONNX model. The class can be initialized using the from_pretrained() method, which supports different checkpoint formats.
- Using an already initialized ORTModelForXXX class.
>>> from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification
# Loading ONNX Model from the Hub
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
# Create an optimizer from an ORTModelForXXX
>>> optimizer = ORTOptimizer.from_pretrained(model)
- Using a local ONNX model from a directory.
>>> from optimum.onnxruntime import ORTOptimizer
# This assumes a model.onnx exists in path/to/model
>>> optimizer = ORTOptimizer.from_pretrained("path/to/model")
Optimization examples
Below you will find an easy end-to-end example of how to optimize distilbert-base-uncased-finetuned-sst-2-english.
>>> from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification
>>> from optimum.onnxruntime.configuration import OptimizationConfig
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "/tmp/outputs"
# Load a PyTorch model and export it to the ONNX format
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
# Create the optimizer
>>> optimizer = ORTOptimizer.from_pretrained(model)
# Define the optimization strategy by creating the appropriate configuration
>>> optimization_config = OptimizationConfig(
...     optimization_level=2,
...     optimize_with_onnxruntime_only=False,
...     optimize_for_gpu=False,
... )
# Optimize the model
>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
Below you will find an easy end-to-end example of how to optimize the Seq2Seq model sshleifer/distilbart-cnn-12-6.
>>> from optimum.onnxruntime import ORTOptimizer, ORTModelForSeq2SeqLM
>>> from optimum.onnxruntime.configuration import OptimizationConfig
>>> from transformers import AutoTokenizer
>>> model_id = "sshleifer/distilbart-cnn-12-6"
>>> save_dir = "/tmp/outputs"
# Load a PyTorch model and export it to the ONNX format
>>> model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
# Create the optimizer
>>> optimizer = ORTOptimizer.from_pretrained(model)
# Define the optimization strategy by creating the appropriate configuration
>>> optimization_config = OptimizationConfig(
...     optimization_level=2,
...     optimize_with_onnxruntime_only=False,
...     optimize_for_gpu=False,
... )
# Optimize the model
>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
# Load the resulting optimized model
>>> optimized_model = ORTModelForSeq2SeqLM.from_pretrained(
...     save_dir,
...     encoder_file_name="encoder_model_optimized.onnx",
...     decoder_file_name="decoder_model_optimized.onnx",
...     decoder_file_with_past_name="decoder_with_past_model_optimized.onnx",
... )
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> tokens = tokenizer("This is a sample input", return_tensors="pt")
>>> outputs = optimized_model.generate(**tokens)
ORTOptimizer
class optimum.onnxruntime.ORTOptimizer
( onnx_model_path: typing.List[os.PathLike], config: PretrainedConfig )
Handles the ONNX Runtime optimization process for models shared on huggingface.co/models.
from_pretrained
( model_or_path: typing.Union[str, os.PathLike, optimum.onnxruntime.modeling_ort.ORTModel], file_names: typing.Optional[typing.List[str]] = None )
Parameters
- model_or_path (Union[str, os.PathLike, ORTModel]): The model to optimize. Can be either:
  - A path to a local directory containing the model to optimize.
  - An instance of ORTModel.
- file_names (List[str], optional): The list of file names of the models to optimize.
get_fused_operators
( onnx_model_path: typing.Union[str, os.PathLike] )
Compute the dictionary mapping the name of each fused operator to its number of occurrences in the model.
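To make the shape of that mapping concrete, here is a minimal sketch that counts fused operators in a plain list of operator type names. It does not call ORTOptimizer or read an ONNX file. The FUSED_OP_TYPES set and the count_fused_operators helper are hypothetical, and the set is only an illustrative subset of operator names that ONNX Runtime fusions can produce.

```python
from collections import Counter

# Assumption: an illustrative subset of fused operator names. The set the
# library actually inspects is not documented on this page.
FUSED_OP_TYPES = {
    "Attention",
    "BiasGelu",
    "EmbedLayerNormalization",
    "FastGelu",
    "SkipLayerNormalization",
}


def count_fused_operators(op_types):
    """Map each fused operator name to how often it occurs in op_types."""
    return dict(Counter(op for op in op_types if op in FUSED_OP_TYPES))


# Hypothetical operator types of an optimized graph, in graph order.
optimized_ops = [
    "EmbedLayerNormalization",
    "Attention",
    "MatMul",
    "SkipLayerNormalization",
    "Attention",
    "BiasGelu",
    "SkipLayerNormalization",
]

fused_counts = count_fused_operators(optimized_ops)
print(fused_counts)
```

Operators outside the set, such as MatMul here, are left out of the result.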
get_nodes_number_difference
( onnx_model_path: typing.Union[str, os.PathLike], onnx_optimized_model_path: typing.Union[str, os.PathLike] )
Compute the difference in the number of nodes between the original and the optimized model.
get_operators_difference
( onnx_model_path: typing.Union[str, os.PathLike], onnx_optimized_model_path: typing.Union[str, os.PathLike] )
Compute the dictionary mapping each operator name to the difference in the number of corresponding nodes between the original and the optimized model.
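The two difference helpers can be illustrated on plain lists of operator type names. The sketch below is not the library's implementation. The functions nodes_number_difference and operators_difference are hypothetical, the sign convention (original minus optimized) is an assumption, and so is the choice to drop operators whose count is unchanged.

```python
from collections import Counter


def nodes_number_difference(original_ops, optimized_ops):
    """Number of nodes removed by optimization (assumed: original - optimized)."""
    return len(original_ops) - len(optimized_ops)


def operators_difference(original_ops, optimized_ops):
    """Per-operator node count difference (assumed: original - optimized)."""
    original = Counter(original_ops)
    optimized = Counter(optimized_ops)
    difference = {}
    for op in sorted(set(original) | set(optimized)):
        delta = original[op] - optimized[op]
        if delta != 0:  # Assumption: unchanged operators are omitted.
            difference[op] = delta
    return difference


# Hypothetical graphs: a bias add followed by GELU collapses into one BiasGelu node.
original_ops = ["MatMul", "Add", "Erf", "Mul", "Add", "MatMul"]
optimized_ops = ["MatMul", "BiasGelu", "MatMul"]

node_delta = nodes_number_difference(original_ops, optimized_ops)
op_delta = operators_difference(original_ops, optimized_ops)
print(node_delta, op_delta)
```

Under these assumptions, a negative value marks an operator that appears only after optimization (typically a fused one), and the per-operator values sum to the overall node difference.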
optimize
( optimization_config: OptimizationConfig, save_dir: typing.Union[str, os.PathLike], file_suffix: typing.Optional[str] = 'optimized', use_external_data_format: bool = False )
Parameters
- optimization_config (OptimizationConfig): The configuration containing the parameters related to optimization.
- save_dir (Union[str, os.PathLike]): The path used to save the optimized model.
- file_suffix (str, optional, defaults to "optimized"): The file suffix used to save the optimized model.
- use_external_data_format (bool, optional, defaults to False): Whether to use the external data format to store models of size >= 2 GB.
Optimize a model given the optimization specifications defined in optimization_config.
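The Seq2Seq example loads files such as encoder_model_optimized.onnx, which suggests that file_suffix is appended to each file's base name with an underscore. The sketch below reproduces that naming pattern with pathlib. The optimized_file_name helper is hypothetical, and the pattern is inferred from those example file names rather than taken from the library's code.

```python
from pathlib import Path


def optimized_file_name(file_name, file_suffix="optimized"):
    """Build '<stem>_<file_suffix><extension>' (assumed naming pattern)."""
    path = Path(file_name)
    return f"{path.stem}_{file_suffix}{path.suffix}"


# Assumption: these are the exported file names before optimization.
exported = ["encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"]
optimized = [optimized_file_name(name) for name in exported]
print(optimized)
```

If the pattern holds, passing a different file_suffix to optimize() means the file names given when reloading the model must change to match.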