Faithful and Robust Local Interpretability for Textual Predictions
Abstract
Interpretability is essential for machine learning models to be trusted and deployed in critical domains. However, existing methods for interpreting text models are often complex, lack mathematical foundations, and their performance is not guaranteed. In this paper, we propose FRED (Faithful and Robust Explainer for textual Documents), a novel method for interpreting predictions over text. FRED offers three key insights to explain a model prediction: (1) it identifies the minimal set of words in a document whose removal has the strongest influence on the prediction, (2) it assigns an importance score to each token, reflecting its influence on the model's output, and (3) it provides counterfactual explanations by generating examples similar to the original document, but leading to a different prediction. We establish the reliability of FRED through formal definitions and theoretical analyses on interpretable classifiers. Additionally, our empirical evaluation against state-of-the-art methods demonstrates the effectiveness of FRED in providing insights into text models.
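The abstract does not detail FRED's algorithm. As a rough illustration of the removal-based importance idea it describes, the sketch below scores each word of a document by how much the predicted probability drops when that word is removed from the input of a black-box classifier. This is a minimal leave-one-out sketch, not FRED itself; the `predict_proba` callable and the toy model are hypothetical placeholders.

```python
# Minimal sketch (not the paper's FRED method): leave-one-out word importance
# for a black-box text classifier. `predict_proba` is an assumed callable that
# returns the probability of the model's predicted class for a text input.

from typing import Callable, List, Tuple


def word_importance(
    document: str,
    predict_proba: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Score each word by the drop in predicted probability when it is removed."""
    words = document.split()
    base = predict_proba(document)
    scores = []
    for i in range(len(words)):
        # Rebuild the document with word i removed.
        perturbed = " ".join(words[:i] + words[i + 1:])
        scores.append((words[i], base - predict_proba(perturbed)))
    # A larger drop means the word mattered more to the prediction.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy classifier: probability rises with occurrences of the word "great".
    toy_model = lambda text: min(1.0, 0.2 + 0.4 * text.lower().split().count("great"))
    print(word_importance("a great movie with great acting", toy_model))
```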