arxiv:2404.00629

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

Published on Mar 31, 2024

Authors:

Abstract

A survey examines fine-grained attack strategies and defenses for generative models, introducing a taxonomy and framework for automatic red teaming, and addressing emerging topics like multimodal attacks and LLM-based agents.

AI-generated summary

Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safe use as various vulnerabilities are exposed. In light of this, the field of red teaming is undergoing fast-paced growth, highlighting the need for a comprehensive survey covering the entire pipeline and addressing emerging topics. Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models. Additionally, we have developed the "searcher" framework to unify various automatic red teaming approaches. Moreover, our survey covers novel areas including multimodal attacks and defenses, risks around LLM-based agents, overkill of harmless queries, and the balance between harmlessness and helpfulness.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2404.00629 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2404.00629 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.