Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts • Paper • 2402.16822 • Published Feb 26, 2024 • 18
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations • Paper • 2312.06674 • Published Dec 7, 2023 • 8
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming • Paper • 2311.07689 • Published Nov 13, 2023 • 9
Llama 2: Open Foundation and Fine-Tuned Chat Models • Paper • 2307.09288 • Published Jul 18, 2023 • 245
Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization • Paper • 2305.03937 • Published May 6, 2023 • 2