On Evaluating the Durability of Safeguards for Open-Weight LLMs Paper • 2412.07097 • Published Dec 10, 2024 • 1
Dynamic Risk Assessments for Offensive Cybersecurity Agents Paper • 2505.18384 • Published May 23, 2025 • 8
Evaluating Copyright Takedown Methods for Language Models Paper • 2406.18664 • Published Jun 26, 2024 • 1
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications Paper • 2402.05162 • Published Feb 7, 2024 • 1