ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published Nov 3, 2025 • 31
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9, 2025 • 41
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks Paper • 2510.02286 • Published Oct 2, 2025 • 28
AgentReview: Exploring Peer Review Dynamics with LLM Agents Paper • 2406.12708 • Published Jun 18, 2024 • 8
Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published Oct 1, 2025 • 58 • 3
Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion Paper • 2305.03509 • Published May 4, 2023 • 1
RobArch: Designing Robust Architectures against Adversarial Attacks Paper • 2301.03110 • Published Jan 8, 2023 • 1
CompCap: Improving Multimodal Large Language Models with Composite Captions Paper • 2412.05243 • Published Dec 6, 2024 • 20
LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked Paper • 2308.07308 • Published Aug 14, 2023
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models Paper • 2405.17374 • Published May 27, 2024 • 1
Robust Principles: Architectural Design Principles for Adversarially Robust CNNs Paper • 2308.16258 • Published Aug 30, 2023
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 172