Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security • Paper • 2507.19399 • Published 7 days ago
RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages • Paper • 2507.05980 • Published 24 days ago
MinorBench: A hand-built benchmark for content-based risks for children • Paper • 2503.10242 • Published Mar 13
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection • Paper • 2411.12946 • Published Nov 20, 2024