KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge Paper • 2402.13605 • Published Feb 21, 2024
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application Paper • 2305.17701 • Published May 28, 2023
ProPILE: Probing Privacy Leakage in Large Language Models Paper • 2307.01881 • Published Jul 4, 2023
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification Paper • 2402.12991 • Published Feb 20, 2024
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023