Thatscientist's Collections

AI safety

updated Nov 4, 2023

  • Safe RLHF: Safe Reinforcement Learning from Human Feedback

    Paper • 2310.12773 • Published Oct 19, 2023 • 28

  • The Generative AI Paradox: "What It Can Create, It May Not Understand"

    Paper • 2311.00059 • Published Oct 31, 2023 • 20

  • LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B

    Paper • 2310.20624 • Published Oct 31, 2023 • 13

  • Moral Foundations of Large Language Models

    Paper • 2310.15337 • Published Oct 23, 2023 • 1