AI Safety

Discipline that studies how to make AI systems reliable, secure, and aligned with human values.

AI safety addresses the technical and social risks of AI: bias, hallucinations, malicious use, and misalignment. It includes techniques such as reinforcement learning from human feedback (RLHF), red teaming, content moderation, and watermarking.

Practical examples

  • Content filters on ChatGPT
  • Red teaming for jailbreaks
  • AI image watermarking
  • Bias audits in models
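
As an illustration of the last item, one simple check a bias audit might include is the demographic parity gap: the difference in favourable-decision rates between groups. The sketch below is a minimal example under stated assumptions, not a standard audit procedure; the function name `demographic_parity_gap` and the toy data are hypothetical.

```python
from collections import defaultdict


def demographic_parity_gap(predictions, groups):
    """Return (gap, rates): the largest difference in positive-prediction
    rate between any two groups, plus the per-group rates."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical toy data: 1 = favourable model decision, 0 = unfavourable.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_gap(preds, groups)
print(rates)  # per-group favourable rates: a = 0.75, b = 0.25
print(gap)    # 0.5
```

A gap near 0 means the groups receive favourable decisions at similar rates. A large gap flags a disparity worth investigating, but it does not by itself show that the model is unfair.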

Related terms