AI Safety

Discipline that studies how to make AI systems reliable, secure, and aligned with human values.

AI safety addresses the technical and social risks of AI, including bias, hallucinations, malicious use, and misalignment. Its techniques include reinforcement learning from human feedback (RLHF), red teaming, content moderation, and watermarking.