The Alignment Landscape

The Prompt

Map the major approaches to AI alignment, including reinforcement learning from human feedback (RLHF), constitutional AI, inverse reward design, cooperative inverse reinforcement learning, and debate-based approaches. Trace their intellectual lineage from Stuart Russell's reformulation of the AI problem through the founding of MIRI, the establishment of safety teams at Anthropic, DeepMind, and OpenAI, to the present. For each approach, explain what problem it addresses and what problem it leaves open.

Completion Criteria

A structured survey covering at least four distinct alignment approaches, each with its intellectual origin, the specific failure mode it addresses, and an honest statement of its known limitations.
