46 posts
Tags: Research Agendas, Game Theory, Center on Long-Term Risk (CLR), Risks of Astronomical Suffering (S-risks), Mechanism Design, Suffering, Fairness, Blackmail / Extortion, Group Rationality, Terminology / Jargon (meta), Reading Group, Mind Crime
65 posts
Tags: Iterated Amplification, Debate (AI safety technique), Factored Cognition, Humans Consulting HCH, Ought, Adversarial Collaboration, Delegation
Karma | Title | Author | Posted | Comments
231 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
204 | Unifying Bargaining Notions (1/2) | Diffractor | 4mo | 38
155 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
155 | «Boundaries», Part 1: a key missing concept from utility theory | Andrew_Critch | 4mo | 26
131 | "Zero Sum" is a misnomer. | abramdemski | 2y | 35
127 | Thoughts on Human Models | Ramana Kumar | 3y | 32
122 | Our take on CHAI’s research agenda in under 1500 words | Alex Flint | 2y | 19
120 | Unifying Bargaining Notions (2/2) | Diffractor | 4mo | 11
102 | What counts as defection? | TurnTrout | 2y | 21
101 | The Commitment Races problem | Daniel Kokotajlo | 3y | 39
98 | Threat-Resistant Bargaining Megapost: Introducing the ROSE Value | Diffractor | 2mo | 11
86 | When Hindsight Isn't 20/20: Incentive Design With Imperfect Credit Allocation | johnswentworth | 2y | 24
82 | Research Agenda v0.9: Synthesising a human's preferences into a utility function | Stuart_Armstrong | 3y | 25
79 | Siren worlds and the perils of over-optimised search | Stuart_Armstrong | 8y | 417
Karma | Title | Author | Posted | Comments
139 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
124 | My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda | Chi Nguyen | 2y | 21
122 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
118 | Paul's research agenda FAQ | zhukeepa | 4y | 73
106 | Imitative Generalisation (AKA 'Learning the Prior') | Beth Barnes | 1y | 14
97 | Writeup: Progress on AI Safety via Debate | Beth Barnes | 2y | 18
88 | Model splintering: moving from one imperfect model to another | Stuart_Armstrong | 2y | 10
85 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
81 | Ought: why it matters and ways to help | paulfchristiano | 3y | 7
80 | Why I'm excited about Debate | Richard_Ngo | 1y | 12
70 | Garrabrant and Shah on human modeling in AGI | Rob Bensinger | 1y | 10
66 | A guide to Iterated Amplification & Debate | Rafael Harth | 2y | 10
65 | Vaniver's View on Factored Cognition | Vaniver | 3y | 4
65 | Looking for adversarial collaborators to test our Debate protocol | Beth Barnes | 2y | 5