Branch 1 (46 posts)
Tags: Research Agendas, Game Theory, Center on Long-Term Risk (CLR), Risks of Astronomical Suffering (S-risks), Mechanism Design, Suffering, Fairness, Blackmail / Extortion, Group Rationality, Terminology / Jargon (meta), Reading Group, Mind Crime
Branch 2 (65 posts)
Tags: Iterated Amplification, Debate (AI safety technique), Factored Cognition, Humans Consulting HCH, Ought, Adversarial Collaboration, Delegation
Branch 1 posts:
Karma | Title | Author | Posted | Comments
35 | My AGI safety research—2022 review, ’23 plans | Steven Byrnes | 6d | 6
54 | «Boundaries», Part 3b: Alignment problems in terms of boundaries | Andrew_Critch | 6d | 2
231 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
155 | «Boundaries», Part 1: a key missing concept from utility theory | Andrew_Critch | 4mo | 26
98 | Threat-Resistant Bargaining Megapost: Introducing the ROSE Value | Diffractor | 2mo | 11
204 | Unifying Bargaining Notions (1/2) | Diffractor | 4mo | 38
155 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
120 | Unifying Bargaining Notions (2/2) | Diffractor | 4mo | 11
10 | Distilled Representations Research Agenda | Hoagy | 2mo | 2
1 | Announcing: Mechanism Design for AI Safety - Reading Group | Rubi J. Hudson | 4mo | 3
26 | Sections 5 & 6: Contemporary Architectures, Humans in the Loop | JesseClifton | 3y | 4
38 | Resources for AI Alignment Cartography | Gyrodiot | 2y | 8
44 | Technical AGI safety research outside AI | Richard_Ngo | 3y | 3
41 | Formal Open Problem in Decision Theory | Scott Garrabrant | 4y | 11
Branch 2 posts:
Karma | Title | Author | Posted | Comments
47 | Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. | Charlie Steiner | 8d | 14
52 | Notes on OpenAI’s alignment plan | Alex Flint | 12d | 5
85 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
64 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
118 | Paul's research agenda FAQ | zhukeepa | 4y | 73
35 | A Small Negative Result on Debate | Sam Bowman | 8mo | 11
122 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
21 | Ought will host a factored cognition “Lab Meeting” | jungofthewon | 3mo | 1
28 | Meta-execution | paulfchristiano | 4y | 1
46 | HCH is not just Mechanical Turk | William_S | 3y | 6
32 | Can HCH epistemically dominate Ramanujan? | zhukeepa | 3y | 4
28 | AI Safety Debate and Its Applications | VojtaKovarik | 3y | 5
6 | Predicting HCH using expert advice | jessicata | 6y | 0
1 | HCH as a measure of manipulation | orthonormal | 5y | 0