28 posts
Debate (AI safety technique)
Factored Cognition
Ought
Adversarial Collaboration
37 posts
Iterated Amplification
Humans Consulting HCH
Delegation
122 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
106 | Imitative Generalisation (AKA 'Learning the Prior') | Beth Barnes | 1y | 14
97 | Writeup: Progress on AI Safety via Debate | Beth Barnes | 2y | 18
85 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
81 | Ought: why it matters and ways to help | paulfchristiano | 3y | 7
80 | Why I'm excited about Debate | Richard_Ngo | 1y | 12
66 | A guide to Iterated Amplification & Debate | Rafael Harth | 2y | 10
65 | Vaniver's View on Factored Cognition | Vaniver | 3y | 4
65 | Looking for adversarial collaborators to test our Debate protocol | Beth Barnes | 2y | 5
61 | How should AI debate be judged? | abramdemski | 2y | 27
47 | Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. | Charlie Steiner | 8d | 14
47 | Debate Minus Factored Cognition | abramdemski | 1y | 42
45 | Preface to the Sequence on Factored Cognition | Rafael Harth | 2y | 7
42 | Idealized Factored Cognition | Rafael Harth | 2y | 6
139 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
124 | My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda | Chi Nguyen | 2y | 21
118 | Paul's research agenda FAQ | zhukeepa | 4y | 73
88 | Model splintering: moving from one imperfect model to another | Stuart_Armstrong | 2y | 10
70 | Garrabrant and Shah on human modeling in AGI | Rob Bensinger | 1y | 10
64 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
58 | Directions and desiderata for AI alignment | paulfchristiano | 3y | 1
52 | Notes on OpenAI’s alignment plan | Alex Flint | 12d | 5
50 | HCH Speculation Post #2A | Charlie Steiner | 1y | 7
46 | HCH is not just Mechanical Turk | William_S | 3y | 6
46 | Iterated Distillation and Amplification | Ajeya Cotra | 4y | 13
46 | Machine Learning Projects on IDA | Owain_Evans | 3y | 3
45 | Understanding Iterated Distillation and Amplification: Claims and Oversight | William_S | 4y | 30
44 | What's wrong with these analogies for understanding Informed Oversight and IDA? | Wei_Dai | 3y | 3