23 posts
Iterated Amplification
14 posts
Humans Consulting HCH
Delegation
52
Notes on OpenAI’s alignment plan
Alex Flint
12d
5
139
Debate update: Obfuscated arguments problem
Beth Barnes
1y
21
124
My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda
Chi Nguyen
2y
21
88
Model splintering: moving from one imperfect model to another
Stuart_Armstrong
2y
10
118
Paul's research agenda FAQ
zhukeepa
4y
73
64
Relaxed adversarial training for inner alignment
evhub
3y
28
58
Directions and desiderata for AI alignment
paulfchristiano
3y
1
38
Synthesizing amplification and debate
evhub
2y
10
46
Machine Learning Projects on IDA
Owain_Evans
3y
3
46
Iterated Distillation and Amplification
Ajeya Cotra
4y
13
41
Preface to the sequence on iterated amplification
paulfchristiano
4y
8
24
How does iterated amplification exceed human abilities?
riceissa
2y
9
37
Thoughts on reward engineering
paulfchristiano
3y
30
45
Understanding Iterated Distillation and Amplification: Claims and Oversight
William_S
4y
30
70
Garrabrant and Shah on human modeling in AGI
Rob Bensinger
1y
10
50
HCH Speculation Post #2A
Charlie Steiner
1y
7
33
Universality Unwrapped
adamShimi
2y
2
46
HCH is not just Mechanical Turk
William_S
3y
6
44
What's wrong with these analogies for understanding Informed Oversight and IDA?
Wei_Dai
3y
3
34
What are the differences between all the iterative/recursive approaches to AI alignment?
riceissa
3y
14
16
Mapping the Conceptual Territory in AI Existential Safety and Alignment
jbkjr
1y
0
35
Towards formalizing universality
paulfchristiano
3y
19
32
Can HCH epistemically dominate Ramanujan?
zhukeepa
3y
4
34
Humans Consulting HCH
paulfchristiano
4y
10
28
Meta-execution
paulfchristiano
4y
1
4
Universality and the “Filter”
maggiehayes
1y
3
6
Predicting HCH using expert advice
jessicata
6y
0
1
HCH as a measure of manipulation
orthonormal
5y
0