Iterated Amplification (26 posts) · Humans Consulting HCH (17 posts)
Karma | Title | Author | Age | Comments
47 | Notes on OpenAI's alignment plan | Alex Flint | 12d | 5
61 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
125 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
125 | Paul's research agenda FAQ | zhukeepa | 4y | 73
26 | The reward engineering problem | paulfchristiano | 3y | 3
24 | Reliability amplification | paulfchristiano | 3y | 3
21 | Approval-directed bootstrapping | paulfchristiano | 4y | 0
30 | Approval-directed agents | paulfchristiano | 4y | 11
20 | Explanation of Paul's AI-Alignment agenda by Ajeya Cotra | habryka | 4y | 0
42 | Preface to the sequence on iterated amplification | paulfchristiano | 4y | 8
45 | Iterated Distillation and Amplification | Ajeya Cotra | 4y | 13
17 | Amplification Discussion Notes | William_S | 4y | 3
15 | Benign model-free RL | paulfchristiano | 4y | 1
119 | My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda | Chi Nguyen | 2y | 21
20 | Meta-execution | paulfchristiano | 4y | 1
41 | HCH is not just Mechanical Turk | William_S | 3y | 6
34 | Can HCH epistemically dominate Ramanujan? | zhukeepa | 3y | 4
7 | Predicting HCH using expert advice | jessicata | 6y | 0
1 | HCH as a measure of manipulation | orthonormal | 5y | 0
16 | Epistemology of HCH | adamShimi | 1y | 2
15 | Mapping the Conceptual Territory in AI Existential Safety and Alignment | jbkjr | 1y | 0
35 | What's wrong with these analogies for understanding Informed Oversight and IDA? | Wei_Dai | 3y | 3
27 | Towards formalizing universality | paulfchristiano | 3y | 19
42 | HCH Speculation Post #2A | Charlie Steiner | 1y | 7
32 | Humans Consulting HCH | paulfchristiano | 4y | 10
47 | Relating HCH and Logical Induction | abramdemski | 2y | 4
10 | Universality and the “Filter” | maggiehayes | 1y | 3
57 | Garrabrant and Shah on human modeling in AGI | Rob Bensinger | 1y | 10