28 posts
Debate (AI safety technique)
Factored Cognition
Ought
Adversarial Collaboration
37 posts
Iterated Amplification
Humans Consulting HCH
Delegation
36
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Charlie Steiner
8d
14
73
Rant on Problem Factorization for Alignment
johnswentworth
4mo
48
42
A Small Negative Result on Debate
Sam Bowman
8mo
11
118
Supervise Process, not Outcomes
stuhlmueller
8mo
8
35
Ought will host a factored cognition “Lab Meeting”
jungofthewon
3mo
1
36
AI Safety Debate and Its Applications
VojtaKovarik
3y
5
94
Writeup: Progress on AI Safety via Debate
Beth Barnes
2y
18
45
Factored Cognition
stuhlmueller
4y
6
16
Traversing a Cognition Space
Rafael Harth
2y
5
48
Vaniver's View on Factored Cognition
Vaniver
3y
4
92
Imitative Generalisation (AKA 'Learning the Prior')
Beth Barnes
1y
14
73
Why I'm excited about Debate
Richard_Ngo
1y
12
32
New paper: (When) is Truth-telling Favored in AI debate?
VojtaKovarik
2y
7
34
Idealized Factored Cognition
Rafael Harth
2y
6
47
Notes on OpenAI’s alignment plan
Alex Flint
12d
5
61
Relaxed adversarial training for inner alignment
evhub
3y
28
125
Paul's research agenda FAQ
zhukeepa
4y
73
20
Meta-execution
paulfchristiano
4y
1
41
HCH is not just Mechanical Turk
William_S
3y
6
34
Can HCH epistemically dominate Ramanujan?
zhukeepa
3y
4
7
Predicting HCH using expert advice
jessicata
6y
0
1
HCH as a measure of manipulation
orthonormal
5y
0
24
Reliability amplification
paulfchristiano
3y
3
21
Approval-directed bootstrapping
paulfchristiano
4y
0
30
Approval-directed agents
paulfchristiano
4y
11
42
Preface to the sequence on iterated amplification
paulfchristiano
4y
8
15
Mapping the Conceptual Territory in AI Existential Safety and Alignment
jbkjr
1y
0
45
Iterated Distillation and Amplification
Ajeya Cotra
4y
13