69 posts: Debate (AI safety technique), Factored Cognition, Experiments, Ought, AI-assisted Alignment, Memory and Mnemonics, Air Conditioning
43 posts: Iterated Amplification, Humans Consulting HCH
Karma | Title | Author | Age | Comments
36 | Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. | Charlie Steiner | 8d | 14
151 | Godzilla Strategies | johnswentworth | 6mo | 65
7 | Alignment with argument-networks and assessment-predictions | Tor Økland Barstad | 7d | 3
16 | Research request (alignment strategy): Deep dive on "making AI solve alignment for us" | JanBrauner | 19d | 3
73 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
10 | Provably Honest - A First Step | Srijanak De | 1mo | 2
27 | AI Safety via Debate | ESRogs | 4y | 13
9 | Getting from an unaligned AGI to an aligned AGI? | Tor Økland Barstad | 6mo | 7
80 | Air Conditioner Test Results & Discussion | johnswentworth | 6mo | 38
8 | AI-assisted list of ten concrete alignment things to do right now | lcmgcd | 3mo | 5
118 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
92 | Beliefs and Disagreements about Automating Alignment Research | Ian McKenzie | 3mo | 4
20 | Briefly thinking through some analogs of debate | Eli Tyre | 3mo | 3
14 | Making it harder for an AGI to "trick" us, with STVs | Tor Økland Barstad | 5mo | 5
47 | Notes on OpenAI’s alignment plan | Alex Flint | 12d | 5
61 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
125 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
125 | Paul's research agenda FAQ | zhukeepa | 4y | 73
20 | Meta-execution | paulfchristiano | 4y | 1
41 | HCH is not just Mechanical Turk | William_S | 3y | 6
26 | The reward engineering problem | paulfchristiano | 3y | 3
34 | Can HCH epistemically dominate Ramanujan? | zhukeepa | 3y | 4
7 | Predicting HCH using expert advice | jessicata | 6y | 0
1 | HCH as a measure of manipulation | orthonormal | 5y | 0
24 | Reliability amplification | paulfchristiano | 3y | 3
21 | Approval-directed bootstrapping | paulfchristiano | 4y | 0
30 | Approval-directed agents | paulfchristiano | 4y | 11
16 | Epistemology of HCH | adamShimi | 1y | 2