69 posts
Debate (AI safety technique)
Factored Cognition
Experiments
Ought
AI-assisted Alignment
Memory and Mnemonics
Air Conditioning
43 posts
Iterated Amplification
Humans Consulting HCH
46
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Charlie Steiner
8d
14
118
Godzilla Strategies
johnswentworth
6mo
65
3
Alignment with argument-networks and assessment-predictions
Tor Økland Barstad
7d
3
11
Research request (alignment strategy): Deep dive on "making AI solve alignment for us"
JanBrauner
19d
3
84
Rant on Problem Factorization for Alignment
johnswentworth
4mo
48
3
Provably Honest - A First Step
Srijanak De
1mo
2
34
AI Safety via Debate
ESRogs
4y
13
5
Getting from an unaligned AGI to an aligned AGI?
Tor Økland Barstad
6mo
7
80
Air Conditioner Test Results & Discussion
johnswentworth
6mo
38
4
AI-assisted list of ten concrete alignment things to do right now
lcmgcd
3mo
5
116
Supervise Process, not Outcomes
stuhlmueller
8mo
8
79
Beliefs and Disagreements about Automating Alignment Research
Ian McKenzie
3mo
4
27
Briefly thinking through some analogs of debate
Eli Tyre
3mo
3
0
Making it harder for an AGI to "trick" us, with STVs
Tor Økland Barstad
5mo
5
50
Notes on OpenAI’s alignment plan
Alex Flint
12d
5
61
Relaxed adversarial training for inner alignment
evhub
3y
28
132
Debate update: Obfuscated arguments problem
Beth Barnes
1y
21
111
Paul's research agenda FAQ
zhukeepa
4y
73
27
Meta-execution
paulfchristiano
4y
1
45
HCH is not just Mechanical Turk
William_S
3y
6
28
The reward engineering problem
paulfchristiano
3y
3
31
Can HCH epistemically dominate Ramanujan?
zhukeepa
3y
4
6
Predicting HCH using expert advice
jessicata
6y
0
1
HCH as a measure of manipulation
orthonormal
5y
0
27
Reliability amplification
paulfchristiano
3y
3
28
Approval-directed bootstrapping
paulfchristiano
4y
0
30
Approval-directed agents
paulfchristiano
4y
11
20
Epistemology of HCH
adamShimi
1y
2