Tags (69 posts): Debate (AI safety technique), Factored Cognition, Experiments, Ought, AI-assisted Alignment, Memory and Mnemonics, Air Conditioning
Tags (43 posts): Iterated Amplification, Humans Consulting HCH
Karma | Title | Author | Posted | Comments
26 | Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. | Charlie Steiner | 8d | 14
184 | Godzilla Strategies | johnswentworth | 6mo | 65
11 | Alignment with argument-networks and assessment-predictions | Tor Økland Barstad | 7d | 3
21 | Research request (alignment strategy): Deep dive on "making AI solve alignment for us" | JanBrauner | 19d | 3
62 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
17 | Provably Honest - A First Step | Srijanak De | 1mo | 2
20 | AI Safety via Debate | ESRogs | 4y | 13
13 | Getting from an unaligned AGI to an aligned AGI? | Tor Økland Barstad | 6mo | 7
80 | Air Conditioner Test Results & Discussion | johnswentworth | 6mo | 38
12 | AI-assisted list of ten concrete alignment things to do right now | lcmgcd | 3mo | 5
120 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
105 | Beliefs and Disagreements about Automating Alignment Research | Ian McKenzie | 3mo | 4
13 | Briefly thinking through some analogs of debate | Eli Tyre | 3mo | 3
28 | Making it harder for an AGI to "trick" us, with STVs | Tor Økland Barstad | 5mo | 5
44 | Notes on OpenAI’s alignment plan | Alex Flint | 12d | 5
61 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
118 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
139 | Paul's research agenda FAQ | zhukeepa | 4y | 73
13 | Meta-execution | paulfchristiano | 4y | 1
37 | HCH is not just Mechanical Turk | William_S | 3y | 6
24 | The reward engineering problem | paulfchristiano | 3y | 3
37 | Can HCH epistemically dominate Ramanujan? | zhukeepa | 3y | 4
8 | Predicting HCH using expert advice | jessicata | 6y | 0
1 | HCH as a measure of manipulation | orthonormal | 5y | 0
21 | Reliability amplification | paulfchristiano | 3y | 3
14 | Approval-directed bootstrapping | paulfchristiano | 4y | 0
30 | Approval-directed agents | paulfchristiano | 4y | 11
12 | Epistemology of HCH | adamShimi | 1y | 2