69 posts
Debate (AI safety technique)
Factored Cognition
Experiments
Ought
AI-assisted Alignment
Memory and Mnemonics
Air Conditioning
43 posts
Iterated Amplification
Humans Consulting HCH
26
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Charlie Steiner
8d
14
11
Alignment with argument-networks and assessment-predictions
Tor Økland Barstad
7d
3
21
Research request (alignment strategy): Deep dive on "making AI solve alignment for us"
JanBrauner
19d
3
184
Godzilla Strategies
johnswentworth
6mo
65
105
Beliefs and Disagreements about Automating Alignment Research
Ian McKenzie
3mo
4
66
A Library and Tutorial for Factored Cognition with Language Models
stuhlmueller
2mo
0
52
Ought will host a factored cognition “Lab Meeting”
jungofthewon
3mo
1
62
Rant on Problem Factorization for Alignment
johnswentworth
4mo
48
17
Provably Honest - A First Step
Srijanak De
1mo
2
80
Air Conditioner Test Results & Discussion
johnswentworth
6mo
38
112
Preregistration: Air Conditioner Test
johnswentworth
8mo
64
120
Supervise Process, not Outcomes
stuhlmueller
8mo
8
52
A Small Negative Result on Debate
Sam Bowman
8mo
11
21
Discussion on utilizing AI for alignment
elifland
3mo
3
44
Notes on OpenAI’s alignment plan
Alex Flint
12d
5
25
Surprised by ELK report's counterexample to Debate, IDA
Evan R. Murphy
4mo
0
118
Debate update: Obfuscated arguments problem
Beth Barnes
1y
21
121
My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda
Chi Nguyen
2y
21
46
Garrabrant and Shah on human modeling in AGI
Rob Bensinger
1y
10
139
Paul's research agenda FAQ
zhukeepa
4y
73
63
Model splintering: moving from one imperfect model to another
Stuart_Armstrong
2y
10
18
HCH and Adversarial Questions
David Udell
10mo
7
36
HCH Speculation Post #2A
Charlie Steiner
1y
7
17
Universality and the “Filter”
maggiehayes
1y
3
61
Relaxed adversarial training for inner alignment
evhub
3y
28
34
Relating HCH and Logical Induction
abramdemski
2y
4
46
Iterated Distillation and Amplification
Ajeya Cotra
4y
13
24
Universality Unwrapped
adamShimi
2y
2