46 posts
Factored Cognition
Experiments
Ought
AI-assisted Alignment
Memory and Mnemonics
Air Conditioning
23 posts
Debate (AI safety technique)
118
Godzilla Strategies
johnswentworth
6mo
65
3
Alignment with argument-networks and assessment-predictions
Tor Økland Barstad
7d
3
11
Research request (alignment strategy): Deep dive on "making AI solve alignment for us"
JanBrauner
19d
3
84
Rant on Problem Factorization for Alignment
johnswentworth
4mo
48
3
Provably Honest - A First Step
Srijanak De
1mo
2
5
Getting from an unaligned AGI to an aligned AGI?
Tor Økland Barstad
6mo
7
80
Air Conditioner Test Results & Discussion
johnswentworth
6mo
38
4
AI-assisted list of ten concrete alignment things to do right now
lcmgcd
3mo
5
116
Supervise Process, not Outcomes
stuhlmueller
8mo
8
79
Beliefs and Disagreements about Automating Alignment Research
Ian McKenzie
3mo
4
0
Making it harder for an AGI to "trick" us, with STVs
Tor Økland Barstad
5mo
5
1
A Deceptively Simple Argument in favor of Problem Factorization
Logan Zoellner
4mo
4
11
Discussion on utilizing AI for alignment
elifland
3mo
3
5
Sufficiently many Godzillas as an alignment strategy
142857
3mo
3
46
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Charlie Steiner
8d
14
34
AI Safety via Debate
ESRogs
4y
13
27
Briefly thinking through some analogs of debate
Eli Tyre
3mo
3
32
A Small Negative Result on Debate
Sam Bowman
8mo
11
91
Writeup: Progress on AI Safety via Debate
Beth Barnes
2y
18
62
A guide to Iterated Amplification & Debate
Rafael Harth
2y
10
26
AI Safety Debate and Its Applications
VojtaKovarik
3y
5
12
Debate AI and the Decision to Release an AI
Chris_Leong
3y
18
9
Splitting Debate up into Two Subsystems
Nandi
2y
5
30
Learning the smooth prior
Geoffrey Irving
7mo
0
102
Imitative Generalisation (AKA 'Learning the Prior')
Beth Barnes
1y
14
77
Why I'm excited about Debate
Richard_Ngo
1y
12
34
New paper: (When) is Truth-telling Favored in AI debate?
VojtaKovarik
2y
7
16
Thoughts on "AI safety via debate"
Gordon Seidoh Worley
4y
4