30 posts
Outer Alignment
Mesa-Optimization
46 posts
Inner Alignment
59
Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)
LawrenceC
4d
10
5
Don't you think RLHF solves outer alignment?
Raphaël S
1mo
19
93
Trying to Make a Treacherous Mesa-Optimizer
MadHatter
1mo
13
6
How much should we worry about mesa-optimization challenges?
sudo -i
4mo
13
15
Alignment as Game Design
Shoshannah Tekofsky
5mo
7
5
Do mesa-optimization problems correlate with low-slack?
sudo -i
10mo
1
8
Inner alignment: what are we pointing at?
lcmgcd
3mo
2
52
Mesa-Optimizers vs “Steered Optimizers”
Steven Byrnes
2y
7
17
Outer alignment and imitative amplification
evhub
2y
11
42
The Steering Problem
paulfchristiano
4y
12
19
Is the Star Trek Federation really incapable of building AI?
Kaj_Sotala
4y
4
4
Alignment via manually implementing the utility function
Chantiel
1y
6
42
Weak arguments against the universal prior being malign
X4vier
4y
23
53
An Increasingly Manipulative Newsfeed
Michaël Trazzi
3y
16
108
Inner and outer alignment decompose one hard problem into two extremely hard problems
TurnTrout
18d
18
21
Value Formation: An Overarching Model
Thane Ruthenis
1mo
6
29
Mesa-Optimizers via Grokking
orthonormal
14d
4
24
Take 8: Queer the inner/outer alignment dichotomy.
Charlie Steiner
11d
2
20
Is there a demo of "You can't fetch the coffee if you're dead"?
Ram Rachum
1mo
9
84
How likely is deceptive alignment?
evhub
3mo
21
80
2-D Robustness
vlad_m
3y
8
185
Inner Alignment: Explain like I'm 12 Edition
Rafael Harth
2y
46
20
Greed Is the Root of This Evil
Thane Ruthenis
2mo
4
36
Broad Picture of Human Values
Thane Ruthenis
4mo
5
8
Doom doubts - is inner alignment a likely problem?
Crissman
5mo
7
44
Outer vs inner misalignment: three framings
Richard_Ngo
5mo
4
63
Discussion: Objective Robustness and Inner Alignment Terminology
jbkjr
1y
7
112
Selection Theorems: A Program For Understanding Agents
johnswentworth
1y
23