Mesa-Optimization (12 posts)

Karma  Title                                                                   Author           Age   Comments
   81  Trying to Make a Treacherous Mesa-Optimizer                             MadHatter        1mo   13
    2  How much should we worry about mesa-optimization challenges?            sudo -i          4mo   13
   -3  Do mesa-optimization problems correlate with low-slack?                 sudo -i          10mo  1
   58  Weak arguments against the universal prior being malign                 X4vier           4y    23
  137  Risks from Learned Optimization: Introduction                           evhub            3y    42
   32  The Speed + Simplicity Prior is probably anti-deceptive                                  7mo   29
   22  Three questions about mesa-optimizers                                   Eric Neyman      8mo   5
   71  Prize for probable problems                                             paulfchristiano  4y    63
   47  [AN #58] Mesa optimization: what it is, and why we should care          Rohin Shah       3y    9
    0  Is evolutionary influence the mesa objective that we're interested in?  David Johnston   7mo   2
   23  [ASoT] Some thoughts about deceptive mesaoptimization                   leogao           8mo   5
   10  Mesa-utility functions might not be purely proxy goals                  Thomas Kwa       8mo   17

Outer Alignment (18 posts)

Karma  Title                                                                   Author               Age  Comments
   61  Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)     LawrenceC            4d   10
   -1  Don't you think RLHF solves outer alignment?                            Raphaël S            1mo  19
    7  Alignment as Game Design                                                Shoshannah Tekofsky  5mo  7
    6  Inner alignment: what are we pointing at?                               lcmgcd               3mo  2
   38  Mesa-Optimizers vs “Steered Optimizers”                                 Steven Byrnes        2y   7
   31  Outer alignment and imitative amplification                             evhub                2y   11
   44  The Steering Problem                                                    paulfchristiano      4y   12
   19  Is the Star Trek Federation really incapable of building AI?            Kaj_Sotala           4y   4
   -2  Alignment via manually implementing the utility function                Chantiel             1y   6
   71  An Increasingly Manipulative Newsfeed                                   Michaël Trazzi       3y   16
  116  List of resolved confusions about IDA                                   Wei_Dai              3y   18
   76  "Inner Alignment Failures" Which Are Actually Outer Alignment Failures  johnswentworth       2y   38
   -2  The Disastrously Confident And Inaccurate AI                            Sharat Jacob Jacob   1mo  0
   19  If I were a well-intentioned AI... II: Acting in a world                Stuart_Armstrong     2y   0