Mesa-Optimization (10 posts)

| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 58 | Agency As a Natural Abstraction | Thane Ruthenis | 7mo | 9 |
| 56 | Meta learning to gradient hack | Quintin Pope | 1y | 11 |
| 148 | Risks from Learned Optimization: Introduction | evhub | 3y | 42 |
| 24 | [ASoT] Some thoughts about deceptive mesaoptimization | leogao | 8mo | 5 |
| 44 | Thoughts on gradient hacking | Richard_Ngo | 1y | 12 |
| 74 | Mesa-Search vs Mesa-Control | abramdemski | 2y | 45 |
| 85 | Risks from Learned Optimization: Conclusion and Related Work | evhub | 3y | 4 |
| 38 | Formal Solution to the Inner Alignment Problem | michaelcohen | 1y | 123 |
| 75 | Conditions for Mesa-Optimization | evhub | 3y | 48 |
| 50 | [AN #58] Mesa optimization: what it is, and why we should care | Rohin Shah | 3y | 9 |

Outer Alignment (12 posts)

| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 64 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10 |
| 71 | Human Mimicry Mainly Works When We’re Already Close | johnswentworth | 4mo | 16 |
| 78 | "Inner Alignment Failures" Which Are Actually Outer Alignment Failures | johnswentworth | 2y | 38 |
| 6 | Inner alignment: what are we pointing at? | lcmgcd | 3mo | 2 |
| 73 | An Increasingly Manipulative Newsfeed | Michaël Trazzi | 3y | 16 |
| 7 | [ASoT] Some thoughts about imperfect world modeling | leogao | 8mo | 0 |
| 32 | Outer alignment and imitative amplification | evhub | 2y | 11 |
| 46 | The Steering Problem | paulfchristiano | 4y | 12 |
| 35 | "Designing agent incentives to avoid reward tampering", DeepMind | gwern | 3y | 15 |
| 20 | If I were a well-intentioned AI... II: Acting in a world | Stuart_Armstrong | 2y | 0 |
| 18 | If I were a well-intentioned AI... III: Extremal Goodhart | Stuart_Armstrong | 2y | 0 |
| -1 | Planning capacity and daemons | lcmgcd | 2mo | 0 |