Mesa-Optimization (10 posts)

Score | Title                                                           | Author           | Posted | Comments
  55  | Agency As a Natural Abstraction                                 | Thane Ruthenis   | 7mo    | 9
  54  | Meta learning to gradient hack                                  | Quintin Pope     | 1y     | 11
 166  | Risks from Learned Optimization: Introduction                   | evhub            | 3y     | 42
  24  | [ASoT] Some thoughts about deceptive mesaoptimization           | leogao           | 8mo    | 5
  33  | Thoughts on gradient hacking                                    | Richard_Ngo      | 1y     | 12
  47  | Formal Solution to the Inner Alignment Problem                  | michaelcohen     | 1y     | 123
  54  | Mesa-Search vs Mesa-Control                                     | abramdemski      | 2y     | 45
  78  | Risks from Learned Optimization: Conclusion and Related Work    | evhub            | 3y     | 4
  75  | Conditions for Mesa-Optimization                                | evhub            | 3y     | 48
  54  | [AN #58] Mesa optimization: what it is, and why we should care  | Rohin Shah       | 3y     | 9

Outer Alignment (12 posts)

Score | Title                                                                   | Author           | Posted | Comments
  60  | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)     | LawrenceC        | 4d     | 10
  68  | Human Mimicry Mainly Works When We’re Already Close                     | johnswentworth   | 4mo    | 16
   7  | Inner alignment: what are we pointing at?                               | lcmgcd           | 3mo    | 2
  61  | "Inner Alignment Failures" Which Are Actually Outer Alignment Failures  | johnswentworth   | 2y     | 38
  62  | An Increasingly Manipulative Newsfeed                                   | Michaël Trazzi   | 3y     | 16
   2  | Planning capacity and daemons                                           | lcmgcd           | 2mo    | 0
   7  | [ASoT] Some thoughts about imperfect world modeling                     | leogao           | 8mo    | 0
  43  | The Steering Problem                                                    | paulfchristiano  | 4y     | 12
  28  | "Designing agent incentives to avoid reward tampering", DeepMind        | gwern            | 3y     | 15
  24  | Outer alignment and imitative amplification                             | evhub            | 2y     | 11
  22  | If I were a well-intentioned AI... III: Extremal Goodhart               | Stuart_Armstrong | 2y     | 0
  20  | If I were a well-intentioned AI... II: Acting in a world                | Stuart_Armstrong | 2y     | 0