10 posts
Solomonoff Induction
Priors
Occam's Razor
37 posts
Inner Alignment
162
The Solomonoff Prior is Malign
Mark Xu
2y
52
122
A Semitechnical Introductory Dialogue on Solomonoff Induction
Eliezer Yudkowsky
1y
34
94
Learning the prior
paulfchristiano
2y
29
61
Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann
Daniel Kokotajlo
3y
39
43
Learning the prior and generalization
evhub
2y
16
71
When does rationality-as-search have nontrivial implications?
nostalgebraist
4y
11
41
Instrumental Occam?
abramdemski
2y
15
22
Clarifying Consequentialists in the Solomonoff Prior
vlad_m
4y
16
16
The universal prior is malign
paulfchristiano
6y
0
1
Simplicity priors with reflective oracles
Benya_Fallenstein
8y
0
90
Inner and outer alignment decompose one hard problem into two extremely hard problems
TurnTrout
18d
18
42
Mesa-Optimizers via Grokking
orthonormal
14d
4
29
Take 8: Queer the inner/outer alignment dichotomy.
Charlie Steiner
11d
2
45
Threat Model Literature Review
zac_kenton
1mo
4
20
Value Formation: An Overarching Model
Thane Ruthenis
1mo
6
79
Externalized reasoning oversight: a research direction for language model alignment
tamera
4mo
22
23
Greed Is the Root of This Evil
Thane Ruthenis
2mo
4
33
Framing AI Childhoods
David Udell
3mo
8
44
Outer vs inner misalignment: three framings
Richard_Ngo
5mo
4
175
Inner Alignment: Explain like I'm 12 Edition
Rafael Harth
2y
46
29
Clarifying the confusion around inner alignment
Rauno Arike
7mo
0
71
Empirical Observations of Objective Robustness Failures
jbkjr
1y
5
113
Demons in Imperfect Search
johnswentworth
2y
21
46
Applications for Deconfusing Goal-Directedness
adamShimi
1y
3