10 posts
Solomonoff Induction
Priors
Occam's Razor
37 posts
Inner Alignment
148
The Solomonoff Prior is Malign
Mark Xu
2y
52
127
A Semitechnical Introductory Dialogue on Solomonoff Induction
Eliezer Yudkowsky
1y
34
79
Learning the prior
paulfchristiano
2y
29
65
When does rationality-as-search have nontrivial implications?
nostalgebraist
4y
11
47
Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann
Daniel Kokotajlo
3y
39
34
Learning the prior and generalization
evhub
2y
16
30
Instrumental Occam?
abramdemski
2y
15
20
Clarifying Consequentialists in the Solomonoff Prior
vlad_m
4y
16
16
The universal prior is malign
paulfchristiano
6y
0
1
Simplicity priors with reflective oracles
Benya_Fallenstein
8y
0
175
Inner Alignment: Explain like I'm 12 Edition
Rafael Harth
2y
46
103
Externalized reasoning oversight: a research direction for language model alignment
tamera
4mo
22
103
Demons in Imperfect Search
johnswentworth
2y
21
99
The Inner Alignment Problem
evhub
3y
17
99
Gradient hacking
evhub
3y
39
96
Inner and outer alignment decompose one hard problem into two extremely hard problems
TurnTrout
18d
18
87
Tessellating Hills: a toy model for demons in imperfect search
DaemonicSigil
2y
17
81
Open question: are minimal circuits daemon-free?
paulfchristiano
4y
70
77
2-D Robustness
vlad_m
3y
8
70
A simple environment for showing mesa misalignment
Matthew Barnett
3y
9
66
Are minimal circuits deceptive?
evhub
3y
11
63
Empirical Observations of Objective Robustness Failures
jbkjr
1y
5
63
Concrete experiments in inner alignment
evhub
3y
12
62
Theoretical Neuroscience For Alignment Theory
Cameron Berg
1y
19