Tree of Tags

Solomonoff Induction, Priors, Occam's Razor (10 posts)

Inner Alignment (37 posts)

148 The Solomonoff Prior is Malign

Mark Xu

2y

52

127 A Semitechnical Introductory Dialogue on Solomonoff Induction

Eliezer Yudkowsky

1y

34

79 Learning the prior

paulfchristiano

2y

29

65 When does rationality-as-search have nontrivial implications?

nostalgebraist

4y

11

47 Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann

Daniel Kokotajlo

3y

39

34 Learning the prior and generalization

evhub

2y

16

30 Instrumental Occam?

abramdemski

2y

15

20 Clarifying Consequentialists in the Solomonoff Prior

vlad_m

4y

16

16 The universal prior is malign

paulfchristiano

6y

0

1 Simplicity priors with reflective oracles

Benya_Fallenstein

8y

0

175 Inner Alignment: Explain like I'm 12 Edition

Rafael Harth

2y

46

103 Externalized reasoning oversight: a research direction for language model alignment

tamera

4mo

22

103 Demons in Imperfect Search

johnswentworth

2y

21

99 The Inner Alignment Problem

evhub

3y

17

99 Gradient hacking

evhub

3y

39

96 Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

18d

18

87 Tessellating Hills: a toy model for demons in imperfect search

DaemonicSigil

2y

17

81 Open question: are minimal circuits daemon-free?

paulfchristiano

4y

70

77 2-D Robustness

vlad_m

3y

8

70 A simple environment for showing mesa misalignment

Matthew Barnett

3y

9

66 Are minimal circuits deceptive?

evhub

3y

11

63 Empirical Observations of Objective Robustness Failures

jbkjr

1y

5

63 Concrete experiments in inner alignment

evhub

3y

12

62 Theoretical Neuroscience For Alignment Theory

Cameron Berg

1y

19