Tree of Tags

10 posts: Solomonoff Induction, Priors, Occam's Razor

37 posts: Inner Alignment

162 · The Solomonoff Prior is Malign · Mark Xu · 2y · 52
122 · A Semitechnical Introductory Dialogue on Solomonoff Induction · Eliezer Yudkowsky · 1y · 34
94 · Learning the prior · paulfchristiano · 2y · 29
61 · Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann · Daniel Kokotajlo · 3y · 39
43 · Learning the prior and generalization · evhub · 2y · 16
71 · When does rationality-as-search have nontrivial implications? · nostalgebraist · 4y · 11
41 · Instrumental Occam? · abramdemski · 2y · 15
22 · Clarifying Consequentialists in the Solomonoff Prior · vlad_m · 4y · 16
16 · The universal prior is malign · paulfchristiano · 6y · 0
1 · Simplicity priors with reflective oracles · Benya_Fallenstein · 8y · 0

90 · Inner and outer alignment decompose one hard problem into two extremely hard problems · TurnTrout · 18d · 18
42 · Mesa-Optimizers via Grokking · orthonormal · 14d · 4
29 · Take 8: Queer the inner/outer alignment dichotomy. · Charlie Steiner · 11d · 2
45 · Threat Model Literature Review · zac_kenton · 1mo · 4
20 · Value Formation: An Overarching Model · Thane Ruthenis · 1mo · 6
79 · Externalized reasoning oversight: a research direction for language model alignment · tamera · 4mo · 22
23 · Greed Is the Root of This Evil · Thane Ruthenis · 2mo · 4
33 · Framing AI Childhoods · David Udell · 3mo · 8
44 · Outer vs inner misalignment: three framings · Richard_Ngo · 5mo · 4
175 · Inner Alignment: Explain like I'm 12 Edition · Rafael Harth · 2y · 46
29 · Clarifying the confusion around inner alignment · Rauno Arike · 7mo · 0
71 · Empirical Observations of Objective Robustness Failures · jbkjr · 1y · 5
113 · Demons in Imperfect Search · johnswentworth · 2y · 21
46 · Applications for Deconfusing Goal-Directedness · adamShimi · 1y · 3