Related tags (71 posts): Outer Alignment · Optimization · Mesa-Optimization · Neuroscience · Neuromorphic AI · General Intelligence · Predictive Processing · AI Services (CAIS) · Selection vs Control · Neocortex · Distinctions · Computing Overhang

Related tags (47 posts): Inner Alignment · Solomonoff Induction · Priors · Occam's Razor
Posts (score · title · author · age · comments):

64 · Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) · LawrenceC · 4d · 10 comments
75 · My take on Jacob Cannell's take on AGI safety · Steven Byrnes · 22d · 13 comments
42 · Don't align agents to evaluations of plans · TurnTrout · 24d · 46 comments
20 · Take 6: CAIS is actually Orwellian. · Charlie Steiner · 13d · 5 comments
59 · Humans aren't fitness maximizers · So8res · 2mo · 45 comments
95 · What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? · johnswentworth · 4mo · 15 comments
71 · Human Mimicry Mainly Works When We're Already Close · johnswentworth · 4mo · 16 comments
58 · Agency As a Natural Abstraction · Thane Ruthenis · 7mo · 9 comments
61 · Ngo and Yudkowsky on scientific reasoning and pivotal acts · Eliezer Yudkowsky · 10mo · 13 comments
206 · The ground of optimization · Alex Flint · 2y · 74 comments
123 · Book review: "A Thousand Brains" by Jeff Hawkins · Steven Byrnes · 1y · 18 comments
46 · [Intro to brain-like-AGI safety] 8. Takeaways from neuro 1/2: On AGI development · Steven Byrnes · 9mo · 2 comments
77 · Brain-inspired AGI and the "lifetime anchor" · Steven Byrnes · 1y · 16 comments
141 · Inner Alignment in Salt-Starved Rats · Steven Byrnes · 2y · 39 comments
90 · Inner and outer alignment decompose one hard problem into two extremely hard problems · TurnTrout · 18d · 18 comments
42 · Mesa-Optimizers via Grokking · orthonormal · 14d · 4 comments
29 · Take 8: Queer the inner/outer alignment dichotomy. · Charlie Steiner · 11d · 2 comments
45 · Threat Model Literature Review · zac_kenton · 1mo · 4 comments
20 · Value Formation: An Overarching Model · Thane Ruthenis · 1mo · 6 comments
79 · Externalized reasoning oversight: a research direction for language model alignment · tamera · 4mo · 22 comments
23 · Greed Is the Root of This Evil · Thane Ruthenis · 2mo · 4 comments
33 · Framing AI Childhoods · David Udell · 3mo · 8 comments
44 · Outer vs inner misalignment: three framings · Richard_Ngo · 5mo · 4 comments
162 · The Solomonoff Prior is Malign · Mark Xu · 2y · 52 comments
175 · Inner Alignment: Explain like I'm 12 Edition · Rafael Harth · 2y · 46 comments
122 · A Semitechnical Introductory Dialogue on Solomonoff Induction · Eliezer Yudkowsky · 1y · 34 comments
29 · Clarifying the confusion around inner alignment · Rauno Arike · 7mo · 0 comments
71 · Empirical Observations of Objective Robustness Failures · jbkjr · 1y · 5 comments