Tags (154 posts): Inner Alignment, Neuroscience, Outer Alignment, Mesa-Optimization, Predictive Processing, Neuromorphic AI, Brain-Computer Interfaces, Neocortex, Neuralink, Systems Thinking, Emergent Behavior (Emergence)
Tags (148 posts): Goodhart's Law, Optimization, General Intelligence, AI Services (CAIS), Adaptation Executors, Superstimuli, Narrow AI, Hope, Selection vs Control, Delegation
Karma | Title | Author | Age | Comments
108 | Inner and outer alignment decompose one hard problem into two extremely hard problems | TurnTrout | 18d | 18
59 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10
21 | Value Formation: An Overarching Model | Thane Ruthenis | 1mo | 6
33 | Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight | Jacy Reese Anthis | 1mo | 8
28 | Predictive Processing, Heterosexuality and Delusions of Grandeur | lsusr | 3d | 2
50 | My take on Jacob Cannell’s take on AGI safety | Steven Byrnes | 22d | 13
5 | Don't you think RLHF solves outer alignment? | Raphaël S | 1mo | 19
29 | Mesa-Optimizers via Grokking | orthonormal | 14d | 4
93 | Trying to Make a Treacherous Mesa-Optimizer | MadHatter | 1mo | 13
24 | Take 8: Queer the inner/outer alignment dichotomy. | Charlie Steiner | 11d | 2
20 | Is there a demo of "You can't fetch the coffee if you're dead"? | Ram Rachum | 1mo | 9
84 | How likely is deceptive alignment? | evhub | 3mo | 21
80 | 2-D Robustness | vlad_m | 3y | 8
43 | AI researchers announce NeuroAI agenda | Cameron Berg | 1mo | 12
33 | Don't align agents to evaluations of plans | TurnTrout | 24d | 46
72 | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61
55 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | TurnTrout | 21d | 27
47 | Humans aren't fitness maximizers | So8res | 2mo | 45
8 | Take 6: CAIS is actually Orwellian. | Charlie Steiner | 13d | 5
27 | The reward function is already how well you manipulate humans | Kerry | 2mo | 9
117 | What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? | johnswentworth | 4mo | 15
79 | Measuring Optimization Power | Eliezer Yudkowsky | 14y | 35
75 | "Normal" is the equilibrium state of past optimization processes | Alex_Altair | 1mo | 5
53 | Vingean Agency | abramdemski | 3mo | 13
15 | Are Intelligence and Generality Orthogonal? | cubefox | 5mo | 16
23 | Is General Intelligence "Compact"? | DragonGod | 5mo | 6
207 | Utility Maximization = Description Length Minimization | johnswentworth | 1y | 40
48 | I No Longer Believe Intelligence to be "Magical" | DragonGod | 6mo | 34