154 posts: Inner Alignment, Neuroscience, Outer Alignment, Mesa-Optimization, Predictive Processing, Neuromorphic AI, Brain-Computer Interfaces, Neocortex, Neuralink, Systems Thinking, Emergent Behavior (Emergence)
148 posts: Goodhart's Law, Optimization, General Intelligence, AI Services (CAIS), Adaptation Executors, Superstimuli, Narrow AI, Hope, Selection vs Control, Delegation
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 84 | Inner and outer alignment decompose one hard problem into two extremely hard problems | TurnTrout | 18d | 18 |
| 61 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10 |
| 19 | Value Formation: An Overarching Model | Thane Ruthenis | 1mo | 6 |
| 25 | Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight | Jacy Reese Anthis | 1mo | 8 |
| 32 | Predictive Processing, Heterosexuality and Delusions of Grandeur | lsusr | 3d | 2 |
| 72 | My take on Jacob Cannell’s take on AGI safety | Steven Byrnes | 22d | 13 |
| -1 | Don't you think RLHF solves outer alignment? | Raphaël S | 1mo | 19 |
| 41 | Mesa-Optimizers via Grokking | orthonormal | 14d | 4 |
| 81 | Trying to Make a Treacherous Mesa-Optimizer | MadHatter | 1mo | 13 |
| 28 | Take 8: Queer the inner/outer alignment dichotomy. | Charlie Steiner | 11d | 2 |
| -4 | Is there a demo of "You can't fetch the coffee if you're dead"? | Ram Rachum | 1mo | 9 |
| 60 | How likely is deceptive alignment? | evhub | 3mo | 21 |
| 74 | 2-D Robustness | vlad_m | 3y | 8 |
| 31 | AI researchers announce NeuroAI agenda | Cameron Berg | 1mo | 12 |
| 41 | Don't align agents to evaluations of plans | TurnTrout | 24d | 46 |
| 48 | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61 |
| 55 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | TurnTrout | 21d | 27 |
| 57 | Humans aren't fitness maximizers | So8res | 2mo | 45 |
| 20 | Take 6: CAIS is actually Orwellian. | Charlie Steiner | 13d | 5 |
| 13 | The reward function is already how well you manipulate humans | Kerry | 2mo | 9 |
| 89 | What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? | johnswentworth | 4mo | 15 |
| 51 | Measuring Optimization Power | Eliezer Yudkowsky | 14y | 35 |
| 79 | "Normal" is the equilibrium state of past optimization processes | Alex_Altair | 1mo | 5 |
| 61 | Vingean Agency | abramdemski | 3mo | 13 |
| 19 | Are Intelligence and Generality Orthogonal? | cubefox | 5mo | 16 |
| 19 | Is General Intelligence "Compact"? | DragonGod | 5mo | 6 |
| 159 | Utility Maximization = Description Length Minimization | johnswentworth | 1y | 40 |
| 14 | I No Longer Believe Intelligence to be "Magical" | DragonGod | 6mo | 34 |