Branch 1 (154 posts):
Inner Alignment
Neuroscience
Outer Alignment
Mesa-Optimization
Predictive Processing
Neuromorphic AI
Brain-Computer Interfaces
Neocortex
Neuralink
Systems Thinking
Emergent Behavior (Emergence)
Branch 2 (148 posts):
Goodhart's Law
Optimization
General Intelligence
AI Services (CAIS)
Adaptation Executors
Superstimuli
Narrow AI
Hope
Selection vs Control
Delegation
Score · Title · Author · Posted · Comments
59 · Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) · LawrenceC · 4d · 10
28 · Predictive Processing, Heterosexuality and Delusions of Grandeur · lsusr · 3d · 2
108 · Inner and outer alignment decompose one hard problem into two extremely hard problems · TurnTrout · 18d · 18
24 · Take 8: Queer the inner/outer alignment dichotomy. · Charlie Steiner · 11d · 2
50 · My take on Jacob Cannell’s take on AGI safety · Steven Byrnes · 22d · 13
29 · Mesa-Optimizers via Grokking · orthonormal · 14d · 4
93 · Trying to Make a Treacherous Mesa-Optimizer · MadHatter · 1mo · 13
39 · [Hebbian Natural Abstractions] Introduction · Samuel Nellessen · 29d · 3
33 · Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight · Jacy Reese Anthis · 1mo · 8
28 · The Disastrously Confident And Inaccurate AI · Sharat Jacob Jacob · 1mo · 0
43 · AI researchers announce NeuroAI agenda · Cameron Berg · 1mo · 12
84 · How likely is deceptive alignment? · evhub · 3mo · 21
21 · Value Formation: An Overarching Model · Thane Ruthenis · 1mo · 6
20 · Is there a demo of "You can't fetch the coffee if you're dead"? · Ram Rachum · 1mo · 9
55 · Alignment allows "nonrobust" decision-influences and doesn't require robust grading · TurnTrout · 21d · 27
72 · Don't design agents which exploit adversarial inputs · TurnTrout · 1mo · 61
33 · Don't align agents to evaluations of plans · TurnTrout · 24d · 46
75 · "Normal" is the equilibrium state of past optimization processes · Alex_Altair · 1mo · 5
31 · The economy as an analogy for advanced AI systems · rosehadshar · 1mo · 0
117 · What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? · johnswentworth · 4mo · 15
8 · Take 6: CAIS is actually Orwellian. · Charlie Steiner · 13d · 5
47 · Humans aren't fitness maximizers · So8res · 2mo · 45
27 · The reward function is already how well you manipulate humans · Kerry · 2mo · 9
53 · Vingean Agency · abramdemski · 3mo · 13
57 · [Yann Lecun] A Path Towards Autonomous Machine Intelligence · DragonGod · 5mo · 12
26 · program searches · carado · 3mo · 2
207 · Utility Maximization = Description Length Minimization · johnswentworth · 1y · 40
48 · I No Longer Believe Intelligence to be "Magical" · DragonGod · 6mo · 34