Tags (154 posts):
- Inner Alignment
- Neuroscience
- Outer Alignment
- Mesa-Optimization
- Predictive Processing
- Neuromorphic AI
- Brain-Computer Interfaces
- Neocortex
- Neuralink
- Systems Thinking
- Emergent Behavior (Emergence)

Tags (148 posts):
- Goodhart's Law
- Optimization
- General Intelligence
- AI Services (CAIS)
- Adaptation Executors
- Superstimuli
- Narrow AI
- Hope
- Selection vs Control
- Delegation
Posts:

Karma | Title | Author | Posted | Comments
60 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10
30 | Predictive Processing, Heterosexuality and Delusions of Grandeur | lsusr | 3d | 2
96 | Inner and outer alignment decompose one hard problem into two extremely hard problems | TurnTrout | 18d | 18
61 | My take on Jacob Cannell’s take on AGI safety | Steven Byrnes | 22d | 13
35 | Mesa-Optimizers via Grokking | orthonormal | 14d | 4
26 | Take 8: Queer the inner/outer alignment dichotomy. | Charlie Steiner | 11d | 2
87 | Trying to Make a Treacherous Mesa-Optimizer | MadHatter | 1mo | 13
34 | [Hebbian Natural Abstractions] Introduction | Samuel Nellessen | 29d | 3
29 | Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight | Jacy Reese Anthis | 1mo | 8
37 | AI researchers announce NeuroAI agenda | Cameron Berg | 1mo | 12
20 | Value Formation: An Overarching Model | Thane Ruthenis | 1mo | 6
72 | How likely is deceptive alignment? | evhub | 3mo | 21
13 | The Disastrously Confident And Inaccurate AI | Sharat Jacob Jacob | 1mo | 0
31 | Quick notes on “mirror neurons” | Steven Byrnes | 2mo | 2
55 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | TurnTrout | 21d | 27
60 | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61
37 | Don't align agents to evaluations of plans | TurnTrout | 24d | 46
77 | "Normal" is the equilibrium state of past optimization processes | Alex_Altair | 1mo | 5
14 | Take 6: CAIS is actually Orwellian. | Charlie Steiner | 13d | 5
26 | The economy as an analogy for advanced AI systems | rosehadshar | 1mo | 0
103 | What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? | johnswentworth | 4mo | 15
52 | Humans aren't fitness maximizers | So8res | 2mo | 45
57 | Vingean Agency | abramdemski | 3mo | 13
20 | The reward function is already how well you manipulate humans | Kerry | 2mo | 9
183 | Utility Maximization = Description Length Minimization | johnswentworth | 1y | 40
38 | [Yann Lecun] A Path Towards Autonomous Machine Intelligence | DragonGod | 5mo | 12
21 | program searches | carado | 3mo | 2
217 | The ground of optimization | Alex Flint | 2y | 74