154 posts: Inner Alignment, Neuroscience, Outer Alignment, Mesa-Optimization, Predictive Processing, Neuromorphic AI, Brain-Computer Interfaces, Neocortex, Neuralink, Systems Thinking, Emergent Behavior (Emergence)
148 posts: Goodhart's Law, Optimization, General Intelligence, AI Services (CAIS), Adaptation Executors, Superstimuli, Narrow AI, Hope, Selection vs Control, Delegation
Karma | Title | Author | Posted | Comments
61 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10
32 | Predictive Processing, Heterosexuality and Delusions of Grandeur | lsusr | 3d | 2
84 | Inner and outer alignment decompose one hard problem into two extremely hard problems | TurnTrout | 18d | 18
72 | My take on Jacob Cannell’s take on AGI safety | Steven Byrnes | 22d | 13
41 | Mesa-Optimizers via Grokking | orthonormal | 14d | 4
28 | Take 8: Queer the inner/outer alignment dichotomy. | Charlie Steiner | 11d | 2
81 | Trying to Make a Treacherous Mesa-Optimizer | MadHatter | 1mo | 13
29 | [Hebbian Natural Abstractions] Introduction | Samuel Nellessen | 29d | 3
25 | Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight | Jacy Reese Anthis | 1mo | 8
19 | Value Formation: An Overarching Model | Thane Ruthenis | 1mo | 6
31 | AI researchers announce NeuroAI agenda | Cameron Berg | 1mo | 12
60 | How likely is deceptive alignment? | evhub | 3mo | 21
34 | Quick notes on “mirror neurons” | Steven Byrnes | 2mo | 2
40 | On oxytocin-sensitive neurons in auditory cortex | Steven Byrnes | 3mo | 6
55 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | TurnTrout | 21d | 27
41 | Don't align agents to evaluations of plans | TurnTrout | 24d | 46
20 | Take 6: CAIS is actually Orwellian. | Charlie Steiner | 13d | 5
48 | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61
79 | "Normal" is the equilibrium state of past optimization processes | Alex_Altair | 1mo | 5
57 | Humans aren't fitness maximizers | So8res | 2mo | 45
21 | The economy as an analogy for advanced AI systems | rosehadshar | 1mo | 0
89 | What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? | johnswentworth | 4mo | 15
61 | Vingean Agency | abramdemski | 3mo | 13
13 | The reward function is already how well you manipulate humans | Kerry | 2mo | 9
159 | Utility Maximization = Description Length Minimization | johnswentworth | 1y | 40
60 | Ngo and Yudkowsky on scientific reasoning and pivotal acts | Eliezer Yudkowsky | 10mo | 13
9 | When trying to define general intelligence is ability to achieve goals the best metric? | jmh | 1mo | 0
193 | The ground of optimization | Alex Flint | 2y | 74