Tree of Tags

76 posts: Inner Alignment, Outer Alignment, Mesa-Optimization

78 posts: Neuroscience, Predictive Processing, Neuromorphic AI, Brain-Computer Interfaces, Neocortex, Neuralink, Systems Thinking, Emergent Behavior (Emergence)

84 Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

18d

18

61 Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC

4d

10

19 Value Formation: An Overarching Model

Thane Ruthenis

1mo

6

-1 Don't you think RLHF solves outer alignment?

Raphaël S

1mo

19

41 Mesa-Optimizers via Grokking

orthonormal

14d

4

81 Trying to Make a Treacherous Mesa-Optimizer

MadHatter

1mo

13

28 Take 8: Queer the inner/outer alignment dichotomy.

Charlie Steiner

11d

2

-4 Is there a demo of "You can't fetch the coffee if you're dead"?

Ram Rachum

1mo

9

60 How likely is deceptive alignment?

evhub

3mo

21

74 2-D Robustness

vlad_m

3y

8

165 Inner Alignment: Explain like I'm 12 Edition

Rafael Harth

2y

46

2 How much should we worry about mesa-optimization challenges?

sudo -i

4mo

13

22 Greed Is the Root of This Evil

Thane Ruthenis

2mo

4

7 Alignment as Game Design

Shoshannah Tekofsky

5mo

7

25 Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight

Jacy Reese Anthis

1mo

8

32 Predictive Processing, Heterosexuality and Delusions of Grandeur

lsusr

3d

2

72 My take on Jacob Cannell’s take on AGI safety

Steven Byrnes

22d

13

31 AI researchers announce NeuroAI agenda

Cameron Berg

1mo

12

29 [Hebbian Natural Abstractions] Introduction

Samuel Nellessen

29d

3

40 On oxytocin-sensitive neurons in auditory cortex

Steven Byrnes

3mo

6

-2 A physicist's approach to Origins of Life

pchvykov

5mo

6

34 Quick notes on “mirror neurons”

Steven Byrnes

2mo

2

41 [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain

Steven Byrnes

10mo

12

13 (Link) I'm Missing a Chunk of My Brain

mukashi

3mo

2

134 Inner Alignment in Salt-Starved Rats

Steven Byrnes

2y

39

13 Brain-Brain communication

Jordan

11y

22

4 A future for neuroscience

Mike Johnson

4y

12

18 FAI and the Information Theory of Pleasure

johnsonmx

7y

19