Tree of Tags

76 posts: Inner Alignment, Outer Alignment, Mesa-Optimization

78 posts: Neuroscience, Predictive Processing, Neuromorphic AI, Brain-Computer Interfaces, Neocortex, Neuralink, Systems Thinking, Emergent Behavior (Emergence)

108 Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

18d

18

59 Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC

4d

10

21 Value Formation: An Overarching Model

Thane Ruthenis

1mo

6

5 Don't you think RLHF solves outer alignment?

Raphaël S

1mo

19

29 Mesa-Optimizers via Grokking

orthonormal

14d

4

93 Trying to Make a Treacherous Mesa-Optimizer

MadHatter

1mo

13

24 Take 8: Queer the inner/outer alignment dichotomy.

Charlie Steiner

11d

2

20 Is there a demo of "You can't fetch the coffee if you're dead"?

Ram Rachum

1mo

9

84 How likely is deceptive alignment?

evhub

3mo

21

80 2-D Robustness

vlad_m

3y

8

185 Inner Alignment: Explain like I'm 12 Edition

Rafael Harth

2y

46

6 How much should we worry about mesa-optimization challenges?

sudo -i

4mo

13

20 Greed Is the Root of This Evil

Thane Ruthenis

2mo

4

15 Alignment as Game Design

Shoshannah Tekofsky

5mo

7

33 Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight

Jacy Reese Anthis

1mo

8

28 Predictive Processing, Heterosexuality and Delusions of Grandeur

lsusr

3d

2

50 My take on Jacob Cannell’s take on AGI safety

Steven Byrnes

22d

13

43 AI researchers announce NeuroAI agenda

Cameron Berg

1mo

12

39 [Hebbian Natural Abstractions] Introduction

Samuel Nellessen

29d

3

22 On oxytocin-sensitive neurons in auditory cortex

Steven Byrnes

3mo

6

24 A physicist's approach to Origins of Life

pchvykov

5mo

6

28 Quick notes on “mirror neurons”

Steven Byrnes

2mo

2

45 [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain

Steven Byrnes

10mo

12

13 (Link) I'm Missing a Chunk of My Brain

mukashi

3mo

2

138 Inner Alignment in Salt-Starved Rats

Steven Byrnes

2y

39

11 Brain-Brain communication

Jordan

11y

22

30 A future for neuroscience

Mike Johnson

4y

12

10 FAI and the Information Theory of Pleasure

johnsonmx

7y

19