Tree of Tags

76 posts: Inner Alignment, Outer Alignment, Mesa-Optimization

78 posts: Neuroscience, Predictive Processing, Neuromorphic AI, Brain-Computer Interfaces, Neocortex, Neuralink, Systems Thinking, Emergent Behavior (Emergence)

96 Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

18d

18

60 Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC

4d

10

20 Value Formation: An Overarching Model

Thane Ruthenis

1mo

6

2 Don't you think RLHF solves outer alignment?

Raphaël S

1mo

19

35 Mesa-Optimizers via Grokking

orthonormal

14d

4

87 Trying to Make a Treacherous Mesa-Optimizer

MadHatter

1mo

13

26 Take 8: Queer the inner/outer alignment dichotomy.

Charlie Steiner

11d

2

8 Is there a demo of "You can't fetch the coffee if you're dead"?

Ram Rachum

1mo

9

72 How likely is deceptive alignment?

evhub

3mo

21

77 2-D Robustness

vlad_m

3y

8

175 Inner Alignment: Explain like I'm 12 Edition

Rafael Harth

2y

46

4 How much should we worry about mesa-optimization challenges?

sudo -i

4mo

13

21 Greed Is the Root of This Evil

Thane Ruthenis

2mo

4

11 Alignment as Game Design

Shoshannah Tekofsky

5mo

7

29 Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight

Jacy Reese Anthis

1mo

8

30 Predictive Processing, Heterosexuality and Delusions of Grandeur

lsusr

3d

2

61 My take on Jacob Cannell’s take on AGI safety

Steven Byrnes

22d

13

37 AI researchers announce NeuroAI agenda

Cameron Berg

1mo

12

34 [Hebbian Natural Abstractions] Introduction

Samuel Nellessen

29d

3

31 On oxytocin-sensitive neurons in auditory cortex

Steven Byrnes

3mo

6

11 A physicist's approach to Origins of Life

pchvykov

5mo

6

31 Quick notes on “mirror neurons”

Steven Byrnes

2mo

2

43 [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain

Steven Byrnes

10mo

12

13 (Link) I'm Missing a Chunk of My Brain

mukashi

3mo

2

136 Inner Alignment in Salt-Starved Rats

Steven Byrnes

2y

39

12 Brain-Brain communication

Jordan

11y

22

17 A future for neuroscience

Mike Johnson

4y

12

14 FAI and the Information Theory of Pleasure

johnsonmx

7y

19