Outer Alignment (71 posts)
Related tags: Optimization, Mesa-Optimization, Neuroscience, Neuromorphic AI, General Intelligence, Predictive Processing, AI Services (CAIS), Selection vs Control, Neocortex, Distinctions, Computing Overhang

Inner Alignment (47 posts)
Related tags: Solomonoff Induction, Priors, Occam's Razor
Score · Title · Author · Posted · Comments

56 · Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) · LawrenceC · 4d · 10
47 · My take on Jacob Cannell's take on AGI safety · Steven Byrnes · 22d · 13
32 · Don't align agents to evaluations of plans · TurnTrout · 24d · 46
111 · What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? · johnswentworth · 4mo · 15
8 · Take 6: CAIS is actually Orwellian. · Charlie Steiner · 13d · 5
45 · Humans aren't fitness maximizers · So8res · 2mo · 45
65 · Human Mimicry Mainly Works When We're Already Close · johnswentworth · 4mo · 16
52 · Agency As a Natural Abstraction · Thane Ruthenis · 7mo · 9
228 · The ground of optimization · Alex Flint · 2y · 74
174 · My computational framework for the brain · Steven Byrnes · 2y · 26
131 · Inner Alignment in Salt-Starved Rats · Steven Byrnes · 2y · 39
66 · Optimization Concepts in the Game of Life · Vika · 1y · 15
143 · Matt Botvinick on the spontaneous emergence of learning algorithms · Adam Scholl · 2y · 87
41 · Ngo and Yudkowsky on scientific reasoning and pivotal acts · Eliezer Yudkowsky · 10mo · 13
102 · Inner and outer alignment decompose one hard problem into two extremely hard problems · TurnTrout · 18d · 18
23 · Take 8: Queer the inner/outer alignment dichotomy. · Charlie Steiner · 11d · 2
28 · Mesa-Optimizers via Grokking · orthonormal · 14d · 4
65 · Threat Model Literature Review · zac_kenton · 1mo · 4
127 · Externalized reasoning oversight: a research direction for language model alignment · tamera · 4mo · 22
20 · Value Formation: An Overarching Model · Thane Ruthenis · 1mo · 6
41 · Framing AI Childhoods · David Udell · 3mo · 8
19 · Greed Is the Root of This Evil · Thane Ruthenis · 2mo · 4
111 · Theoretical Neuroscience For Alignment Theory · Cameron Berg · 1y · 19
42 · Outer vs inner misalignment: three framings · Richard_Ngo · 5mo · 4
132 · A Semitechnical Introductory Dialogue on Solomonoff Induction · Eliezer Yudkowsky · 1y · 34
175 · Inner Alignment: Explain like I'm 12 Edition · Rafael Harth · 2y · 46
134 · The Solomonoff Prior is Malign · Mark Xu · 2y · 52
25 · Clarifying the confusion around inner alignment · Rauno Arike · 7mo · 0