71 posts
Outer Alignment
Optimization
Mesa-Optimization
Neuroscience
Neuromorphic AI
General Intelligence
Predictive Processing
AI Services (CAIS)
Selection vs Control
Neocortex
Distinctions
Computing Overhang
47 posts
Inner Alignment
Solomonoff Induction
Priors
Occam's Razor
228
The ground of optimization
Alex Flint
2y
74
184
Risks from Learned Optimization: Introduction
evhub
3y
42
174
My computational framework for the brain
Steven Byrnes
2y
26
143
Matt Botvinick on the spontaneous emergence of learning algorithms
Adam Scholl
2y
87
141
Selection vs Control
abramdemski
3y
25
140
Reframing Superintelligence: Comprehensive AI Services as General Intelligence
Rohin Shah
3y
75
131
Inner Alignment in Salt-Starved Rats
Steven Byrnes
2y
39
111
What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?
johnswentworth
4mo
15
97
Book review: "A Thousand Brains" by Jeff Hawkins
Steven Byrnes
1y
18
94
Optimization Amplifies
Scott Garrabrant
4y
12
79
Inner alignment in the brain
Steven Byrnes
2y
16
79
How special are human brains among animal brains?
zhukeepa
2y
38
78
How uniform is the neocortex?
zhukeepa
2y
23
75
Comments on CAIS
Richard_Ngo
3y
14
175
Inner Alignment: Explain like I'm 12 Edition
Rafael Harth
2y
46
134
The Solomonoff Prior is Malign
Mark Xu
2y
52
132
A Semitechnical Introductory Dialogue on Solomonoff Induction
Eliezer Yudkowsky
1y
34
127
Externalized reasoning oversight: a research direction for language model alignment
tamera
4mo
22
111
Theoretical Neuroscience For Alignment Theory
Cameron Berg
1y
19
102
Inner and outer alignment decompose one hard problem into two extremely hard problems
TurnTrout
18d
18
100
The Inner Alignment Problem
evhub
3y
17
95
Gradient hacking
evhub
3y
39
93
Demons in Imperfect Search
johnswentworth
2y
21
78
Tessellating Hills: a toy model for demons in imperfect search
DaemonicSigil
2y
17
77
2-D Robustness
vlad_m
3y
8
75
Open question: are minimal circuits daemon-free?
paulfchristiano
4y
70
72
A simple environment for showing mesa misalignment
Matthew Barnett
3y
9
65
Threat Model Literature Review
zac_kenton
1mo
4