71 posts
Outer Alignment
Optimization
Mesa-Optimization
Neuroscience
Neuromorphic AI
General Intelligence
Predictive Processing
AI Services (CAIS)
Selection vs Control
Neocortex
Distinctions
Computing Overhang
47 posts
Inner Alignment
Solomonoff Induction
Priors
Occam's Razor
60
Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)
LawrenceC
4d
10
61
My take on Jacob Cannell’s take on AGI safety
Steven Byrnes
22d
13
37
Don't align agents to evaluations of plans
TurnTrout
24d
46
14
Take 6: CAIS is actually Orwellian.
Charlie Steiner
13d
5
103
What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?
johnswentworth
4mo
15
52
Humans aren't fitness maximizers
So8res
2mo
45
68
Human Mimicry Mainly Works When We’re Already Close
johnswentworth
4mo
16
55
Agency As a Natural Abstraction
Thane Ruthenis
7mo
9
217
The ground of optimization
Alex Flint
2y
74
51
Ngo and Yudkowsky on scientific reasoning and pivotal acts
Eliezer Yudkowsky
10mo
13
136
Inner Alignment in Salt-Starved Rats
Steven Byrnes
2y
39
68
Optimization Concepts in the Game of Life
Vika
1y
15
144
My computational framework for the brain
Steven Byrnes
2y
26
110
Book review: "A Thousand Brains" by Jeff Hawkins
Steven Byrnes
1y
18
96
Inner and outer alignment decompose one hard problem into two extremely hard problems
TurnTrout
18d
18
35
Mesa-Optimizers via Grokking
orthonormal
14d
4
26
Take 8: Queer the inner/outer alignment dichotomy.
Charlie Steiner
11d
2
55
Threat Model Literature Review
zac_kenton
1mo
4
103
Externalized reasoning oversight: a research direction for language model alignment
tamera
4mo
22
20
Value Formation: An Overarching Model
Thane Ruthenis
1mo
6
37
Framing AI Childhoods
David Udell
3mo
8
21
Greed Is the Root of This Evil
Thane Ruthenis
2mo
4
43
Outer vs inner misalignment: three framings
Richard_Ngo
5mo
4
127
A Semitechnical Introductory Dialogue on Solomonoff Induction
Eliezer Yudkowsky
1y
34
175
Inner Alignment: Explain like I'm 12 Edition
Rafael Harth
2y
46
148
The Solomonoff Prior is Malign
Mark Xu
2y
52
62
Theoretical Neuroscience For Alignment Theory
Cameron Berg
1y
19
27
Clarifying the confusion around inner alignment
Rauno Arike
7mo
0