Related tags (71 posts): Outer Alignment · Optimization · Mesa-Optimization · Neuroscience · Neuromorphic AI · General Intelligence · Predictive Processing · AI Services (CAIS) · Selection vs Control · Neocortex · Distinctions · Computing Overhang

Related tags (47 posts): Inner Alignment · Solomonoff Induction · Priors · Occam's Razor
Posts (score · title · author · age · comments):

64 · Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) · LawrenceC · 4d · 10 comments
75 · My take on Jacob Cannell's take on AGI safety · Steven Byrnes · 22d · 13 comments
42 · Don't align agents to evaluations of plans · TurnTrout · 24d · 46 comments
20 · Take 6: CAIS is actually Orwellian. · Charlie Steiner · 13d · 5 comments
59 · Humans aren't fitness maximizers · So8res · 2mo · 45 comments
95 · What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? · johnswentworth · 4mo · 15 comments
71 · Human Mimicry Mainly Works When We're Already Close · johnswentworth · 4mo · 16 comments
58 · Agency As a Natural Abstraction · Thane Ruthenis · 7mo · 9 comments
61 · Ngo and Yudkowsky on scientific reasoning and pivotal acts · Eliezer Yudkowsky · 10mo · 13 comments
206 · The ground of optimization · Alex Flint · 2y · 74 comments
123 · Book review: "A Thousand Brains" by Jeff Hawkins · Steven Byrnes · 1y · 18 comments
46 · [Intro to brain-like-AGI safety] 8. Takeaways from neuro 1/2: On AGI development · Steven Byrnes · 9mo · 2 comments
77 · Brain-inspired AGI and the "lifetime anchor" · Steven Byrnes · 1y · 16 comments
141 · Inner Alignment in Salt-Starved Rats · Steven Byrnes · 2y · 39 comments
90 · Inner and outer alignment decompose one hard problem into two extremely hard problems · TurnTrout · 18d · 18 comments
42 · Mesa-Optimizers via Grokking · orthonormal · 14d · 4 comments
29 · Take 8: Queer the inner/outer alignment dichotomy. · Charlie Steiner · 11d · 2 comments
45 · Threat Model Literature Review · zac_kenton · 1mo · 4 comments
20 · Value Formation: An Overarching Model · Thane Ruthenis · 1mo · 6 comments
79 · Externalized reasoning oversight: a research direction for language model alignment · tamera · 4mo · 22 comments
23 · Greed Is the Root of This Evil · Thane Ruthenis · 2mo · 4 comments
33 · Framing AI Childhoods · David Udell · 3mo · 8 comments
44 · Outer vs inner misalignment: three framings · Richard_Ngo · 5mo · 4 comments
162 · The Solomonoff Prior is Malign · Mark Xu · 2y · 52 comments
175 · Inner Alignment: Explain like I'm 12 Edition · Rafael Harth · 2y · 46 comments
122 · A Semitechnical Introductory Dialogue on Solomonoff Induction · Eliezer Yudkowsky · 1y · 34 comments
29 · Clarifying the confusion around inner alignment · Rauno Arike · 7mo · 0 comments
71 · Empirical Observations of Objective Robustness Failures · jbkjr · 1y · 5 comments