71 posts
Outer Alignment
Optimization
Mesa-Optimization
Neuroscience
Neuromorphic AI
General Intelligence
Predictive Processing
AI Services (CAIS)
Selection vs Control
Neocortex
Distinctions
Computing Overhang
47 posts
Inner Alignment
Solomonoff Induction
Priors
Occam's Razor
37
Don't align agents to evaluations of plans
TurnTrout
24d
46
60
Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)
LawrenceC
4d
10
61
My take on Jacob Cannell’s take on AGI safety
Steven Byrnes
22d
13
14
Take 6: CAIS is actually Orwellian.
Charlie Steiner
13d
5
103
What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?
johnswentworth
4mo
15
52
Humans aren't fitness maximizers
So8res
2mo
45
68
Human Mimicry Mainly Works When We’re Already Close
johnswentworth
4mo
16
7
Inner alignment: what are we pointing at?
lcmgcd
3mo
2
45
Mesa-Optimizers vs “Steered Optimizers”
Steven Byrnes
2y
7
79
Bottle Caps Aren't Optimisers
DanielFilan
4y
21
24
Outer alignment and imitative amplification
evhub
2y
11
60
Multi-agent predictive minds and AI alignment
Jan_Kulveit
4y
18
78
Risks from Learned Optimization: Conclusion and Related Work
evhub
3y
4
43
The Steering Problem
paulfchristiano
4y
12
96
Inner and outer alignment decompose one hard problem into two extremely hard problems
TurnTrout
18d
18
20
Value Formation: An Overarching Model
Thane Ruthenis
1mo
6
36
Applications for Deconfusing Goal-Directedness
adamShimi
1y
3
35
Mesa-Optimizers via Grokking
orthonormal
14d
4
55
Threat Model Literature Review
zac_kenton
1mo
4
103
Externalized reasoning oversight: a research direction for language model alignment
tamera
4mo
22
37
Framing AI Childhoods
David Udell
3mo
8
43
Outer vs inner misalignment: three framings
Richard_Ngo
5mo
4
44
Towards an empirical investigation of inner alignment
evhub
3y
9
77
2-D Robustness
vlad_m
3y
8
1
Simplicity priors with reflective oracles
Benya_Fallenstein
8y
0
16
The universal prior is malign
paulfchristiano
6y
0
26
Inner alignment requires making assumptions about human values
Matthew Barnett
2y
9
27
Re-Define Intent Alignment?
abramdemski
1y
33