71 posts
Outer Alignment
Optimization
Mesa-Optimization
Neuroscience
Neuromorphic AI
General Intelligence
Predictive Processing
AI Services (CAIS)
Selection vs Control
Neocortex
Distinctions
Computing Overhang
47 posts
Inner Alignment
Solomonoff Induction
Priors
Occam's Razor
32
Don't align agents to evaluations of plans
TurnTrout
24d
46
56
Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)
LawrenceC
4d
10
47
My take on Jacob Cannell’s take on AGI safety
Steven Byrnes
22d
13
8
Take 6: CAIS is actually Orwellian.
Charlie Steiner
13d
5
111
What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?
johnswentworth
4mo
15
45
Humans aren't fitness maximizers
So8res
2mo
45
65
Human Mimicry Mainly Works When We’re Already Close
johnswentworth
4mo
16
8
Inner alignment: what are we pointing at?
lcmgcd
3mo
2
49
Mesa-Optimizers vs “Steered Optimizers”
Steven Byrnes
2y
7
69
Bottle Caps Aren't Optimisers
DanielFilan
4y
21
16
Outer alignment and imitative amplification
evhub
2y
11
59
Multi-agent predictive minds and AI alignment
Jan_Kulveit
4y
18
71
Risks from Learned Optimization: Conclusion and Related Work
evhub
3y
4
40
The Steering Problem
paulfchristiano
4y
12
102
Inner and outer alignment decompose one hard problem into two extremely hard problems
TurnTrout
18d
18
20
Value Formation: An Overarching Model
Thane Ruthenis
1mo
6
26
Applications for Deconfusing Goal-Directedness
adamShimi
1y
3
28
Mesa-Optimizers via Grokking
orthonormal
14d
4
65
Threat Model Literature Review
zac_kenton
1mo
4
127
Externalized reasoning oversight: a research direction for language model alignment
tamera
4mo
22
41
Framing AI Childhoods
David Udell
3mo
8
42
Outer vs inner misalignment: three framings
Richard_Ngo
5mo
4
43
Towards an empirical investigation of inner alignment
evhub
3y
9
77
2-D Robustness
vlad_m
3y
8
1
Simplicity priors with reflective oracles
Benya_Fallenstein
8y
0
16
The universal prior is malign
paulfchristiano
6y
0
20
Inner alignment requires making assumptions about human values
Matthew Barnett
2y
9
16
Re-Define Intent Alignment?
abramdemski
1y
33