Branch 1 (154 posts):
Inner Alignment
Neuroscience
Outer Alignment
Mesa-Optimization
Predictive Processing
Neuromorphic AI
Brain-Computer Interfaces
Neocortex
Neuralink
Systems Thinking
Emergent Behavior (Emergence)
Branch 2 (148 posts):
Goodhart's Law
Optimization
General Intelligence
AI Services (CAIS)
Adaptation Executors
Superstimuli
Narrow AI
Hope
Selection vs Control
Delegation
Score · Title · Author · Posted · Comments
59 · Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) · LawrenceC · 4d · 10
28 · Predictive Processing, Heterosexuality and Delusions of Grandeur · lsusr · 3d · 2
108 · Inner and outer alignment decompose one hard problem into two extremely hard problems · TurnTrout · 18d · 18
24 · Take 8: Queer the inner/outer alignment dichotomy. · Charlie Steiner · 11d · 2
50 · My take on Jacob Cannell’s take on AGI safety · Steven Byrnes · 22d · 13
29 · Mesa-Optimizers via Grokking · orthonormal · 14d · 4
93 · Trying to Make a Treacherous Mesa-Optimizer · MadHatter · 1mo · 13
39 · [Hebbian Natural Abstractions] Introduction · Samuel Nellessen · 29d · 3
33 · Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight · Jacy Reese Anthis · 1mo · 8
28 · The Disastrously Confident And Inaccurate AI · Sharat Jacob Jacob · 1mo · 0
43 · AI researchers announce NeuroAI agenda · Cameron Berg · 1mo · 12
84 · How likely is deceptive alignment? · evhub · 3mo · 21
21 · Value Formation: An Overarching Model · Thane Ruthenis · 1mo · 6
20 · Is there a demo of "You can't fetch the coffee if you're dead"? · Ram Rachum · 1mo · 9
55 · Alignment allows "nonrobust" decision-influences and doesn't require robust grading · TurnTrout · 21d · 27
72 · Don't design agents which exploit adversarial inputs · TurnTrout · 1mo · 61
33 · Don't align agents to evaluations of plans · TurnTrout · 24d · 46
75 · "Normal" is the equilibrium state of past optimization processes · Alex_Altair · 1mo · 5
31 · The economy as an analogy for advanced AI systems · rosehadshar · 1mo · 0
117 · What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? · johnswentworth · 4mo · 15
8 · Take 6: CAIS is actually Orwellian. · Charlie Steiner · 13d · 5
47 · Humans aren't fitness maximizers · So8res · 2mo · 45
27 · The reward function is already how well you manipulate humans · Kerry · 2mo · 9
53 · Vingean Agency · abramdemski · 3mo · 13
57 · [Yann Lecun] A Path Towards Autonomous Machine Intelligence · DragonGod · 5mo · 12
26 · program searches · carado · 3mo · 2
207 · Utility Maximization = Description Length Minimization · johnswentworth · 1y · 40
48 · I No Longer Believe Intelligence to be "Magical" · DragonGod · 6mo · 34