125 posts
World Modeling
Impact Regularization
Anthropics
Exercises / Problem-Sets
Fixed Point Theorems
AIXI
Updateless Decision Theory
Sleeping Beauty Paradox
Cognitive Science
Extraterrestrial Life
Economics
Grabby Aliens
47 posts
Human Values
Shard Theory
Complexity of Value
Gradient Hacking
Heuristics & Biases
Evolution
Value Drift
Information Theory
Gradient Descent
Modularity
Ontology
Biology
777
Where I agree and disagree with Eliezer
paulfchristiano
6mo
205
310
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra
5mo
89
148
My research methodology
paulfchristiano
1y
36
145
Testing The Natural Abstraction Hypothesis: Project Intro
johnswentworth
1y
34
123
Fixing The Good Regulator Theorem
johnswentworth
1y
25
103
Selection Theorems: A Program For Understanding Agents
johnswentworth
1y
23
100
Towards a New Impact Measure
TurnTrout
4y
159
95
Frequent arguments about alignment
John Schulman
1y
16
90
Reframing Impact
TurnTrout
3y
15
88
There is essentially one best-validated theory of cognition.
abramdemski
1y
34
73
Worrying about the Vase: Whitelisting
TurnTrout
4y
26
73
The Goldbach conjecture is probably correct; so was Fermat's last theorem
Stuart_Armstrong
2y
27
70
Topological Fixed Point Exercises
Scott Garrabrant
4y
52
69
Less Threat-Dependent Bargaining Solutions?? (3/2)
Diffractor
4mo
7
202
The shard theory of human values
Quintin Pope
3mo
57
183
Utility Maximization = Description Length Minimization
johnswentworth
1y
40
175
Humans provide an untapped wealth of evidence about alignment
TurnTrout
5mo
92
159
Evolution of Modularity
johnswentworth
3y
12
130
Shard Theory: An Overview
David Udell
4mo
34
105
A broad basin of attraction around human values?
Wei_Dai
8mo
16
92
Human values & biases are inaccessible to the genome
TurnTrout
5mo
51
86
The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints
johnswentworth
1y
21
85
Two Neglected Problems in Human-AI Safety
Wei_Dai
4y
24
84
Contra shard theory, in the context of the diamond maximizer problem
So8res
2mo
16
73
But exactly how complex and fragile?
KatjaGrace
3y
32
70
Shard Theory in Nine Theses: a Distillation and Critical Appraisal
LawrenceC
1d
9
69
The two-layer model of human values, and problems with synthesizing preferences
Kaj_Sotala
2y
16
68
Three AI Safety Related Ideas
Wei_Dai
4y
38