125 posts
World Modeling
Impact Regularization
Anthropics
Exercises / Problem-Sets
Fixed Point Theorems
AIXI
Updateless Decision Theory
Sleeping Beauty Paradox
Cognitive Science
Extraterrestrial Life
Economics
Grabby Aliens
47 posts
Human Values
Shard Theory
Complexity of Value
Gradient Hacking
Heuristics & Biases
Evolution
Value Drift
Information Theory
Gradient Descent
Modularity
Ontology
Biology
573
Where I agree and disagree with Eliezer
paulfchristiano
6mo
205
239
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra
5mo
89
159
My research methodology
paulfchristiano
1y
36
159
Testing The Natural Abstraction Hypothesis: Project Intro
johnswentworth
1y
34
145
Fixing The Good Regulator Theorem
johnswentworth
1y
25
100
Selection Theorems: A Program For Understanding Agents
johnswentworth
1y
23
98
There is essentially one best-validated theory of cognition.
abramdemski
1y
34
97
Frequent arguments about alignment
John Schulman
1y
16
84
Less Threat-Dependent Bargaining Solutions?? (3/2)
Diffractor
4mo
7
83
The Goldbach conjecture is probably correct; so was Fermat's last theorem
Stuart_Armstrong
2y
27
83
Reframing Impact
TurnTrout
3y
15
77
Deducing Impact
TurnTrout
3y
26
77
Abstractions as Redundant Information
johnswentworth
10mo
7
75
«Boundaries», Part 3a: Defining boundaries as directed Markov blankets
Andrew_Critch
1mo
13
170
Utility Maximization = Description Length Minimization
johnswentworth
1y
40
159
Humans provide an untapped wealth of evidence about alignment
TurnTrout
5mo
92
155
Evolution of Modularity
johnswentworth
3y
12
155
The shard theory of human values
Quintin Pope
3mo
57
140
Shard Theory: An Overview
David Udell
4mo
34
117
A broad basin of attraction around human values?
Wei_Dai
8mo
16
99
Two Neglected Problems in Human-AI Safety
Wei_Dai
4y
24
95
Contra shard theory, in the context of the diamond maximizer problem
So8res
2mo
16
92
Human values & biases are inaccessible to the genome
TurnTrout
5mo
51
91
The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints
johnswentworth
1y
21
88
The two-layer model of human values, and problems with synthesizing preferences
Kaj_Sotala
2y
16
86
But exactly how complex and fragile?
KatjaGrace
3y
32
78
Shard Theory in Nine Theses: a Distillation and Critical Appraisal
LawrenceC
1d
9
78
Three AI Safety Related Ideas
Wei_Dai
4y
38