177 posts
Rationality
Decision Theory
Abstraction
Goal-Directedness
Utility Functions
Finite Factored Sets
Causality
Literature Reviews
Quantilization
Mild Optimization
Open Problems
Filtered Evidence
172 posts
World Modeling
Impact Regularization
Human Values
Shard Theory
Anthropics
Complexity of Value
Exercises / Problem-Sets
Gradient Hacking
Evolution
Fixed Point Theorems
Heuristics & Biases
Modularity
Karma | Title | Author | Age | Comments
201 | What's Up With Confusingly Pervasive Consequentialism? | Raemon | 11mo | 88
154 | Realism about rationality | Richard_Ngo | 4y | 145
153 | 2021 AI Alignment Literature Review and Charity Comparison | Larks | 12mo | 26
146 | Saving Time | Scott Garrabrant | 1y | 19
137 | 2020 AI Alignment Literature Review and Charity Comparison | Larks | 1y | 14
134 | Can you control the past? | Joe Carlsmith | 1y | 93
133 | Finite Factored Sets | Scott Garrabrant | 1y | 94
126 | An Orthodox Case Against Utility Functions | abramdemski | 2y | 53
125 | Finite Factored Sets in Pictures | Magdalena Wache | 9d | 29
122 | Principles for Alignment/Agency Projects | johnswentworth | 5mo | 20
106 | Thinking About Filtered Evidence Is (Very!) Hard | abramdemski | 2y | 29
105 | Problem relaxation as a tactic | TurnTrout | 2y | 8
105 | Coherence arguments do not entail goal-directed behavior | Rohin Shah | 4y | 69
105 | Utility ≠ Reward | vlad_m | 3y | 25
Karma | Title | Author | Age | Comments
573 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
239 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
170 | Utility Maximization = Description Length Minimization | johnswentworth | 1y | 40
159 | Humans provide an untapped wealth of evidence about alignment | TurnTrout | 5mo | 92
159 | My research methodology | paulfchristiano | 1y | 36
159 | Testing The Natural Abstraction Hypothesis: Project Intro | johnswentworth | 1y | 34
155 | Evolution of Modularity | johnswentworth | 3y | 12
155 | The shard theory of human values | Quintin Pope | 3mo | 57
145 | Fixing The Good Regulator Theorem | johnswentworth | 1y | 25
140 | Shard Theory: An Overview | David Udell | 4mo | 34
117 | A broad basin of attraction around human values? | Wei_Dai | 8mo | 16
100 | Selection Theorems: A Program For Understanding Agents | johnswentworth | 1y | 23
99 | Two Neglected Problems in Human-AI Safety | Wei_Dai | 4y | 24
98 | There is essentially one best-validated theory of cognition. | abramdemski | 1y | 34