Rationality (177 posts): Decision Theory, Abstraction, Goal-Directedness, Utility Functions, Finite Factored Sets, Causality, Literature Reviews, Quantilization, Mild Optimization, Open Problems, Filtered Evidence
World Modeling (172 posts): Impact Regularization, Human Values, Shard Theory, Anthropics, Complexity of Value, Exercises / Problem-Sets, Gradient Hacking, Evolution, Fixed Point Theorems, Heuristics & Biases, Modularity
Karma | Title | Author | Age | Comments
171 | Finite Factored Sets in Pictures | Magdalena Wache | 9d | 29
35 | Take 7: You should talk about "the human's utility function" less. | Charlie Steiner | 12d | 22
32 | Counterfactability | Scott Garrabrant | 1mo | 4
46 | Notes on "Can you control the past" | So8res | 2mo | 40
62 | Builder/Breaker for Deconfusion | abramdemski | 2mo | 9
146 | why assume AGIs will optimize for fixed goals? | nostalgebraist | 6mo | 52
108 | Principles for Alignment/Agency Projects | johnswentworth | 5mo | 20
69 | All the posts I will never write | Alexander Gietelink Oldenziel | 4mo | 8
60 | Finding Goals in the World Model | Jeremy Gillen | 4mo | 8
91 | wrapper-minds are the enemy | nostalgebraist | 6mo | 36
175 | 2021 AI Alignment Literature Review and Charity Comparison | Larks | 12mo | 26
88 | [Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA | Steven Byrnes | 7mo | 11
137 | What's Up With Confusingly Pervasive Consequentialism? | Raemon | 11mo | 88
71 | Open Problems in AI X-Risk [PAIS #5] | Dan H | 6mo | 3
62 | Shard Theory in Nine Theses: a Distillation and Critical Appraisal | LawrenceC | 1d | 9
38 | Positive values seem more robust and lasting than prohibitions | TurnTrout | 3d | 9
981 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
52 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | TurnTrout | 21d | 27
381 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
249 | The shard theory of human values | Quintin Pope | 3mo | 57
38 | Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide | Andrew_Critch | 26d | 34
12 | Working towards AI alignment is better | Johannes C. Mayer | 11d | 2
73 | Contra shard theory, in the context of the diamond maximizer problem | So8res | 2mo | 16
191 | Humans provide an untapped wealth of evidence about alignment | TurnTrout | 5mo | 92
32 | Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight | Jacy Reese Anthis | 1mo | 8
120 | Shard Theory: An Overview | David Udell | 4mo | 34
35 | Humans do acausal coordination all the time | Adam Jermyn | 1mo | 36
37 | «Boundaries», Part 3a: Defining boundaries as directed Markov blankets | Andrew_Critch | 1mo | 13