177 posts
Rationality
Decision Theory
Abstraction
Goal-Directedness
Utility Functions
Finite Factored Sets
Causality
Literature Reviews
Quantilization
Mild Optimization
Open Problems
Filtered Evidence
172 posts
World Modeling
Impact Regularization
Human Values
Shard Theory
Anthropics
Complexity of Value
Exercises / Problem-Sets
Gradient Hacking
Evolution
Fixed Point Theorems
Heuristics & Biases
Modularity
148
Finite Factored Sets in Pictures
Magdalena Wache
9d
29
92
wrapper-minds are the enemy
nostalgebraist
6mo
36
70
Builder/Breaker for Deconfusion
abramdemski
2mo
9
137
Finite Factored Sets
Scott Garrabrant
1y
94
147
Can you control the past?
Joe Carlsmith
1y
93
36
Counterfactability
Scott Garrabrant
1mo
4
47
Take 7: You should talk about "the human's utility function" less.
Charlie Steiner
12d
22
115
Principles for Alignment/Agency Projects
johnswentworth
5mo
20
24
Quantilizers and Generative Models
Adam Jermyn
5mo
5
55
Notes on "Can you control the past"
So8res
2mo
40
52
All the posts I will never write
Alexander Gietelink Oldenziel
4mo
8
81
[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes
7mo
11
21
Exploring Mild Behaviour in Embedded Agents
Megan Kinniment
5mo
3
83
Testing The Natural Abstraction Hypothesis: Project Update
johnswentworth
1y
17
70
Shard Theory in Nine Theses: a Distillation and Critical Appraisal
LawrenceC
1d
9
42
Positive values seem more robust and lasting than prohibitions
TurnTrout
3d
9
55
Alignment allows "nonrobust" decision-influences and doesn't require robust grading
TurnTrout
21d
27
43
Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide
Andrew_Critch
26d
34
29
Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight
Jacy Reese Anthis
1mo
8
202
The shard theory of human values
Quintin Pope
3mo
57
24
Traps of Formalization in Deconfusion
adamShimi
1y
7
175
Humans provide an untapped wealth of evidence about alignment
TurnTrout
5mo
92
56
«Boundaries», Part 3a: Defining boundaries as directed Markov blankets
Andrew_Critch
1mo
13
92
Human values & biases are inaccessible to the genome
TurnTrout
5mo
51
310
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra
5mo
89
51
Humans do acausal coordination all the time
Adam Jermyn
1mo
36
40
Understanding and avoiding value drift
TurnTrout
3mo
9
52
Gradient Hacker Design Principles From Biology
johnswentworth
3mo
13