177 posts
Rationality
Decision Theory
Abstraction
Goal-Directedness
Utility Functions
Finite Factored Sets
Causality
Literature Reviews
Quantilization
Mild Optimization
Open Problems
Filtered Evidence
172 posts
World Modeling
Impact Regularization
Human Values
Shard Theory
Anthropics
Complexity of Value
Exercises / Problem-Sets
Gradient Hacking
Evolution
Fixed Point Theorems
Heuristics & Biases
Modularity
171 | Finite Factored Sets in Pictures | Magdalena Wache | 9d | 29
91 | wrapper-minds are the enemy | nostalgebraist | 6mo | 36
62 | Builder/Breaker for Deconfusion | abramdemski | 2mo | 9
141 | Finite Factored Sets | Scott Garrabrant | 1y | 94
160 | Can you control the past? | Joe Carlsmith | 1y | 93
32 | Counterfactability | Scott Garrabrant | 1mo | 4
35 | Take 7: You should talk about "the human's utility function" less. | Charlie Steiner | 12d | 22
108 | Principles for Alignment/Agency Projects | johnswentworth | 5mo | 20
20 | Quantilizers and Generative Models | Adam Jermyn | 5mo | 5
46 | Notes on "Can you control the past" | So8res | 2mo | 40
69 | All the posts I will never write | Alexander Gietelink Oldenziel | 4mo | 8
88 | [Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA | Steven Byrnes | 7mo | 11
32 | Exploring Mild Behaviour in Embedded Agents | Megan Kinniment | 5mo | 3
86 | Testing The Natural Abstraction Hypothesis: Project Update | johnswentworth | 1y | 17
62 | Shard Theory in Nine Theses: a Distillation and Critical Appraisal | LawrenceC | 1d | 9
38 | Positive values seem more robust and lasting than prohibitions | TurnTrout | 3d | 9
52 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | TurnTrout | 21d | 27
38 | Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide | Andrew_Critch | 26d | 34
32 | Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight | Jacy Reese Anthis | 1mo | 8
249 | The shard theory of human values | Quintin Pope | 3mo | 57
20 | Traps of Formalization in Deconfusion | adamShimi | 1y | 7
191 | Humans provide an untapped wealth of evidence about alignment | TurnTrout | 5mo | 92
37 | «Boundaries», Part 3a: Defining boundaries as directed Markov blankets | Andrew_Critch | 1mo | 13
92 | Human values & biases are inaccessible to the genome | TurnTrout | 5mo | 51
381 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
35 | Humans do acausal coordination all the time | Adam Jermyn | 1mo | 36
33 | Understanding and avoiding value drift | TurnTrout | 3mo | 9
49 | Gradient Hacker Design Principles From Biology | johnswentworth | 3mo | 13