177 posts
Rationality
Decision Theory
Abstraction
Goal-Directedness
Utility Functions
Finite Factored Sets
Causality
Literature Reviews
Quantilization
Mild Optimization
Open Problems
Filtered Evidence
172 posts
World Modeling
Impact Regularization
Human Values
Shard Theory
Anthropics
Complexity of Value
Exercises / Problem-Sets
Gradient Hacking
Evolution
Fixed Point Theorems
Heuristics & Biases
Modularity
125
Finite Factored Sets in Pictures
Magdalena Wache
9d
29
93
wrapper-minds are the enemy
nostalgebraist
6mo
36
78
Builder/Breaker for Deconfusion
abramdemski
2mo
9
133
Finite Factored Sets
Scott Garrabrant
1y
94
134
Can you control the past?
Joe Carlsmith
1y
93
40
Counterfactability
Scott Garrabrant
1mo
4
59
Take 7: You should talk about "the human's utility function" less.
Charlie Steiner
12d
22
122
Principles for Alignment/Agency Projects
johnswentworth
5mo
20
28
Quantilizers and Generative Models
Adam Jermyn
5mo
5
64
Notes on "Can you control the past"
So8res
2mo
40
35
All the posts I will never write
Alexander Gietelink Oldenziel
4mo
8
74
[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes
7mo
11
10
Exploring Mild Behaviour in Embedded Agents
Megan Kinniment
5mo
3
80
Testing The Natural Abstraction Hypothesis: Project Update
johnswentworth
1y
17
78
Shard Theory in Nine Theses: a Distillation and Critical Appraisal
LawrenceC
1d
9
46
Positive values seem more robust and lasting than prohibitions
TurnTrout
3d
9
58
Alignment allows "nonrobust" decision-influences and doesn't require robust grading
TurnTrout
21d
27
48
Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide
Andrew_Critch
26d
34
26
Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight
Jacy Reese Anthis
1mo
8
155
The shard theory of human values
Quintin Pope
3mo
57
28
Traps of Formalization in Deconfusion
adamShimi
1y
7
159
Humans provide an untapped wealth of evidence about alignment
TurnTrout
5mo
92
75
«Boundaries», Part 3a: Defining boundaries as directed Markov blankets
Andrew_Critch
1mo
13
92
Human values & biases are inaccessible to the genome
TurnTrout
5mo
51
239
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra
5mo
89
67
Humans do acausal coordination all the time
Adam Jermyn
1mo
36
47
Understanding and avoiding value drift
TurnTrout
3mo
9
55
Gradient Hacker Design Principles From Biology
johnswentworth
3mo
13