125 posts
World Modeling
Impact Regularization
Anthropics
Exercises / Problem-Sets
Fixed Point Theorems
AIXI
Updateless Decision Theory
Sleeping Beauty Paradox
Cognitive Science
Extraterrestrial Life
Economics
Grabby Aliens
47 posts
Human Values
Shard Theory
Complexity of Value
Gradient Hacking
Heuristics & Biases
Evolution
Value Drift
Information Theory
Gradient Descent
Modularity
Ontology
Biology
43
Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide
Andrew_Critch
26d
34
24
Traps of Formalization in Deconfusion
adamShimi
1y
7
56
«Boundaries», Part 3a: Defining boundaries as directed Markov blankets
Andrew_Critch
1mo
13
310
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra
5mo
89
51
Humans do acausal coordination all the time
Adam Jermyn
1mo
36
69
Less Threat-Dependent Bargaining Solutions?? (3/2)
Diffractor
4mo
7
145
Testing The Natural Abstraction Hypothesis: Project Intro
johnswentworth
1y
34
777
Where I agree and disagree with Eliezer
paulfchristiano
6mo
205
61
Attainable Utility Preservation: Empirical Results
TurnTrout
2y
8
14
Deliberation Everywhere: Simple Examples
Oliver Sourbut
5mo
0
14
Using modal fixed points to formalize logical causality
cousin_it
5y
0
8
An implementation of modal UDT
Benya_Fallenstein
7y
0
0
Corrigibility for AIXI via double indifference
Stuart_Armstrong
6y
0
11
Updatelessness and Son of X
Scott Garrabrant
6y
0
70
Shard Theory in Nine Theses: a Distillation and Critical Appraisal
LawrenceC
1d
9
42
Positive values seem more robust and lasting than prohibitions
TurnTrout
3d
9
55
Alignment allows "nonrobust" decision-influences and doesn't require robust grading
TurnTrout
21d
27
29
Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight
Jacy Reese Anthis
1mo
8
202
The shard theory of human values
Quintin Pope
3mo
57
175
Humans provide an untapped wealth of evidence about alignment
TurnTrout
5mo
92
92
Human values & biases are inaccessible to the genome
TurnTrout
5mo
51
40
Understanding and avoiding value drift
TurnTrout
3mo
9
52
Gradient Hacker Design Principles From Biology
johnswentworth
3mo
13
183
Utility Maximization = Description Length Minimization
johnswentworth
1y
40
130
Shard Theory: An Overview
David Udell
4mo
34
84
Contra shard theory, in the context of the diamond maximizer problem
So8res
2mo
16
33
How are you dealing with ontology identification?
Erik Jenner
2mo
10
46
General alignment properties
TurnTrout
4mo
2