20 posts
Gradient Hacking
Evolution
Heuristics & Biases
Modularity
Information Theory
Human Values
Gradient Descent
Biology
Experiments
Cultural knowledge
Aesthetics
Request Post
27 posts
Complexity of Value
Shard Theory
Value Drift
Ontology
Whole Brain Emulation
Motivations
LessWrong Review
Psychology
Futurism
General Alignment Properties
Internal Alignment (Human)
Superstimuli
Utility Maximization = Description Length Minimization (johnswentworth, 1y; 170 karma, 40 comments)
Evolution of Modularity (johnswentworth, 3y; 155 karma, 12 comments)
A broad basin of attraction around human values? (Wei_Dai, 8mo; 117 karma, 16 comments)
Human values & biases are inaccessible to the genome (TurnTrout, 5mo; 92 karma, 51 comments)
The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints (johnswentworth, 1y; 91 karma, 21 comments)
Gradient descent is not just more efficient genetic algorithms (leogao, 1y; 70 karma, 14 comments)
Gradient Hacker Design Principles From Biology (johnswentworth, 3mo; 55 karma, 13 comments)
Ten experiments in modularity, which we'd like you to run! (TheMcDouglas, 6mo; 49 karma, 2 comments)
The Blackwell order as a formalization of knowledge (Alex Flint, 1y; 48 karma, 10 comments)
Emergent modularity and safety (Richard_Ngo, 1y; 41 karma, 15 comments)
Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection (Oliver Sourbut, 7mo; 37 karma, 12 comments)
Theories of Modularity in the Biological Literature (TheMcDouglas, 8mo; 36 karma, 13 comments)
Gradient hacking: definitions and examples (Richard_Ngo, 5mo; 31 karma, 1 comment)
Hypothesis: gradient descent prefers general circuits (Quintin Pope, 10mo; 31 karma, 26 comments)

Humans provide an untapped wealth of evidence about alignment (TurnTrout, 5mo; 159 karma, 92 comments)
The shard theory of human values (Quintin Pope, 3mo; 155 karma, 57 comments)
Shard Theory: An Overview (David Udell, 4mo; 140 karma, 34 comments)
Two Neglected Problems in Human-AI Safety (Wei_Dai, 4y; 99 karma, 24 comments)
Contra shard theory, in the context of the diamond maximizer problem (So8res, 2mo; 95 karma, 16 comments)
The two-layer model of human values, and problems with synthesizing preferences (Kaj_Sotala, 2y; 88 karma, 16 comments)
But exactly how complex and fragile? (KatjaGrace, 3y; 86 karma, 32 comments)
Shard Theory in Nine Theses: a Distillation and Critical Appraisal (LawrenceC, 1d; 78 karma, 9 comments)
Three AI Safety Related Ideas (Wei_Dai, 4y; 78 karma, 38 comments)
Review of 'But exactly how complex and fragile?' (TurnTrout, 1y; 73 karma, 0 comments)
Test Cases for Impact Regularisation Methods (DanielFilan, 3y; 71 karma, 5 comments)
Why we need a *theory* of human values (Stuart_Armstrong, 4y; 67 karma, 15 comments)
Alignment allows "nonrobust" decision-influences and doesn't require robust grading (TurnTrout, 21d; 58 karma, 27 comments)
General alignment properties (TurnTrout, 4mo; 51 karma, 2 comments)