20 posts
Gradient Hacking
Evolution
Heuristics & Biases
Modularity
Information Theory
Human Values
Gradient Descent
Biology
Experiments
Cultural knowledge
Aesthetics
27 posts
Complexity of Value
Shard Theory
Value Drift
Ontology
Whole Brain Emulation
Motivations
LessWrong Review
Psychology
Futurism
General Alignment Properties
Internal Alignment (Human)
Superstimuli
Karma | Title | Author | Age | Comments
92 | Human values & biases are inaccessible to the genome | TurnTrout | 5mo | 51
49 | Gradient Hacker Design Principles From Biology | johnswentworth | 3mo | 13
196 | Utility Maximization = Description Length Minimization | johnswentworth | 1y | 40
81 | The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints | johnswentworth | 1y | 21
13 | Gradient hacking: definitions and examples | Richard_Ngo | 5mo | 1
71 | Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection | Oliver Sourbut | 7mo | 12
1 | Speculations on information under logical uncertainty | TsviBT | 6y | 0
69 | Ten experiments in modularity, which we'd like you to run! | TheMcDouglas | 6mo | 2
15 | How to Throw Away Information | johnswentworth | 3y | 5
163 | Evolution of Modularity | johnswentworth | 3y | 12
49 | Hypothesis: gradient descent prefers general circuits | Quintin Pope | 10mo | 26
58 | Theories of Modularity in the Biological Literature | TheMcDouglas | 8mo | 13
16 | Musings on Cumulative Cultural Evolution and AI | calebo | 3y | 5
12 | Preference synthesis illustrated: Star Wars | Stuart_Armstrong | 2y | 8
62 | Shard Theory in Nine Theses: a Distillation and Critical Appraisal | LawrenceC | 1d | 9
38 | Positive values seem more robust and lasting than prohibitions | TurnTrout | 3d | 9
52 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | TurnTrout | 21d | 27
32 | Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight | Jacy Reese Anthis | 1mo | 8
249 | The shard theory of human values | Quintin Pope | 3mo | 57
191 | Humans provide an untapped wealth of evidence about alignment | TurnTrout | 5mo | 92
33 | Understanding and avoiding value drift | TurnTrout | 3mo | 9
120 | Shard Theory: An Overview | David Udell | 4mo | 34
73 | Contra shard theory, in the context of the diamond maximizer problem | So8res | 2mo | 16
31 | How are you dealing with ontology identification? | Erik Jenner | 2mo | 10
41 | General alignment properties | TurnTrout | 4mo | 2
50 | The two-layer model of human values, and problems with synthesizing preferences | Kaj_Sotala | 2y | 16
2 | Chatbots or set answers, not WBEs | Stuart_Armstrong | 7y | 0
7 | A sketch of a value-learning sovereign | jessicata | 7y | 0