5 posts
Shard Theory
Internal Alignment (Human)
22 posts
Complexity of Value
Value Drift
Ontology
Whole Brain Emulation
Motivations
LessWrong Review
Psychology
Futurism
General Alignment Properties
Superstimuli
202 | The shard theory of human values | Quintin Pope | 3mo | 57
84 | Contra shard theory, in the context of the diamond maximizer problem | So8res | 2mo | 16
70 | Shard Theory in Nine Theses: a Distillation and Critical Appraisal | LawrenceC | 1d | 9
42 | Positive values seem more robust and lasting than prohibitions | TurnTrout | 3d | 9
29 | Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight | Jacy Reese Anthis | 1mo | 8
175 | Humans provide an untapped wealth of evidence about alignment | TurnTrout | 5mo | 92
130 | Shard Theory: An Overview | David Udell | 4mo | 34
85 | Two Neglected Problems in Human-AI Safety | Wei_Dai | 4y | 24
73 | But exactly how complex and fragile? | KatjaGrace | 3y | 32
69 | The two-layer model of human values, and problems with synthesizing preferences | Kaj_Sotala | 2y | 16
68 | Three AI Safety Related Ideas | Wei_Dai | 4y | 38
65 | Why we need a *theory* of human values | Stuart_Armstrong | 4y | 15
58 | Test Cases for Impact Regularisation Methods | DanielFilan | 3y | 5
55 | Review of 'But exactly how complex and fragile?' | TurnTrout | 1y | 0
55 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | TurnTrout | 21d | 27
46 | General alignment properties | TurnTrout | 4mo | 2
40 | Understanding and avoiding value drift | TurnTrout | 3mo | 9
36 | Broad Picture of Human Values | Thane Ruthenis | 4mo | 5
35 | Can there be an indescribable hellworld? | Stuart_Armstrong | 3y | 19