17 posts · tagged: Complexity of Value, Value Drift, Whole Brain Emulation, Motivations, LessWrong Review, Psychology, Futurism, Superstimuli

Karma · Title · Author · Posted · Comments
120 · Shard Theory: An Overview · David Udell · 4mo · 34
71 · Two Neglected Problems in Human-AI Safety · Wei_Dai · 4y · 24
63 · Why we need a *theory* of human values · Stuart_Armstrong · 4y · 15
60 · But exactly how complex and fragile? · KatjaGrace · 3y · 32
58 · Three AI Safety Related Ideas · Wei_Dai · 4y · 38
52 · Alignment allows "nonrobust" decision-influences and doesn't require robust grading · TurnTrout · 21d · 27
50 · The two-layer model of human values, and problems with synthesizing preferences · Kaj_Sotala · 2y · 16
47 · Acknowledging Human Preference Types to Support Value Learning · Nandi Sabrina Erin · 4y · 4
37 · Review of 'But exactly how complex and fragile?' · TurnTrout · 1y · 0
34 · Broad Picture of Human Values · Thane Ruthenis · 4mo · 5
33 · Understanding and avoiding value drift · TurnTrout · 3mo · 9
28 · Can there be an indescribable hellworld? · Stuart_Armstrong · 3y · 19
23 · Reversible changes: consider a bucket of water · Stuart_Armstrong · 3y · 18
15 · Would I think for ten thousand years? · Stuart_Armstrong · 3y · 13

5 posts · tagged: Ontology, General Alignment Properties

Karma · Title · Author · Posted · Comments
191 · Humans provide an untapped wealth of evidence about alignment · TurnTrout · 5mo · 92
45 · Test Cases for Impact Regularisation Methods · DanielFilan · 3y · 5
41 · General alignment properties · TurnTrout · 4mo · 2
31 · How are you dealing with ontology identification? · Erik Jenner · 2mo · 10
7 · A sketch of a value-learning sovereign · jessicata · 7y · 0