42 posts
Value Learning
The Pointers Problem
Kolmogorov Complexity
14 posts
Metaethics
Meta-Philosophy
Philosophy
Perceptual Control Theory
67
Don't design agents which exploit adversarial inputs
TurnTrout
1mo
61
25
People care about each other even though they have imperfect motivational pointers?
TurnTrout
1mo
25
10
Stable Pointers to Value: An Agent Embedded in Its Own Utility Function
abramdemski
5y
9
80
Beyond Kolmogorov and Shannon
Alexander Gietelink Oldenziel
1mo
14
109
The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables
johnswentworth
2y
43
58
Humans can be assigned any values whatsoever…
Stuart_Armstrong
4y
26
10
AIs should learn human preferences, not biases
Stuart_Armstrong
8mo
1
31
Human-AI Interaction
Rohin Shah
3y
10
1
Humans can be assigned any values whatsoever...
Stuart_Armstrong
5y
0
0
Kolmogorov complexity makes reward learning worse
Stuart_Armstrong
5y
0
44
What is ambitious value learning?
Rohin Shah
4y
28
68
Parsing Chris Mingard on Neural Networks
Alex Flint
1y
27
13
Morally underdefined situations can be deadly
Stuart_Armstrong
1y
8
21
Thoughts on implementing corrigible robust alignment
Steven Byrnes
3y
2
45
What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment
xuan
3mo
15
16
RFC: Philosophical Conservatism in AI Alignment Research
Gordon Seidoh Worley
4y
13
29
Gricean communication and meta-preferences
Charlie Steiner
2y
0
14
Meta-preferences two ways: generator vs. patch
Charlie Steiner
2y
0
48
Some Thoughts on Metaphilosophy
Wei_Dai
3y
27
17
The Value Definition Problem
Sammy Martin
3y
6
20
Recursive Quantilizers II
abramdemski
2y
15
25
Deliberation as a method to find the "actual preferences" of humans
riceissa
3y
5
32
AI Alignment, Philosophical Pluralism, and the Relevance of Non-Western Philosophy
xuan
1y
21
24
Deconfusing Human Values Research Agenda v1
Gordon Seidoh Worley
2y
12
16
Impossible moral problems and moral authority
Charlie Steiner
3y
8
25
A theory of human values
Stuart_Armstrong
3y
13
11
Can we make peace with moral indeterminacy?
Charlie Steiner
3y
8
15
My take on agent foundations: formalizing metaphilosophical competence
zhukeepa
4y
6