Tree of Tags

42 posts: Value Learning, The Pointers Problem, Kolmogorov Complexity

14 posts: Metaethics, Meta-Philosophy, Philosophy, Perceptual Control Theory

67 Don't design agents which exploit adversarial inputs

TurnTrout

1mo

61

25 People care about each other even though they have imperfect motivational pointers?

TurnTrout

1mo

25

10 Stable Pointers to Value: An Agent Embedded in Its Own Utility Function

abramdemski

5y

9

80 Beyond Kolmogorov and Shannon

Alexander Gietelink Oldenziel

1mo

14

109 The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables

johnswentworth

2y

43

58 Humans can be assigned any values whatsoever…

Stuart_Armstrong

4y

26

10 AIs should learn human preferences, not biases

Stuart_Armstrong

8mo

1

31 Human-AI Interaction

Rohin Shah

3y

10

1 Humans can be assigned any values whatsoever...

Stuart_Armstrong

5y

0

0 Kolmogorov complexity makes reward learning worse

Stuart_Armstrong

5y

0

44 What is ambitious value learning?

Rohin Shah

4y

28

68 Parsing Chris Mingard on Neural Networks

Alex Flint

1y

27

13 Morally underdefined situations can be deadly

Stuart_Armstrong

1y

8

21 Thoughts on implementing corrigible robust alignment

Steven Byrnes

3y

2

45 What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment

xuan

3mo

15

16 RFC: Philosophical Conservatism in AI Alignment Research

Gordon Seidoh Worley

4y

13

29 Gricean communication and meta-preferences

Charlie Steiner

2y

0

14 Meta-preferences two ways: generator vs. patch

Charlie Steiner

2y

0

48 Some Thoughts on Metaphilosophy

Wei_Dai

3y

27

17 The Value Definition Problem

Sammy Martin

3y

6

20 Recursive Quantilizers II

abramdemski

2y

15

25 Deliberation as a method to find the "actual preferences" of humans

riceissa

3y

5

32 AI Alignment, Philosophical Pluralism, and the Relevance of Non-Western Philosophy

xuan

1y

21

24 Deconfusing Human Values Research Agenda v1

Gordon Seidoh Worley

2y

12

16 Impossible moral problems and moral authority

Charlie Steiner

3y

8

25 A theory of human values

Stuart_Armstrong

3y

13

11 Can we make peace with moral indeterminacy?

Charlie Steiner

3y

8

15 My take on agent foundations: formalizing metaphilosophical competence

zhukeepa

4y

6