Tags: Value Learning (56 posts), The Pointers Problem, Meta-Philosophy, Metaethics, Kolmogorov Complexity, Philosophy, Perceptual Control Theory, Inverse Reinforcement Learning (11 posts), Book Reviews
Karma | Title | Author | Posted | Comments
60  | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61
32  | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
15  | Stable Pointers to Value: An Agent Embedded in Its Own Utility Function | abramdemski | 5y | 9
30  | What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment | xuan | 3mo | 15
60  | Beyond Kolmogorov and Shannon | Alexander Gietelink Oldenziel | 1mo | 14
104 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
56  | Humans can be assigned any values whatsoever… | Stuart_Armstrong | 4y | 26
10  | AIs should learn human preferences, not biases | Stuart_Armstrong | 8mo | 1
17  | RFC: Philosophical Conservatism in AI Alignment Research | Gordon Seidoh Worley | 4y | 13
34  | Human-AI Interaction | Rohin Shah | 3y | 10
1   | Humans can be assigned any values whatsoever... | Stuart_Armstrong | 5y | 0
0   | Kolmogorov complexity makes reward learning worse | Stuart_Armstrong | 5y | 0
49  | What is ambitious value learning? | Rohin Shah | 4y | 28
67  | Parsing Chris Mingard on Neural Networks | Alex Flint | 1y | 27
0   | Inverse reinforcement learning on self, pre-ontology-change | Stuart_Armstrong | 7y | 0
14  | Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences | orthonormal | 6y | 0
1   | (C)IRL is not solely a learning process | Stuart_Armstrong | 6y | 0
3   | CIRL Wireheading | tom4everitt | 5y | 0
15  | Delegative Inverse Reinforcement Learning | Vanessa Kosoy | 5y | 0
27  | Agents That Learn From Human Behavior Can't Learn Human Values That Humans Haven't Learned Yet | steven0461 | 4y | 11
63  | Thoughts on "Human-Compatible" | TurnTrout | 3y | 35
70  | [Book Review] "The Alignment Problem" by Brian Christian | lsusr | 1y | 16
33  | Model Mis-specification and Inverse Reinforcement Learning | Owain_Evans | 4y | 3
42  | Human-AI Collaboration | Rohin Shah | 3y | 7
41  | Learning biases and rewards simultaneously | Rohin Shah | 3y | 3