Value Learning (56 posts; related tags: The Pointers Problem, Meta-Philosophy, Metaethics, Kolmogorov Complexity, Philosophy, Perceptual Control Theory)

Each entry: karma · title · author · age · comment count.

99 · The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables · johnswentworth · 2y · 43 comments
76 · Some Thoughts on Metaphilosophy · Wei_Dai · 3y · 27 comments
71 · Clarifying "AI Alignment" · paulfchristiano · 4y · 82 comments
66 · Parsing Chris Mingard on Neural Networks · Alex Flint · 1y · 27 comments
62 · Preface to the sequence on value learning · Rohin Shah · 4y · 6 comments
56 · Conclusion to the sequence on value learning · Rohin Shah · 3y · 20 comments
54 · What is ambitious value learning? · Rohin Shah · 4y · 28 comments
54 · Humans can be assigned any values whatsoever… · Stuart_Armstrong · 4y · 26 comments
54 · Policy Alignment · abramdemski · 4y · 25 comments
54 · Normativity · abramdemski · 2y · 11 comments
53 · Intuitions about goal-directed behavior · Rohin Shah · 4y · 15 comments
53 · Don't design agents which exploit adversarial inputs · TurnTrout · 1mo · 61 comments
52 · The easy goal inference problem is still hard · paulfchristiano · 4y · 19 comments
50 · Different perspectives on concept extrapolation · Stuart_Armstrong · 8mo · 7 comments

Inverse Reinforcement Learning and Book Reviews (11 posts)

69 · [Book Review] "The Alignment Problem" by Brian Christian · lsusr · 1y · 16 comments
59 · Thoughts on "Human-Compatible" · TurnTrout · 3y · 35 comments
53 · Human-AI Collaboration · Rohin Shah · 3y · 7 comments
48 · Learning biases and rewards simultaneously · Rohin Shah · 3y · 3 comments
39 · Model Mis-specification and Inverse Reinforcement Learning · Owain_Evans · 4y · 3 comments
34 · Agents That Learn From Human Behavior Can't Learn Human Values That Humans Haven't Learned Yet · steven0461 · 4y · 11 comments
18 · Delegative Inverse Reinforcement Learning · Vanessa Kosoy · 5y · 0 comments
17 · Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences · orthonormal · 6y · 0 comments
4 · CIRL Wireheading · tom4everitt · 5y · 0 comments
1 · (C)IRL is not solely a learning process · Stuart_Armstrong · 6y · 0 comments
0 · Inverse reinforcement learning on self, pre-ontology-change · Stuart_Armstrong · 7y · 0 comments