Reinforcement Learning (101 posts)
Related tags: AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, Stag Hunt, Road To AI Safety Excellence, Goals, Prompt Engineering, EfficientZero, PaLM

Value Learning (63 posts)
Related tag: The Pointers Problem
Posts tagged Reinforcement Learning (score · title · author · age · comments):

252 · Reward is not the optimization target · TurnTrout · 4mo · 97
10 · Note on algorithms with multiple trained components · Steven Byrnes · 6h · 1
74 · Will we run out of ML data? Evidence from projecting dataset size trends · Pablo Villalobos · 1mo · 12
81 · When AI solves a game, focus on the game's mechanics, not its theme. · Cleo Nardo · 27d · 7
76 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · TurnTrout · 4mo · 41
21 · generalized wireheading · carado · 1mo · 7
23 · What's the Most Impressive Thing That GPT-4 Could Plausibly Do? · bayesed · 3mo · 24
8 · AGIs may value intrinsic rewards more than extrinsic ones · catubc · 1mo · 6
25 · Is CIRL a promising agenda? · Chris_Leong · 6mo · 12
34 · Remaking EfficientZero (as best I can) · Hoagy · 5mo · 9
-1 · Reward IS the Optimization Target · Carn · 2mo · 3
11 · How might we make better use of AI capabilities research for alignment purposes? · ghostwheel · 3mo · 4
8 · Reinforcement Learner Wireheading · Nate Showell · 5mo · 2
Posts tagged Value Learning (score · title · author · age · comments):

276 · Is AI Progress Impossible To Predict? · alyssavance · 7mo · 38
23 · Latent Variables and Model Mis-Specification · jsteinhardt · 4y · 7
15 · Stable Pointers to Value: An Agent Embedded in Its Own Utility Function · abramdemski · 5y · 9
104 · The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables · johnswentworth · 2y · 43
38 · AI Alignment Problem: “Human Values” don’t Actually Exist · avturchin · 3y · 29
50 · The easy goal inference problem is still hard · paulfchristiano · 4y · 19
56 · Humans can be assigned any values whatsoever… · Stuart_Armstrong · 4y · 26
37 · Since figuring out human values is hard, what about, say, monkey values? · shminux · 2y · 13
10 · AIs should learn human preferences, not biases · Stuart_Armstrong · 8mo · 1
34 · Human-AI Interaction · Rohin Shah · 3y · 10
19 · An Open Philanthropy grant proposal: Causal representation learning of human preferences · PabloAMC · 11mo · 6
49 · What is ambitious value learning? · Rohin Shah · 4y · 28
13 · Can few-shot learning teach AI right from wrong? · Charlie Steiner · 4y · 3
17 · Morally underdefined situations can be deadly · Stuart_Armstrong · 1y · 8
25 · Learning human preferences: black-box, white-box, and structured white-box access · Stuart_Armstrong · 2y · 9