Tags (101 posts): Reinforcement Learning, AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, Stag Hunt, Road To AI Safety Excellence, Goals, Prompt Engineering, EfficientZero, PaLM
Tags (63 posts): Value Learning, The Pointers Problem
Karma | Title | Author | Posted | Comments
13 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1
71 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
218 | Reward is not the optimization target | TurnTrout | 4mo | 97
35 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
39 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
281 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
10 | Can GPT-3 Write Contra Dances? | jefftk | 16d | 0
16 | generalized wireheading | carado | 1mo | 7
77 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
194 | EfficientZero: How It Works | 1a3orn | 1y | 42
5 | Mastering Stratego (Deepmind) | svemirski | 18d | 0
139 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
5 | AGIs may value intrinsic rewards more than extrinsic ones | catubc | 1mo | 6
35 | The Problem With The Current State of AGI Definitions | Yitz | 6mo | 22
21 | Character alignment | p.b. | 3mo | 0
48 | Different perspectives on concept extrapolation | Stuart_Armstrong | 8mo | 7
23 | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1
93 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
29 | How an alien theory of mind might be unlearnable | Stuart_Armstrong | 11mo | 35
20 | Value extrapolation, concept extrapolation, model splintering | Stuart_Armstrong | 9mo | 1
20 | Morally underdefined situations can be deadly | Stuart_Armstrong | 1y | 8
13 | An Open Philanthropy grant proposal: Causal representation learning of human preferences | PabloAMC | 11mo | 6
9 | AIs should learn human preferences, not biases | Stuart_Armstrong | 8mo | 1
68 | Clarifying "AI Alignment" | paulfchristiano | 4y | 82
7 | The Pointers Problem - Distilled | NinaR | 6mo | 0
64 | Why we need a *theory* of human values | Stuart_Armstrong | 4y | 15
42 | Since figuring out human values is hard, what about, say, monkey values? | shminux | 2y | 13
58 | The E-Coli Test for AI Alignment | johnswentworth | 4y | 24