Reinforcement Learning (101 posts)
Related tags: AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, Stag Hunt, Road To AI Safety Excellence, Goals, Prompt Engineering, EfficientZero, PaLM

Value Learning (63 posts)
Related tags: The Pointers Problem
Karma | Title | Author | Posted | Comments
10  | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1
81  | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
74  | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
252 | Reward is not the optimization target | TurnTrout | 4mo | 97
40  | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
276 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
21  | generalized wireheading | carado | 1mo | 7
273 | EfficientZero: How It Works | 1a3orn | 1y | 42
76  | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
6   | Can GPT-3 Write Contra Dances? | jefftk | 16d | 0
6   | Mastering Stratego (Deepmind) | svemirski | 18d | 0
8   | AGIs may value intrinsic rewards more than extrinsic ones | catubc | 1mo | 6
134 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
23  | What's the Most Impressive Thing That GPT-4 Could Plausibly Do? | bayesed | 3mo | 24
22  | Character alignment | p.b. | 3mo | 0
42  | Different perspectives on concept extrapolation | Stuart_Armstrong | 8mo | 7
104 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
16  | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1
26  | How an alien theory of mind might be unlearnable | Stuart_Armstrong | 11mo | 35
19  | An Open Philanthropy grant proposal: Causal representation learning of human preferences | PabloAMC | 11mo | 6
14  | Value extrapolation, concept extrapolation, model splintering | Stuart_Armstrong | 9mo | 1
9   | The Pointers Problem - Distilled | NinaR | 6mo | 0
17  | Morally underdefined situations can be deadly | Stuart_Armstrong | 1y | 8
10  | AIs should learn human preferences, not biases | Stuart_Armstrong | 8mo | 1
69  | The E-Coli Test for AI Alignment | johnswentworth | 4y | 24
68  | Preface to the sequence on value learning | Rohin Shah | 4y | 6
65  | Why we need a *theory* of human values | Stuart_Armstrong | 4y | 15
64  | Clarifying "AI Alignment" | paulfchristiano | 4y | 82