Tags (73 posts): Reinforcement Learning, Inverse Reinforcement Learning, Wireheading, Reward Functions, Road To AI Safety Excellence
Tags (28 posts): AI Capabilities, Definitions, Stag Hunt, Goals, Prompt Engineering, PaLM, EfficientZero
Karma | Title | Author | Age | Comments
218 | Reward is not the optimization target | TurnTrout | 4mo | 97
13 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1
77 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
16 | generalized wireheading | carado | 1mo | 7
5 | AGIs may value intrinsic rewards more than extrinsic ones | catubc | 1mo | 6
24 | Is CIRL a promising agenda? | Chris_Leong | 6mo | 12
-10 | Reward IS the Optimization Target | Carn | 2mo | 3
-4 | Reinforcement Learner Wireheading | Nate Showell | 5mo | 2
58 | The Stamp Collector | So8res | 7y | 14
2 | What messy problems do you see Deep Reinforcement Learning applicable to? | Riccardo Volpato | 2y | 0
39 | You cannot be mistaken about (not) wanting to wirehead | Kaj_Sotala | 12y | 79
11 | Reward function learning: the value function | Stuart_Armstrong | 4y | 0
0 | Inverse reinforcement learning on self, pre-ontology-change | Stuart_Armstrong | 7y | 0
6 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
39 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
71 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
11 | What's the Most Impressive Thing That GPT-4 Could Plausibly Do? | bayesed | 3mo | 24
23 | Remaking EfficientZero (as best I can) | Hoagy | 5mo | 9
2 | How might we make better use of AI capabilities research for alignment purposes? | ghostwheel | 3mo | 4
281 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
50 | Do Humans Want Things? | lukeprog | 11y | 53
69 | Misc. questions about EfficientZero | Daniel Kokotajlo | 1y | 17
35 | The Problem With The Current State of AGI Definitions | Yitz | 6mo | 22
1 | Define Rationality | Marshall | 13y | 14
19 | Seeking better name for "Effective Egoism" | DataPacRat | 6y | 30
43 | Note on Terminology: "Rationality", not "Rationalism" | Vladimir_Nesov | 11y | 51
29 | Disambiguating "alignment" and related notions | David Scott Krueger (formerly: capybaralet) | 4y | 21
2 | Uncompetitive programming with GPT-3 | Bezzi | 10mo | 8