Tree of Tags

73 posts: Reinforcement Learning, Inverse Reinforcement Learning, Wireheading, Reward Functions, Road To AI Safety Excellence

28 posts: AI Capabilities, Definitions, Stag Hunt, Goals, Prompt Engineering, PaLM, EfficientZero

10 Note on algorithms with multiple trained components

Steven Byrnes

6h

1

252 Reward is not the optimization target

TurnTrout

4mo

97

40 A Short Dialogue on the Meaning of Reward Functions

Leon Lang

1mo

0

21 generalized wireheading

carado

1mo

7

76 Seriously, what goes wrong with "reward the agent when it makes you smile"?

TurnTrout

4mo

41

8 AGIs may value intrinsic rewards more than extrinsic ones

catubc

1mo

6

15 An investigation into when agents may be incentivized to manipulate our beliefs.

Felix Hofstätter

3mo

0

16 A Survey of Foundational Methods in Inverse Reinforcement Learning

adamk

3mo

0

82 Jitters No Evidence of Stupidity in RL

1a3orn

1y

18

25 Is CIRL a promising agenda?

Chris_Leong

6mo

12

25 Reward model hacking as a challenge for reward learning

Erik Jenner

8mo

1

59 My take on Michael Littman on "The HCI of HAI"

Alex Flint

1y

4

16 RLHF

Ansh Radhakrishnan

7mo

5

77 Book Review: Human Compatible

Scott Alexander

2y

6

81 When AI solves a game, focus on the game's mechanics, not its theme.

Cleo Nardo

27d

7

74 Will we run out of ML data? Evidence from projecting dataset size trends

Pablo Villalobos

1mo

12

276 Is AI Progress Impossible To Predict?

alyssavance

7mo

38

273 EfficientZero: How It Works

1a3orn

1y

42

6 Can GPT-3 Write Contra Dances?

jefftk

16d

0

6 Mastering Stratego (Deepmind)

svemirski

18d

0

134 EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

gwern

1y

52

23 What's the Most Impressive Thing That GPT-4 Could Plausibly Do?

bayesed

3mo

24

34 Remaking EfficientZero (as best I can)

Hoagy

5mo

9

40 The Problem With The Current State of AGI Definitions

Yitz

6mo

22

58 Competitive programming with AlphaCode

Algon

10mo

37

51 Misc. questions about EfficientZero

Daniel Kokotajlo

1y

17

11 How might we make better use of AI capabilities research for alignment purposes?

ghostwheel

3mo

4

7 Uncompetitive programming with GPT-3

Bezzi

10mo

8