Tree of Tags

180 posts: Research Agendas; Embedded Agency; Suffering; Agency; Animal Welfare; Risks of Astronomical Suffering (S-risks); Robust Agents; Cause Prioritization; Center on Long-Term Risk (CLR); 80,000 Hours; Crucial Considerations; Veg*nism

164 posts: Value Learning; Reinforcement Learning; AI Capabilities; Inverse Reinforcement Learning; Wireheading; Definitions; Reward Functions; The Pointers Problem; Stag Hunt; Road To AI Safety Excellence; Goals; EfficientZero

34 My AGI safety research—2022 review, ’23 plans

Steven Byrnes

6d

6

300 On how various plans miss the hard bits of the alignment challenge

So8res

5mo

81

23 Should you refrain from having children because of the risk posed by artificial intelligence?

Mientras

3mo

28

190 Some conceptual alignment research projects

Richard_Ngo

3mo

14

7 EA, Veganism and Negative Animal Utilitarianism

Yair Halberstadt

3mo

12

9 Cooperators are more powerful than agents

Ivan Vendrov

2mo

7

9 Interpreting systems as solving POMDPs: a step towards a formal understanding of agency [paper link]

the gears to ascenscion

1mo

2

247 Humans are very reliable agents

alyssavance

6mo

35

45 Gradations of Agency

Daniel Kokotajlo

7mo

6

-10 A Longtermist case against Veganism

Connor Tabarrok

2mo

2

21 Distilled Representations Research Agenda

Hoagy

2mo

2

4 Some thoughts on Animals

nitinkhanna

5mo

6

19 Peter Singer's first published piece on AI

Fai

5mo

5

4 Vegetarianism and depression

Maggy

2mo

2

286 Reward is not the optimization target

TurnTrout

4mo

97

7 Note on algorithms with multiple trained components

Steven Byrnes

6h

1

109 Will we run out of ML data? Evidence from projecting dataset size trends

Pablo Villalobos

1mo

12

91 When AI solves a game, focus on the game's mechanics, not its theme.

Cleo Nardo

27d

7

75 Seriously, what goes wrong with "reward the agent when it makes you smile"?

TurnTrout

4mo

41

26 generalized wireheading

carado

1mo

7

35 What's the Most Impressive Thing That GPT-4 Could Plausibly Do?

bayesed

3mo

24

11 AGIs may value intrinsic rewards more than extrinsic ones

catubc

1mo

6

25 Latent Variables and Model Mis-Specification

jsteinhardt

4y

7

10 Stable Pointers to Value: An Agent Embedded in Its Own Utility Function

abramdemski

5y

9

26 Is CIRL a promising agenda?

Chris_Leong

6mo

12

45 Remaking EfficientZero (as best I can)

Hoagy

5mo

9

115 The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables

johnswentworth

2y

43

8 Reward IS the Optimization Target

Carn

2mo

3