Branch 1 (180 posts):
Research Agendas
Embedded Agency
Suffering
Agency
Animal Welfare
Risks of Astronomical Suffering (S-risks)
Robust Agents
Cause Prioritization
Center on Long-Term Risk (CLR)
80,000 Hours
Crucial Considerations
Veg*nism
Branch 2 (164 posts):
Value Learning
Reinforcement Learning
AI Capabilities
Inverse Reinforcement Learning
Wireheading
Definitions
Reward Functions
The Pointers Problem
Stag Hunt
Road To AI Safety Excellence
Goals
EfficientZero
Karma | Title | Author | Posted | Comments
34 | My AGI safety research—2022 review, ’23 plans | Steven Byrnes | 6d | 6
258 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
17 | Should you refrain from having children because of the risk posed by artificial intelligence? | Mientras | 3mo | 28
168 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
9 | EA, Veganism and Negative Animal Utilitarianism | Yair Halberstadt | 3mo | 12
14 | Cooperators are more powerful than agents | Ivan Vendrov | 2mo | 7
12 | Interpreting systems as solving POMDPs: a step towards a formal understanding of agency [paper link] | the gears to ascenscion | 1mo | 2
248 | Humans are very reliable agents | alyssavance | 6mo | 35
40 | Gradations of Agency | Daniel Kokotajlo | 7mo | 6
-2 | A Longtermist case against Veganism | Connor Tabarrok | 2mo | 2
15 | Distilled Representations Research Agenda | Hoagy | 2mo | 2
2 | Some thoughts on Animals | nitinkhanna | 5mo | 6
20 | Peter Singer's first published piece on AI | Fai | 5mo | 5
2 | Vegetarianism and depression | Maggy | 2mo | 2
252 | Reward is not the optimization target | TurnTrout | 4mo | 97
10 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1
74 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
81 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
76 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
21 | generalized wireheading | carado | 1mo | 7
23 | What's the Most Impressive Thing That GPT-4 Could Plausibly Do? | bayesed | 3mo | 24
8 | AGIs may value intrinsic rewards more than extrinsic ones | catubc | 1mo | 6
23 | Latent Variables and Model Mis-Specification | jsteinhardt | 4y | 7
15 | Stable Pointers to Value: An Agent Embedded in Its Own Utility Function | abramdemski | 5y | 9
25 | Is CIRL a promising agenda? | Chris_Leong | 6mo | 12
34 | Remaking EfficientZero (as best I can) | Hoagy | 5mo | 9
104 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
-1 | Reward IS the Optimization Target | Carn | 2mo | 3