Tag cluster (180 posts): Research Agendas; Embedded Agency; Suffering; Agency; Animal Welfare; Risks of Astronomical Suffering (S-risks); Robust Agents; Cause Prioritization; Center on Long-Term Risk (CLR); 80,000 Hours; Crucial Considerations; Veg*nism
Tag cluster (164 posts): Value Learning; Reinforcement Learning; AI Capabilities; Inverse Reinforcement Learning; Wireheading; Definitions; Reward Functions; The Pointers Problem; Stag Hunt; Road To AI Safety Excellence; Goals; EfficientZero
| Karma | Title | Author | Posted | Comments |
| ---: | --- | --- | --- | ---: |
| 34 | My AGI safety research—2022 review, ’23 plans | Steven Byrnes | 6d | 6 |
| 17 | Riffing on the agent type | Quinn | 12d | 0 |
| 216 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81 |
| 146 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14 |
| 249 | Humans are very reliable agents | alyssavance | 6mo | 35 |
| 16 | LLMs may capture key components of human agency | catubc | 1mo | 0 |
| 11 | Sets of objectives for a multi-objective RL agent to optimize | Ben Smith | 27d | 0 |
| 15 | Interpreting systems as solving POMDPs: a step towards a formal understanding of agency [paper link] | the gears to ascenscion | 1mo | 2 |
| 19 | Cooperators are more powerful than agents | Ivan Vendrov | 2mo | 7 |
| 6 | The two conceptions of Active Inference: an intelligence architecture and a theory of agency | Roman Leventov | 1mo | 0 |
| 38 | Understanding Selection Theorems | adamk | 6mo | 3 |
| 9 | Distilled Representations Research Agenda | Hoagy | 2mo | 2 |
| 127 | Demand offsetting | paulfchristiano | 1y | 38 |
| 35 | Gradations of Agency | Daniel Kokotajlo | 7mo | 6 |
| 13 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1 |
| 71 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7 |
| 218 | Reward is not the optimization target | TurnTrout | 4mo | 97 |
| 35 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0 |
| 39 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12 |
| 281 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38 |
| 10 | Can GPT-3 Write Contra Dances? | jefftk | 16d | 0 |
| 16 | generalized wireheading | carado | 1mo | 7 |
| 77 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41 |
| 194 | EfficientZero: How It Works | 1a3orn | 1y | 42 |
| 5 | Mastering Stratego (Deepmind) | svemirski | 18d | 0 |
| 139 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52 |
| 21 | Character alignment | p.b. | 3mo | 0 |
| 5 | AGIs may value intrinsic rewards more than extrinsic ones | catubc | 1mo | 6 |