180 posts: Research Agendas, Embedded Agency, Suffering, Agency, Animal Welfare, Risks of Astronomical Suffering (S-risks), Robust Agents, Cause Prioritization, Center on Long-Term Risk (CLR), 80,000 Hours, Crucial Considerations, Veg*nism
164 posts: Value Learning, Reinforcement Learning, AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, The Pointers Problem, Stag Hunt, Road To AI Safety Excellence, Goals, EfficientZero
Karma | Title | Author | Posted | Comments
34 | My AGI safety research—2022 review, '23 plans | Steven Byrnes | 6d | 6
16 | Riffing on the agent type | Quinn | 12d | 0
258 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
168 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
60 | New book on s-risks | Tobias_Baumann | 1mo | 1
248 | Humans are very reliable agents | alyssavance | 6mo | 35
21 | LLMs may capture key components of human agency | catubc | 1mo | 0
11 | Sets of objectives for a multi-objective RL agent to optimize | Ben Smith | 27d | 0
12 | Interpreting systems as solving POMDPs: a step towards a formal understanding of agency [paper link] | the gears to ascenscion | 1mo | 2
15 | Distilled Representations Research Agenda | Hoagy | 2mo | 2
14 | Cooperators are more powerful than agents | Ivan Vendrov | 2mo | 7
7 | The two conceptions of Active Inference: an intelligence architecture and a theory of agency | Roman Leventov | 1mo | 0
49 | Eliciting Latent Knowledge (ELK) - Distillation/Summary | Marius Hobbhahn | 6mo | 2
40 | Gradations of Agency | Daniel Kokotajlo | 7mo | 6
10 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1
81 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
74 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
252 | Reward is not the optimization target | TurnTrout | 4mo | 97
40 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
276 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
21 | generalized wireheading | carado | 1mo | 7
273 | EfficientZero: How It Works | 1a3orn | 1y | 42
76 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
6 | Can GPT-3 Write Contra Dances? | jefftk | 16d | 0
6 | Mastering Stratego (Deepmind) | svemirski | 18d | 0
8 | AGIs may value intrinsic rewards more than extrinsic ones | catubc | 1mo | 6
134 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
22 | Character alignment | p.b. | 3mo | 0