Tag cluster 1 (180 posts): Research Agendas · Embedded Agency · Suffering · Agency · Animal Welfare · Risks of Astronomical Suffering (S-risks) · Robust Agents · Cause Prioritization · Center on Long-Term Risk (CLR) · 80,000 Hours · Crucial Considerations · Veg*nism
Tag cluster 2 (164 posts): Value Learning · Reinforcement Learning · AI Capabilities · Inverse Reinforcement Learning · Wireheading · Definitions · Reward Functions · The Pointers Problem · Stag Hunt · Road To AI Safety Excellence · Goals · EfficientZero
Posts (karma · title · author · age · comments):

34 · My AGI safety research—2022 review, ’23 plans · Steven Byrnes · 6d · 6 comments
124 · New book on s-risks · Tobias_Baumann · 1mo · 1 comment
300 · On how various plans miss the hard bits of the alignment challenge · So8res · 5mo · 81 comments
190 · Some conceptual alignment research projects · Richard_Ngo · 3mo · 14 comments
15 · Riffing on the agent type · Quinn · 12d · 0 comments
247 · Humans are very reliable agents · alyssavance · 6mo · 35 comments
26 · LLMs may capture key components of human agency · catubc · 1mo · 0 comments
11 · Sets of objectives for a multi-objective RL agent to optimize · Ben Smith · 27d · 0 comments
21 · Distilled Representations Research Agenda · Hoagy · 2mo · 2 comments
69 · Eliciting Latent Knowledge (ELK) - Distillation/Summary · Marius Hobbhahn · 6mo · 2 comments
8 · The two conceptions of Active Inference: an intelligence architecture and a theory of agency · Roman Leventov · 1mo · 0 comments
23 · Should you refrain from having children because of the risk posed by artificial intelligence? · Mientras · 3mo · 28 comments
9 · Interpreting systems as solving POMDPs: a step towards a formal understanding of agency [paper link] · the gears to ascenscion · 1mo · 2 comments
45 · Gradations of Agency · Daniel Kokotajlo · 7mo · 6 comments
7 · Note on algorithms with multiple trained components · Steven Byrnes · 6h · 1 comment
91 · When AI solves a game, focus on the game's mechanics, not its theme. · Cleo Nardo · 27d · 7 comments
109 · Will we run out of ML data? Evidence from projecting dataset size trends · Pablo Villalobos · 1mo · 12 comments
286 · Reward is not the optimization target · TurnTrout · 4mo · 97 comments
45 · A Short Dialogue on the Meaning of Reward Functions · Leon Lang · 1mo · 0 comments
271 · Is AI Progress Impossible To Predict? · alyssavance · 7mo · 38 comments
26 · generalized wireheading · carado · 1mo · 7 comments
352 · EfficientZero: How It Works · 1a3orn · 1y · 42 comments
75 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · TurnTrout · 4mo · 41 comments
7 · Mastering Stratego (Deepmind) · svemirski · 18d · 0 comments
11 · AGIs may value intrinsic rewards more than extrinsic ones · catubc · 1mo · 6 comments
35 · What's the Most Impressive Thing That GPT-4 Could Plausibly Do? · bayesed · 3mo · 24 comments
23 · Character alignment · p.b. · 3mo · 0 comments
129 · EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised · gwern · 1y · 52 comments