Branch 1 (344 posts)
Tags: Research Agendas, Value Learning, Reinforcement Learning, Embedded Agency, Suffering, AI Capabilities, Agency, Animal Welfare, Inverse Reinforcement Learning, Risks of Astronomical Suffering (S-risks), Wireheading, Robust Agents
Branch 2 (14230 posts)
Tags: Decision Theory, Utility Functions, Counterfactuals, Goal-Directedness, Nutrition, Newcomb's Problem, VNM Theorem, Updateless Decision Theory, Timeless Decision Theory, Literature Reviews, Functional Decision Theory, Counterfactual Mugging
Top posts, Branch 1 (score · title · author · posted · comments):
352 · EfficientZero: How It Works · 1a3orn · 1y · 42
300 · On how various plans miss the hard bits of the alignment challenge · So8res · 5mo · 81
286 · Reward is not the optimization target · TurnTrout · 4mo · 97
271 · Is AI Progress Impossible To Predict? · alyssavance · 7mo · 38
267 · Embedded Agents · abramdemski · 4y · 41
247 · Humans are very reliable agents · alyssavance · 6mo · 35
202 · Embedded Agency (full-text version) · Scott Garrabrant · 4y · 15
190 · Some conceptual alignment research projects · Richard_Ngo · 3mo · 14
176 · Are wireheads happy? · Scott Alexander · 12y · 107
158 · Being a Robust Agent · Raemon · 4y · 32
143 · Introduction to Cartesian Frames · Scott Garrabrant · 2y · 29
135 · Demand offsetting · paulfchristiano · 1y · 38
129 · EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised · gwern · 1y · 52
124 · New book on s-risks · Tobias_Baumann · 1mo · 1
Top posts, Branch 2 (score · title · author · posted · comments):
170 · Can you control the past? · Joe Carlsmith · 1y · 93
161 · Coherent decisions imply consistent utilities · Eliezer Yudkowsky · 3y · 81
155 · why assume AGIs will optimize for fixed goals? · nostalgebraist · 6mo · 52
146 · Newcomb's Problem and Regret of Rationality · Eliezer Yudkowsky · 14y · 614
146 · 2020 AI Alignment Literature Review and Charity Comparison · Larks · 1y · 14
146 · Decision theory does not imply that we get to have nice things · So8res · 2mo · 53
141 · Decision Theory · abramdemski · 4y · 46
137 · An Orthodox Case Against Utility Functions · abramdemski · 2y · 53
137 · Impossibility results for unbounded utilities · paulfchristiano · 10mo · 104
120 · Saving Time · Scott Garrabrant · 1y · 19
117 · How I Lost 100 Pounds Using TDT · Zvi · 11y · 244
108 · Decision Theory FAQ · lukeprog · 9y · 484
105 · Utility ≠ Reward · vlad_m · 3y · 25
102 · Coherence arguments do not entail goal-directed behavior · Rohin Shah · 4y · 69