Rationality (177 posts)
Tags: Decision Theory, Abstraction, Goal-Directedness, Utility Functions, Finite Factored Sets, Causality, Literature Reviews, Quantilization, Mild Optimization, Open Problems, Filtered Evidence
World Modeling (172 posts)
Tags: Impact Regularization, Human Values, Shard Theory, Anthropics, Complexity of Value, Exercises / Problem-Sets, Gradient Hacking, Evolution, Fixed Point Theorems, Heuristics & Biases, Modularity
| Karma | Title | Author | Posted | Comments |
|------:|-------|--------|--------|---------:|
| 180 | Realism about rationality | Richard_Ngo | 4y | 145 |
| 169 | What's Up With Confusingly Pervasive Consequentialism? | Raemon | 11mo | 88 |
| 164 | 2021 AI Alignment Literature Review and Charity Comparison | Larks | 12mo | 26 |
| 148 | Finite Factored Sets in Pictures | Magdalena Wache | 9d | 29 |
| 147 | Can you control the past? | Joe Carlsmith | 1y | 93 |
| 137 | Finite Factored Sets | Scott Garrabrant | 1y | 94 |
| 137 | 2020 AI Alignment Literature Review and Charity Comparison | Larks | 1y | 14 |
| 130 | Saving Time | Scott Garrabrant | 1y | 19 |
| 128 | An Orthodox Case Against Utility Functions | abramdemski | 2y | 53 |
| 119 | why assume AGIs will optimize for fixed goals? | nostalgebraist | 6mo | 52 |
| 115 | Principles for Alignment/Agency Projects | johnswentworth | 5mo | 20 |
| 114 | Decision Theory | abramdemski | 4y | 46 |
| 113 | Problem relaxation as a tactic | TurnTrout | 2y | 8 |
| 102 | Utility ≠ Reward | vlad_m | 3y | 25 |
| Karma | Title | Author | Posted | Comments |
|------:|-------|--------|--------|---------:|
| 777 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205 |
| 310 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89 |
| 202 | The shard theory of human values | Quintin Pope | 3mo | 57 |
| 183 | Utility Maximization = Description Length Minimization | johnswentworth | 1y | 40 |
| 175 | Humans provide an untapped wealth of evidence about alignment | TurnTrout | 5mo | 92 |
| 159 | Evolution of Modularity | johnswentworth | 3y | 12 |
| 148 | My research methodology | paulfchristiano | 1y | 36 |
| 145 | Testing The Natural Abstraction Hypothesis: Project Intro | johnswentworth | 1y | 34 |
| 130 | Shard Theory: An Overview | David Udell | 4mo | 34 |
| 123 | Fixing The Good Regulator Theorem | johnswentworth | 1y | 25 |
| 105 | A broad basin of attraction around human values? | Wei_Dai | 8mo | 16 |
| 103 | Selection Theorems: A Program For Understanding Agents | johnswentworth | 1y | 23 |
| 100 | Towards a New Impact Measure | TurnTrout | 4y | 159 |
| 95 | Frequent arguments about alignment | John Schulman | 1y | 16 |