Tree of Tags

177 posts: Rationality, Decision Theory, Abstraction, Goal-Directedness, Utility Functions, Finite Factored Sets, Causality, Literature Reviews, Quantilization, Mild Optimization, Open Problems, Filtered Evidence

172 posts: World Modeling, Impact Regularization, Human Values, Shard Theory, Anthropics, Complexity of Value, Exercises / Problem-Sets, Gradient Hacking, Evolution, Fixed Point Theorems, Heuristics & Biases, Modularity

201 What's Up With Confusingly Pervasive Consequentialism?

Raemon

11mo

88

154 Realism about rationality

Richard_Ngo

4y

145

153 2021 AI Alignment Literature Review and Charity Comparison

Larks

12mo

26

146 Saving Time

Scott Garrabrant

1y

19

137 2020 AI Alignment Literature Review and Charity Comparison

Larks

1y

14

134 Can you control the past?

Joe Carlsmith

1y

93

133 Finite Factored Sets

Scott Garrabrant

1y

94

126 An Orthodox Case Against Utility Functions

abramdemski

2y

53

125 Finite Factored Sets in Pictures

Magdalena Wache

9d

29

122 Principles for Alignment/Agency Projects

johnswentworth

5mo

20

106 Thinking About Filtered Evidence Is (Very!) Hard

abramdemski

2y

29

105 Problem relaxation as a tactic

TurnTrout

2y

8

105 Coherence arguments do not entail goal-directed behavior

Rohin Shah

4y

69

105 Utility ≠ Reward

vlad_m

3y

25

573 Where I agree and disagree with Eliezer

paulfchristiano

6mo

205

239 Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra

5mo

89

170 Utility Maximization = Description Length Minimization

johnswentworth

1y

40

159 Humans provide an untapped wealth of evidence about alignment

TurnTrout

5mo

92

159 My research methodology

paulfchristiano

1y

36

159 Testing The Natural Abstraction Hypothesis: Project Intro

johnswentworth

1y

34

155 Evolution of Modularity

johnswentworth

3y

12

155 The shard theory of human values

Quintin Pope

3mo

57

145 Fixing The Good Regulator Theorem

johnswentworth

1y

25

140 Shard Theory: An Overview

David Udell

4mo

34

117 A broad basin of attraction around human values?

Wei_Dai

8mo

16

100 Selection Theorems: A Program For Understanding Agents

johnswentworth

1y

23

99 Two Neglected Problems in Human-AI Safety

Wei_Dai

4y

24

98 There is essentially one best-validated theory of cognition.

abramdemski

1y

34