Tree of Tags

125 posts: World Modeling, Impact Regularization, Anthropics, Exercises / Problem-Sets, Fixed Point Theorems, AIXI, Updateless Decision Theory, Sleeping Beauty Paradox, Cognitive Science, Extraterrestrial Life, Economics, Grabby Aliens

47 posts: Human Values, Shard Theory, Complexity of Value, Gradient Hacking, Heuristics & Biases, Evolution, Value Drift, Information Theory, Gradient Descent, Modularity, Ontology, Biology

573 Where I agree and disagree with Eliezer

paulfchristiano

6mo

205

239 Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra

5mo

89

159 My research methodology

paulfchristiano

1y

36

159 Testing The Natural Abstraction Hypothesis: Project Intro

johnswentworth

1y

34

145 Fixing The Good Regulator Theorem

johnswentworth

1y

25

100 Selection Theorems: A Program For Understanding Agents

johnswentworth

1y

23

98 There is essentially one best-validated theory of cognition.

abramdemski

1y

34

97 Frequent arguments about alignment

John Schulman

1y

16

84 Less Threat-Dependent Bargaining Solutions?? (3/2)

Diffractor

4mo

7

83 The Goldbach conjecture is probably correct; so was Fermat's last theorem

Stuart_Armstrong

2y

27

83 Reframing Impact

TurnTrout

3y

15

77 Deducing Impact

TurnTrout

3y

26

77 Abstractions as Redundant Information

johnswentworth

10mo

7

75 «Boundaries», Part 3a: Defining boundaries as directed Markov blankets

Andrew_Critch

1mo

13

170 Utility Maximization = Description Length Minimization

johnswentworth

1y

40

159 Humans provide an untapped wealth of evidence about alignment

TurnTrout

5mo

92

155 Evolution of Modularity

johnswentworth

3y

12

155 The shard theory of human values

Quintin Pope

3mo

57

140 Shard Theory: An Overview

David Udell

4mo

34

117 A broad basin of attraction around human values?

Wei_Dai

8mo

16

99 Two Neglected Problems in Human-AI Safety

Wei_Dai

4y

24

95 Contra shard theory, in the context of the diamond maximizer problem

So8res

2mo

16

92 Human values & biases are inaccessible to the genome

TurnTrout

5mo

51

91 The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints

johnswentworth

1y

21

88 The two-layer model of human values, and problems with synthesizing preferences

Kaj_Sotala

2y

16

86 But exactly how complex and fragile?

KatjaGrace

3y

32

78 Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC

1d

9

78 Three AI Safety Related Ideas

Wei_Dai

4y

38