Tree of Tags

125 posts: World Modeling, Impact Regularization, Anthropics, Exercises / Problem-Sets, Fixed Point Theorems, AIXI, Updateless Decision Theory, Sleeping Beauty Paradox, Cognitive Science, Extraterrestrial Life, Economics, Grabby Aliens

47 posts: Human Values, Shard Theory, Complexity of Value, Gradient Hacking, Heuristics & Biases, Evolution, Value Drift, Information Theory, Gradient Descent, Modularity, Ontology, Biology

573 Where I agree and disagree with Eliezer

paulfchristiano

6mo

205

48 Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide

Andrew_Critch

26d

34

75 «Boundaries», Part 3a: Defining boundaries as directed Markov blankets

Andrew_Critch

1mo

13

67 Humans do acausal coordination all the time

Adam Jermyn

1mo

36

239 Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra

5mo

89

84 Less Threat-Dependent Bargaining Solutions?? (3/2)

Diffractor

4mo

7

98 There is essentially one best-validated theory of cognition.

abramdemski

1y

34

77 Abstractions as Redundant Information

johnswentworth

10mo

7

159 Testing The Natural Abstraction Hypothesis: Project Intro

johnswentworth

1y

34

159 My research methodology

paulfchristiano

1y

36

100 Selection Theorems: A Program For Understanding Agents

johnswentworth

1y

23

145 Fixing The Good Regulator Theorem

johnswentworth

1y

25

36 Elementary Infra-Bayesianism

Jan

7mo

2

71 Chu are you?

Adele Lopez

1y

7

78 Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC

1d

9

46 Positive values seem more robust and lasting than prohibitions

TurnTrout

3d

9

58 Alignment allows "nonrobust" decision-influences and doesn't require robust grading

TurnTrout

21d

27

95 Contra shard theory, in the context of the diamond maximizer problem

So8res

2mo

16

155 The shard theory of human values

Quintin Pope

3mo

57

140 Shard Theory: An Overview

David Udell

4mo

34

159 Humans provide an untapped wealth of evidence about alignment

TurnTrout

5mo

92

26 Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight

Jacy Reese Anthis

1mo

8

92 Human values & biases are inaccessible to the genome

TurnTrout

5mo

51

55 Gradient Hacker Design Principles From Biology

johnswentworth

3mo

13

35 How are you dealing with ontology identification?

Erik Jenner

2mo

10

47 Understanding and avoiding value drift

TurnTrout

3mo

9

117 A broad basin of attraction around human values?

Wei_Dai

8mo

16

51 General alignment properties

TurnTrout

4mo

2