Tree of Tags

125 posts: World Modeling, Impact Regularization, Anthropics, Exercises / Problem-Sets, Fixed Point Theorems, AIXI, Updateless Decision Theory, Sleeping Beauty Paradox, Cognitive Science, Extraterrestrial Life, Economics, Grabby Aliens

47 posts: Human Values, Shard Theory, Complexity of Value, Gradient Hacking, Heuristics & Biases, Evolution, Value Drift, Information Theory, Gradient Descent, Modularity, Ontology, Biology

48 Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide

Andrew_Critch

26d

34

28 Traps of Formalization in Deconfusion

adamShimi

1y

7

75 «Boundaries», Part 3a: Defining boundaries as directed Markov blankets

Andrew_Critch

1mo

13

239 Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra

5mo

89

67 Humans do acausal coordination all the time

Adam Jermyn

1mo

36

84 Less Threat-Dependent Bargaining Solutions?? (3/2)

Diffractor

4mo

7

159 Testing The Natural Abstraction Hypothesis: Project Intro

johnswentworth

1y

34

573 Where I agree and disagree with Eliezer

paulfchristiano

6mo

205

65 Attainable Utility Preservation: Empirical Results

TurnTrout

2y

8

17 Deliberation Everywhere: Simple Examples

Oliver Sourbut

5mo

0

20 Using modal fixed points to formalize logical causality

cousin_it

5y

0

11 An implementation of modal UDT

Benya_Fallenstein

7y

0

0 Corrigibility for AIXI via double indifference

Stuart_Armstrong

6y

0

13 Updatelessness and Son of X

Scott Garrabrant

6y

0

78 Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC

1d

9

46 Positive values seem more robust and lasting than prohibitions

TurnTrout

3d

9

58 Alignment allows "nonrobust" decision-influences and doesn't require robust grading

TurnTrout

21d

27

26 Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight

Jacy Reese Anthis

1mo

8

155 The shard theory of human values

Quintin Pope

3mo

57

159 Humans provide an untapped wealth of evidence about alignment

TurnTrout

5mo

92

92 Human values & biases are inaccessible to the genome

TurnTrout

5mo

51

47 Understanding and avoiding value drift

TurnTrout

3mo

9

55 Gradient Hacker Design Principles From Biology

johnswentworth

3mo

13

170 Utility Maximization = Description Length Minimization

johnswentworth

1y

40

140 Shard Theory: An Overview

David Udell

4mo

34

95 Contra shard theory, in the context of the diamond maximizer problem

So8res

2mo

16

35 How are you dealing with ontology identification?

Erik Jenner

2mo

10

51 General alignment properties

TurnTrout

4mo

2