Rationality (177 posts): Decision Theory, Abstraction, Goal-Directedness, Utility Functions, Finite Factored Sets, Causality, Literature Reviews, Quantilization, Mild Optimization, Open Problems, Filtered Evidence
World Modeling (172 posts): Impact Regularization, Human Values, Shard Theory, Anthropics, Complexity of Value, Exercises / Problem-Sets, Gradient Hacking, Evolution, Fixed Point Theorems, Heuristics & Biases, Modularity
Karma | Title | Author | Age | Comments
171 | Finite Factored Sets in Pictures | Magdalena Wache | 9d | 29
35 | Take 7: You should talk about "the human's utility function" less. | Charlie Steiner | 12d | 22
32 | Counterfactability | Scott Garrabrant | 1mo | 4
46 | Notes on "Can you control the past" | So8res | 2mo | 40
62 | Builder/Breaker for Deconfusion | abramdemski | 2mo | 9
146 | why assume AGIs will optimize for fixed goals? | nostalgebraist | 6mo | 52
108 | Principles for Alignment/Agency Projects | johnswentworth | 5mo | 20
69 | All the posts I will never write | Alexander Gietelink Oldenziel | 4mo | 8
60 | Finding Goals in the World Model | Jeremy Gillen | 4mo | 8
91 | wrapper-minds are the enemy | nostalgebraist | 6mo | 36
175 | 2021 AI Alignment Literature Review and Charity Comparison | Larks | 12mo | 26
88 | [Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA | Steven Byrnes | 7mo | 11
137 | What's Up With Confusingly Pervasive Consequentialism? | Raemon | 11mo | 88
71 | Open Problems in AI X-Risk [PAIS #5] | Dan H | 6mo | 3
62 | Shard Theory in Nine Theses: a Distillation and Critical Appraisal | LawrenceC | 1d | 9
38 | Positive values seem more robust and lasting than prohibitions | TurnTrout | 3d | 9
981 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
52 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | TurnTrout | 21d | 27
381 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
249 | The shard theory of human values | Quintin Pope | 3mo | 57
38 | Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide | Andrew_Critch | 26d | 34
12 | Working towards AI alignment is better | Johannes C. Mayer | 11d | 2
73 | Contra shard theory, in the context of the diamond maximizer problem | So8res | 2mo | 16
191 | Humans provide an untapped wealth of evidence about alignment | TurnTrout | 5mo | 92
32 | Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight | Jacy Reese Anthis | 1mo | 8
120 | Shard Theory: An Overview | David Udell | 4mo | 34
35 | Humans do acausal coordination all the time | Adam Jermyn | 1mo | 36
37 | «Boundaries», Part 3a: Defining boundaries as directed Markov blankets | Andrew_Critch | 1mo | 13