Tree of Tags

5 posts: Shard Theory, Internal Alignment (Human)

22 posts: Complexity of Value, Value Drift, Ontology, Whole Brain Emulation, Motivations, LessWrong Review, Psychology, Futurism, General Alignment Properties, Superstimuli

70 Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC

1d

9

42 Positive values seem more robust and lasting than prohibitions

TurnTrout

3d

9

29 Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight

Jacy Reese Anthis

1mo

8

202 The shard theory of human values

Quintin Pope

3mo

57

84 Contra shard theory, in the context of the diamond maximizer problem

So8res

2mo

16

55 Alignment allows "nonrobust" decision-influences and doesn't require robust grading

TurnTrout

21d

27

175 Humans provide an untapped wealth of evidence about alignment

TurnTrout

5mo

92

40 Understanding and avoiding value drift

TurnTrout

3mo

9

130 Shard Theory: An Overview

David Udell

4mo

34

33 How are you dealing with ontology identification?

Erik Jenner

2mo

10

46 General alignment properties

TurnTrout

4mo

2

69 The two-layer model of human values, and problems with synthesizing preferences

Kaj_Sotala

2y

16

2 Chatbots or set answers, not WBEs

Stuart_Armstrong

7y

0

12 A sketch of a value-learning sovereign

jessicata

7y

0

25 Would I think for ten thousand years?

Stuart_Armstrong

3y

13

85 Two Neglected Problems in Human-AI Safety

Wei_Dai

4y

24

12 Towards deconfusing values

Gordon Seidoh Worley

2y

4

36 Broad Picture of Human Values

Thane Ruthenis

4mo

5

34 Acknowledging Human Preference Types to Support Value Learning

Nandi Sabrina Erin

4y

4