Tree of Tags

17 posts: Complexity of Value, Value Drift, Whole Brain Emulation, Motivations, LessWrong Review, Psychology, Futurism, Superstimuli

5 posts: Ontology, General Alignment Properties

130 Shard Theory: An Overview (David Udell, 4mo, 34 comments)
85 Two Neglected Problems in Human-AI Safety (Wei_Dai, 4y, 24 comments)
73 But exactly how complex and fragile? (KatjaGrace, 3y, 32 comments)
69 The two-layer model of human values, and problems with synthesizing preferences (Kaj_Sotala, 2y, 16 comments)
68 Three AI Safety Related Ideas (Wei_Dai, 4y, 38 comments)
65 Why we need a *theory* of human values (Stuart_Armstrong, 4y, 15 comments)
55 Review of 'But exactly how complex and fragile?' (TurnTrout, 1y, 0 comments)
55 Alignment allows "nonrobust" decision-influences and doesn't require robust grading (TurnTrout, 21d, 27 comments)
40 Understanding and avoiding value drift (TurnTrout, 3mo, 9 comments)
36 Broad Picture of Human Values (Thane Ruthenis, 4mo, 5 comments)
35 Can there be an indescribable hellworld? (Stuart_Armstrong, 3y, 19 comments)
34 Acknowledging Human Preference Types to Support Value Learning (Nandi Sabrina Erin, 4y, 4 comments)
25 Would I think for ten thousand years? (Stuart_Armstrong, 3y, 13 comments)
25 Reversible changes: consider a bucket of water (Stuart_Armstrong, 3y, 18 comments)
175 Humans provide an untapped wealth of evidence about alignment (TurnTrout, 5mo, 92 comments)
58 Test Cases for Impact Regularisation Methods (DanielFilan, 3y, 5 comments)
46 General alignment properties (TurnTrout, 4mo, 2 comments)
33 How are you dealing with ontology identification? (Erik Jenner, 2mo, 10 comments)
12 A sketch of a value-learning sovereign (jessicata, 7y, 0 comments)