Branch 1 (20 posts)
Tags: Gradient Hacking, Evolution, Heuristics & Biases, Modularity, Information Theory, Human Values, Gradient Descent, Biology, Experiments, Cultural knowledge, Aesthetics
Branch 2 (27 posts)
Tags: Complexity of Value, Shard Theory, Value Drift, Ontology, Whole Brain Emulation, Motivations, LessWrong Review, Psychology, Futurism, General Alignment Properties, Internal Alignment (Human), Superstimuli
Branch 1 posts (Karma · Title · Author · Age · Comments):

196 · Utility Maximization = Description Length Minimization · johnswentworth · 1y · 40
163 · Evolution of Modularity · johnswentworth · 3y · 12
93 · A broad basin of attraction around human values? · Wei_Dai · 8mo · 16
92 · Human values & biases are inaccessible to the genome · TurnTrout · 5mo · 51
81 · The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints · johnswentworth · 1y · 21
71 · Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection · Oliver Sourbut · 7mo · 12
69 · Ten experiments in modularity, which we'd like you to run! · TheMcDouglas · 6mo · 2
58 · Theories of Modularity in the Biological Literature · TheMcDouglas · 8mo · 13
49 · Gradient Hacker Design Principles From Biology · johnswentworth · 3mo · 13
49 · Hypothesis: gradient descent prefers general circuits · Quintin Pope · 10mo · 26
38 · Gradient descent is not just more efficient genetic algorithms · leogao · 1y · 14
34 · The Blackwell order as a formalization of knowledge · Alex Flint · 1y · 10
21 · Emergent modularity and safety · Richard_Ngo · 1y · 15
16 · Musings on Cumulative Cultural Evolution and AI · calebo · 3y · 5
Branch 2 posts (Karma · Title · Author · Age · Comments):

249 · The shard theory of human values · Quintin Pope · 3mo · 57
191 · Humans provide an untapped wealth of evidence about alignment · TurnTrout · 5mo · 92
120 · Shard Theory: An Overview · David Udell · 4mo · 34
73 · Contra shard theory, in the context of the diamond maximizer problem · So8res · 2mo · 16
71 · Two Neglected Problems in Human-AI Safety · Wei_Dai · 4y · 24
63 · Why we need a *theory* of human values · Stuart_Armstrong · 4y · 15
62 · Shard Theory in Nine Theses: a Distillation and Critical Appraisal · LawrenceC · 1d · 9
60 · But exactly how complex and fragile? · KatjaGrace · 3y · 32
58 · Three AI Safety Related Ideas · Wei_Dai · 4y · 38
52 · Alignment allows "nonrobust" decision-influences and doesn't require robust grading · TurnTrout · 21d · 27
50 · The two-layer model of human values, and problems with synthesizing preferences · Kaj_Sotala · 2y · 16
47 · Acknowledging Human Preference Types to Support Value Learning · Nandi Sabrina Erin · 4y · 4
45 · Test Cases for Impact Regularisation Methods · DanielFilan · 3y · 5
41 · General alignment properties · TurnTrout · 4mo · 2