Tree of Tags

20 posts: Gradient Hacking, Evolution, Heuristics & Biases, Modularity, Information Theory, Human Values, Gradient Descent, Biology, Experiments, Cultural knowledge, Aesthetics, Request Post

27 posts: Complexity of Value, Shard Theory, Value Drift, Ontology, Whole Brain Emulation, Motivations, LessWrong Review, Psychology, Futurism, General Alignment Properties, Internal Alignment (Human), Superstimuli

92 Human values & biases are inaccessible to the genome

TurnTrout

5mo

51

52 Gradient Hacker Design Principles From Biology

johnswentworth

3mo

13

183 Utility Maximization = Description Length Minimization

johnswentworth

1y

40

86 The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints

johnswentworth

1y

21

22 Gradient hacking: definitions and examples

Richard_Ngo

5mo

1

54 Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection

Oliver Sourbut

7mo

12

1 Speculations on information under logical uncertainty

TsviBT

6y

0

59 Ten experiments in modularity, which we'd like you to run!

TheMcDouglas

6mo

2

18 How to Throw Away Information

johnswentworth

3y

5

159 Evolution of Modularity

johnswentworth

3y

12

40 Hypothesis: gradient descent prefers general circuits

Quintin Pope

10mo

26

47 Theories of Modularity in the Biological Literature

TheMcDouglas

8mo

13

19 Musings on Cumulative Cultural Evolution and AI

calebo

3y

5

19 Preference synthesis illustrated: Star Wars

Stuart_Armstrong

2y

8

70 Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC

1d

9

42 Positive values seem more robust and lasting than prohibitions

TurnTrout

3d

9

55 Alignment allows "nonrobust" decision-influences and doesn't require robust grading

TurnTrout

21d

27

29 Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight

Jacy Reese Anthis

1mo

8

202 The shard theory of human values

Quintin Pope

3mo

57

175 Humans provide an untapped wealth of evidence about alignment

TurnTrout

5mo

92

40 Understanding and avoiding value drift

TurnTrout

3mo

9

130 Shard Theory: An Overview

David Udell

4mo

34

84 Contra shard theory, in the context of the diamond maximizer problem

So8res

2mo

16

33 How are you dealing with ontology identification?

Erik Jenner

2mo

10

46 General alignment properties

TurnTrout

4mo

2

69 The two-layer model of human values, and problems with synthesizing preferences

Kaj_Sotala

2y

16

2 Chatbots or set answers, not WBEs

Stuart_Armstrong

7y

0

12 A sketch of a value-learning sovereign

jessicata

7y

0