Branch 1 (46 posts)
Tags: Research Agendas, Game Theory, Center on Long-Term Risk (CLR), Risks of Astronomical Suffering (S-risks), Mechanism Design, Suffering, Fairness, Blackmail / Extortion, Group Rationality, Terminology / Jargon (meta), Reading Group, Mind Crime
Branch 2 (65 posts)
Tags: Iterated Amplification, Debate (AI safety technique), Factored Cognition, Humans Consulting HCH, Ought, Adversarial Collaboration, Delegation
Branch 1 posts:
Karma | Title | Author | Posted | Comments
35 | My AGI safety research—2022 review, ’23 plans | Steven Byrnes | 6d | 6
54 | «Boundaries», Part 3b: Alignment problems in terms of boundaries | Andrew_Critch | 6d | 2
231 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
155 | «Boundaries», Part 1: a key missing concept from utility theory | Andrew_Critch | 4mo | 26
98 | Threat-Resistant Bargaining Megapost: Introducing the ROSE Value | Diffractor | 2mo | 11
204 | Unifying Bargaining Notions (1/2) | Diffractor | 4mo | 38
155 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
120 | Unifying Bargaining Notions (2/2) | Diffractor | 4mo | 11
10 | Distilled Representations Research Agenda | Hoagy | 2mo | 2
1 | Announcing: Mechanism Design for AI Safety - Reading Group | Rubi J. Hudson | 4mo | 3
26 | Sections 5 & 6: Contemporary Architectures, Humans in the Loop | JesseClifton | 3y | 4
38 | Resources for AI Alignment Cartography | Gyrodiot | 2y | 8
44 | Technical AGI safety research outside AI | Richard_Ngo | 3y | 3
41 | Formal Open Problem in Decision Theory | Scott Garrabrant | 4y | 11
Branch 2 posts:
Karma | Title | Author | Posted | Comments
47 | Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. | Charlie Steiner | 8d | 14
52 | Notes on OpenAI’s alignment plan | Alex Flint | 12d | 5
85 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
64 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
118 | Paul's research agenda FAQ | zhukeepa | 4y | 73
35 | A Small Negative Result on Debate | Sam Bowman | 8mo | 11
122 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
21 | Ought will host a factored cognition “Lab Meeting” | jungofthewon | 3mo | 1
28 | Meta-execution | paulfchristiano | 4y | 1
46 | HCH is not just Mechanical Turk | William_S | 3y | 6
32 | Can HCH epistemically dominate Ramanujan? | zhukeepa | 3y | 4
28 | AI Safety Debate and Its Applications | VojtaKovarik | 3y | 5
6 | Predicting HCH using expert advice | jessicata | 6y | 0
1 | HCH as a measure of manipulation | orthonormal | 5y | 0