Tags (67 posts): Value Learning, Inverse Reinforcement Learning, The Pointers Problem, Meta-Philosophy, Metaethics, Kolmogorov Complexity, Philosophy, Book Reviews, Perceptual Control Theory
Tags (59 posts): Community, Agent Foundations, Machine Intelligence Research Institute (MIRI), Cognitive Reduction, Center for Human-Compatible AI (CHAI), Regulation and AI Risk, Grants & Fundraising Opportunities, Future of Humanity Institute (FHI), Population Ethics, Utilitarianism, Moral Uncertainty, The SF Bay Area
Karma | Title | Author | Posted | Comments
----- | ----- | ------ | ------ | --------
60 | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61
32 | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
15 | Stable Pointers to Value: An Agent Embedded in Its Own Utility Function | abramdemski | 5y | 9
30 | What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment | xuan | 3mo | 15
60 | Beyond Kolmogorov and Shannon | Alexander Gietelink Oldenziel | 1mo | 14
104 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
56 | Humans can be assigned any values whatsoever… | Stuart_Armstrong | 4y | 26
10 | AIs should learn human preferences, not biases | Stuart_Armstrong | 8mo | 1
17 | RFC: Philosophical Conservatism in AI Alignment Research | Gordon Seidoh Worley | 4y | 13
34 | Human-AI Interaction | Rohin Shah | 3y | 10
0 | Inverse reinforcement learning on self, pre-ontology-change | Stuart_Armstrong | 7y | 0
14 | Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences | orthonormal | 6y | 0
1 | (C)IRL is not solely a learning process | Stuart_Armstrong | 6y | 0
3 | CIRL Wireheading | tom4everitt | 5y | 0
69 | Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility | Akash | 28d | 20
53 | Clarifying the Agent-Like Structure Problem | johnswentworth | 2mo | 14
23 | Bridging Expected Utility Maximization and Optimization | Whispermute | 4mo | 5
65 | Prize and fast track to alignment research at ALTER | Vanessa Kosoy | 3mo | 4
16 | The Slippery Slope from DALLE-2 to Deepfake Anarchy | scasper | 1mo | 9
69 | Announcing the Introduction to ML Safety course | Dan H | 4mo | 6
62 | Encultured AI Pre-planning, Part 1: Enabling New Benchmarks | Andrew_Critch | 4mo | 2
74 | AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022 | Sam Bowman | 3mo | 2
247 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth | 9mo | 54
13 | AI Safety Discussion Days | Linda Linsefors | 2y | 1
15 | Looking for an alignment tutor | JanBrauner | 3d | 2
60 | Jobs: Help scale up LM alignment research at NYU | Sam Bowman | 7mo | 1
21 | My current take on the Paul-MIRI disagreement on alignability of messy AI | jessicata | 5y | 0
27 | On motivations for MIRI's highly reliable agent design research | jessicata | 5y | 1