67 posts — Related tags: Value Learning, Inverse Reinforcement Learning, The Pointers Problem, Meta-Philosophy, Metaethics, Kolmogorov Complexity, Philosophy, Book Reviews, Perceptual Control Theory
59 posts — Related tags: Community, Agent Foundations, Machine Intelligence Research Institute (MIRI), Cognitive Reduction, Center for Human-Compatible AI (CHAI), Regulation and AI Risk, Grants & Fundraising Opportunities, Future of Humanity Institute (FHI), Population Ethics, Utilitarianism, Moral Uncertainty, The SF Bay Area
Karma · Title · Author · Posted · Comments

53 · Don't design agents which exploit adversarial inputs · TurnTrout · 1mo · 61 comments
39 · People care about each other even though they have imperfect motivational pointers? · TurnTrout · 1mo · 25 comments
20 · Stable Pointers to Value: An Agent Embedded in Its Own Utility Function · abramdemski · 5y · 9 comments
15 · What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment · xuan · 3mo · 15 comments
40 · Beyond Kolmogorov and Shannon · Alexander Gietelink Oldenziel · 1mo · 14 comments
99 · The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables · johnswentworth · 2y · 43 comments
54 · Humans can be assigned any values whatsoever… · Stuart_Armstrong · 4y · 26 comments
10 · AIs should learn human preferences, not biases · Stuart_Armstrong · 8mo · 1 comment
18 · RFC: Philosophical Conservatism in AI Alignment Research · Gordon Seidoh Worley · 4y · 13 comments
37 · Human-AI Interaction · Rohin Shah · 3y · 10 comments
0 · Inverse reinforcement learning on self, pre-ontology-change · Stuart_Armstrong · 7y · 0 comments
17 · Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences · orthonormal · 6y · 0 comments
1 · (C)IRL is not solely a learning process · Stuart_Armstrong · 6y · 0 comments
4 · CIRL Wireheading · tom4everitt · 5y · 0 comments
51 · Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility · Akash · 28d · 20 comments
45 · Clarifying the Agent-Like Structure Problem · johnswentworth · 2mo · 14 comments
23 · Bridging Expected Utility Maximization and Optimization · Whispermute · 4mo · 5 comments
45 · Prize and fast track to alignment research at ALTER · Vanessa Kosoy · 3mo · 4 comments
12 · The Slippery Slope from DALLE-2 to Deepfake Anarchy · scasper · 1mo · 9 comments
39 · Announcing the Introduction to ML Safety course · Dan H · 4mo · 6 comments
70 · Encultured AI Pre-planning, Part 1: Enabling New Benchmarks · Andrew_Critch · 4mo · 2 comments
57 · AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022 · Sam Bowman · 3mo · 2 comments
197 · Why Agent Foundations? An Overly Abstract Explanation · johnswentworth · 9mo · 54 comments
15 · AI Safety Discussion Days · Linda Linsefors · 2y · 1 comment
17 · Looking for an alignment tutor · JanBrauner · 3d · 2 comments
62 · Jobs: Help scale up LM alignment research at NYU · Sam Bowman · 7mo · 1 comment
27 · My current take on the Paul-MIRI disagreement on alignability of messy AI · jessicata · 5y · 0 comments
30 · On motivations for MIRI's highly reliable agent design research · jessicata · 5y · 1 comment