67 posts
Value Learning
Inverse Reinforcement Learning
The Pointers Problem
Meta-Philosophy
Metaethics
Kolmogorov Complexity
Philosophy
Book Reviews
Perceptual Control Theory
59 posts
Community
Agent Foundations
Machine Intelligence Research Institute (MIRI)
Cognitive Reduction
Center for Human-Compatible AI (CHAI)
Regulation and AI Risk
Grants & Fundraising Opportunities
Future of Humanity Institute (FHI)
Population Ethics
Utilitarianism
Moral Uncertainty
The SF Bay Area
67
Don't design agents which exploit adversarial inputs
TurnTrout
1mo
61
80
Beyond Kolmogorov and Shannon
Alexander Gietelink Oldenziel
1mo
14
25
People care about each other even though they have imperfect motivational pointers?
TurnTrout
1mo
25
45
What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment
xuan
3mo
15
71
[Book Review] "The Alignment Problem" by Brian Christian
lsusr
1y
16
34
Different perspectives on concept extrapolation
Stuart_Armstrong
8mo
7
109
The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables
johnswentworth
2y
43
68
Parsing Chris Mingard on Neural Networks
Alex Flint
1y
27
22
How an alien theory of mind might be unlearnable
Stuart_Armstrong
11mo
35
67
Thoughts on "Human-Compatible"
TurnTrout
3y
35
38
Normativity
abramdemski
2y
11
10
AIs should learn human preferences, not biases
Stuart_Armstrong
8mo
1
32
AI Alignment, Philosophical Pluralism, and the Relevance of Non-Western Philosophy
xuan
1y
21
74
Preface to the sequence on value learning
Rohin Shah
4y
6
13
Event [Berkeley]: Alignment Collaborator Speed-Meeting
AlexMennen
1d
2
13
Looking for an alignment tutor
JanBrauner
3d
2
87
Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility
Akash
28d
20
297
Why Agent Foundations? An Overly Abstract Explanation
johnswentworth
9mo
54
85
Prize and fast track to alignment research at ALTER
Vanessa Kosoy
3mo
4
35
A newcomer’s guide to the technical AI safety field
zeshen
1mo
1
91
AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022
Sam Bowman
3mo
2
61
Clarifying the Agent-Like Structure Problem
johnswentworth
2mo
14
99
Announcing the Introduction to ML Safety course
Dan H
4mo
6
20
The Slippery Slope from DALLE-2 to Deepfake Anarchy
scasper
1mo
9
63
Seeking Interns/RAs for Mechanistic Interpretability Projects
Neel Nanda
4mo
0
54
Encultured AI Pre-planning, Part 1: Enabling New Benchmarks
Andrew_Critch
4mo
2
94
Introducing the ML Safety Scholars Program
Dan H
7mo
2
24
CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]
berglund
2mo
1