AI (808 posts): Embedded Agency, Eliciting Latent Knowledge (ELK), Reinforcement Learning, Infra-Bayesianism, Counterfactuals, Logic & Mathematics, AI Capabilities, Interviews, Audio, Subagents, Wireheading

Value Learning (126 posts): Inverse Reinforcement Learning, Machine Intelligence Research Institute (MIRI), Agent Foundations, Meta-Philosophy, Metaethics, Community, Philosophy, The Pointers Problem, Moral Uncertainty, Cognitive Reduction, Center for Human-Compatible AI (CHAI)
Karma | Title | Author | Posted | Comments
259 | Humans are very reliable agents | alyssavance | 6mo | 35
242 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
241 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
231 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
219 | ARC's first technical report: Eliciting Latent Knowledge | paulfchristiano | 1y | 88
217 | larger language models may disappoint you [or, an eternally unfinished draft] | nostalgebraist | 1y | 29
213 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
213 | Hiring engineers and researchers to help align GPT-3 | paulfchristiano | 2y | 14
212 | EfficientZero: How It Works | 1a3orn | 1y | 42
211 | Safetywashing | Adam Scholl | 5mo | 17
205 | The Plan | johnswentworth | 1y | 77
205 | Ngo and Yudkowsky on alignment difficulty | Eliezer Yudkowsky | 1y | 143
202 | Attempted Gears Analysis of AGI Intervention Discussion With Eliezer | Zvi | 1y | 48
197 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth | 9mo | 54
177 | 2018 AI Alignment Literature Review and Charity Comparison | Larks | 4y | 26
146 | The Rocket Alignment Problem | Eliezer Yudkowsky | 4y | 42
139 | Full-time AGI Safety! | Steven Byrnes | 1y | 3
133 | 2019 AI Alignment Literature Review and Charity Comparison | Larks | 3y | 18
120 | What I’ll be doing at MIRI | evhub | 3y | 6
116 | Call for research on evaluating alignment (funding + advice available) | Beth Barnes | 1y | 11
105 | Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22] | habryka | 1y | 4
99 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
82 | Comparing Utilities | abramdemski | 2y | 31
76 | Some Thoughts on Metaphilosophy | Wei_Dai | 3y | 27
71 | AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah | Palus Astra | 2y | 27
71 | Clarifying "AI Alignment" | paulfchristiano | 4y | 82
70 | Encultured AI Pre-planning, Part 1: Enabling New Benchmarks | Andrew_Critch | 4mo | 2