AI (808 posts): Embedded Agency, Eliciting Latent Knowledge (ELK), Reinforcement Learning, Infra-Bayesianism, Counterfactuals, Logic & Mathematics, AI Capabilities, Interviews, Audio, Subagents, Wireheading
Value Learning (126 posts): Inverse Reinforcement Learning, Machine Intelligence Research Institute (MIRI), Agent Foundations, Meta-Philosophy, Metaethics, Community, Philosophy, The Pointers Problem, Moral Uncertainty, Cognitive Reduction, Center for Human-Compatible AI (CHAI)
Karma | Title | Author | Posted | Comments
25 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
45 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
35 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
99 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39
136 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77
82 | A shot at the diamond-alignment problem | TurnTrout | 2mo | 53
113 | Mechanistic anomaly detection and ELK | paulfchristiano | 25d | 17
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
106 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9
213 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
39 | In defense of probably wrong mechanistic models | evhub | 14d | 10
64 | Verification Is Not Easier Than Generation In General | johnswentworth | 14d | 23
106 | Finding gliders in the game of life | paulfchristiano | 19d | 7
53 | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61
51 | Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility | Akash | 28d | 20
39 | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
20 | Stable Pointers to Value: An Agent Embedded in Its Own Utility Function | abramdemski | 5y | 9
45 | Clarifying the Agent-Like Structure Problem | johnswentworth | 2mo | 14
15 | What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment | xuan | 3mo | 15
40 | Beyond Kolmogorov and Shannon | Alexander Gietelink Oldenziel | 1mo | 14
23 | Bridging Expected Utility Maximization and Optimization | Whispermute | 4mo | 5
45 | Prize and fast track to alignment research at ALTER | Vanessa Kosoy | 3mo | 4
12 | The Slippery Slope from DALLE-2 to Deepfake Anarchy | scasper | 1mo | 9
99 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
39 | Announcing the Introduction to ML Safety course | Dan H | 4mo | 6
70 | Encultured AI Pre-planning, Part 1: Enabling New Benchmarks | Andrew_Critch | 4mo | 2
54 | Humans can be assigned any values whatsoever… | Stuart_Armstrong | 4y | 26