734 posts: AI, Eliciting Latent Knowledge (ELK), Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Interviews, Audio, AXRP, Redwood Research, Transcripts, Formal Proof, Domain Theory
74 posts: Embedded Agency, Reinforcement Learning, Subagents, Reward Functions, EfficientZero, Robust Agents, Wireheading, AI Capabilities, Spurious Counterfactuals, Category Theory, Memetics, Tradeoffs
503 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
409 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
265 | The Plan | johnswentworth | 1y | 77
265 | Ngo and Yudkowsky on alignment difficulty | Eliezer Yudkowsky | 1y | 143
265 | An overview of 11 proposals for building safe advanced AI | evhub | 2y | 36
263 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
257 | larger language models may disappoint you [or, an eternally unfinished draft] | nostalgebraist | 1y | 29
253 | Embedded Agents | abramdemski | 4y | 41
251 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
248 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
247 | Optimality is the tiger, and agents are its teeth | Veedrac | 8mo | 31
213 | Safetywashing | Adam Scholl | 5mo | 17
206 | Attempted Gears Analysis of AGI Intervention Discussion With Eliezer | Zvi | 1y | 48
205 | ARC's first technical report: Eliciting Latent Knowledge | paulfchristiano | 1y | 88
334 | EfficientZero: How It Works | 1a3orn | 1y | 42
271 | Reward is not the optimization target | TurnTrout | 4mo | 97
237 | Humans are very reliable agents | alyssavance | 6mo | 35
191 | Embedded Agency (full-text version) | Scott Garrabrant | 4y | 15
188 | Why Subagents? | johnswentworth | 3y | 42
135 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29
124 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
120 | We have achieved Noob Gains in AI | phdead | 7mo | 21
111 | Robust Delegation | abramdemski | 4y | 10
107 | Subsystem Alignment | abramdemski | 4y | 12
107 | Reward Is Not Enough | Steven Byrnes | 1y | 18
107 | Evaluations project @ ARC is hiring a researcher and a webdev/engineer | Beth Barnes | 3mo | 7
104 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
99 | Embedded World-Models | abramdemski | 4y | 16