41 posts
Reinforcement Learning
AI Capabilities
Wireheading
Reward Functions
EfficientZero
Tradeoffs
33 posts
Embedded Agency
Subagents
Robust Agents
Category Theory
Spurious Counterfactuals
Memetics
Autonomous Vehicles
233
Reward is not the optimization target
TurnTrout
4mo
97
212
EfficientZero: How It Works
1a3orn
1y
42
144
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised
gwern
1y
52
108
We have achieved Noob Gains in AI
phdead
7mo
21
98
The alignment problem in different capability regimes
Buck
1y
12
87
Jitters No Evidence of Stupidity in RL
1a3orn
1y
18
83
Scaling Laws for Reward Model Overoptimization
leogao
2mo
11
81
Evaluations project @ ARC is hiring a researcher and a webdev/engineer
Beth Barnes
3mo
7
80
Seriously, what goes wrong with "reward the agent when it makes you smile"?
TurnTrout
4mo
41
77
Towards deconfusing wireheading and reward maximization
leogao
3mo
7
77
OpenAI Solves (Some) Formal Math Olympiad Problems
Michaël Trazzi
10mo
26
70
Misc. questions about EfficientZero
Daniel Kokotajlo
1y
17
66
My take on Michael Littman on "The HCI of HAI"
Alex Flint
1y
4
53
Draft papers for REALab and Decoupled Approval on tampering
Jonathan Uesato
2y
2
259
Humans are very reliable agents
alyssavance
6mo
35
155
Introduction to Cartesian Frames
Scott Garrabrant
2y
29
134
Why Subagents?
johnswentworth
3y
42
109
Robust Delegation
abramdemski
4y
10
103
Reward Is Not Enough
Steven Byrnes
1y
18
95
Embedded Agency (full-text version)
Scott Garrabrant
4y
15
93
Updates and additions to "Embedded Agency"
Rob Bensinger
2y
1
93
Subsystem Alignment
abramdemski
4y
12
93
Humans Are Embedded Agents Too
johnswentworth
2y
19
80
Embedded Curiosities
Scott Garrabrant
4y
1
75
Embedded World-Models
abramdemski
4y
16
70
Additive Operations on Cartesian Frames
Scott Garrabrant
2y
6
62
Functors and Coarse Worlds
Scott Garrabrant
2y
4
62
(A -> B) -> A
Scott Garrabrant
4y
11