41 posts
Reinforcement Learning
AI Capabilities
Wireheading
Reward Functions
EfficientZero
Tradeoffs
33 posts
Embedded Agency
Subagents
Robust Agents
Category Theory
Spurious Counterfactuals
Memetics
Autonomous Vehicles
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
81 | Evaluations project @ ARC is hiring a researcher and a webdev/engineer | Beth Barnes | 3mo | 7
80 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
83 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
38 | It matters when the first sharp left turn happens | Adam Jermyn | 2mo | 9
77 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
38 | Conditioning, Prompts, and Fine-Tuning | Adam Jermyn | 4mo | 9
29 | The reward engineering problem | paulfchristiano | 3y | 3
6 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
6 | Modeling the capabilities of advanced AI systems as episodic reinforcement learning | jessicata | 6y | 0
2 | Vector-Valued Reinforcement Learning | orthonormal | 6y | 0
0 | Reward/value learning for reinforcement learning | Stuart_Armstrong | 5y | 0
1 | Delegative Reinforcement Learning with a Merely Sane Advisor | Vanessa Kosoy | 5y | 2
37 | Gradations of Agency | Daniel Kokotajlo | 7mo | 6
134 | Why Subagents? | johnswentworth | 3y | 42
259 | Humans are very reliable agents | alyssavance | 6mo | 35
37 | Committing, Assuming, Externalizing, and Internalizing | Scott Garrabrant | 2y | 25
39 | Eight Definitions of Observability | Scott Garrabrant | 2y | 26
42 | What if memes are common in highly capable minds? | Daniel Kokotajlo | 2y | 15
93 | Updates and additions to "Embedded Agency" | Rob Bensinger | 2y | 1
155 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29
93 | Subsystem Alignment | abramdemski | 4y | 12
62 | Functors and Coarse Worlds | Scott Garrabrant | 2y | 4
20 | You Only Get One Shot: an Intuition Pump for Embedded Agency | Oliver Sourbut | 6mo | 4
59 | Time in Cartesian Frames | Scott Garrabrant | 2y | 16
38 | Logical Updatelessness as a Robust Delegation Problem | Scott Garrabrant | 5y | 2
45 | Sub-Sums and Sub-Tensors | Scott Garrabrant | 2y | 4