Reinforcement Learning · Wireheading · Reward Functions (30 posts; 14 listed below)
Score | Title | Author | Age | Comments
271 | Reward is not the optimization target | TurnTrout | 4mo | 97
89 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
80 | Big picture of phasic dopamine | Steven Byrnes | 1y | 18
77 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18
72 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
61 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
52 | My take on Michael Littman on "The HCI of HAI" | Alex Flint | 1y | 4
44 | Reward model hacking as a challenge for reward learning | Erik Jenner | 8mo | 1
43 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
41 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2
28 | Scalar reward is not enough for aligned AGI | Peter Vamplew | 11mo | 3
26 | Conditioning, Prompts, and Fine-Tuning | Adam Jermyn | 4mo | 9
25 | Reinforcement learning with imperceptible rewards | Vanessa Kosoy | 3y | 1
AI Capabilities · EfficientZero · Tradeoffs (11 posts)

Score | Title | Author | Age | Comments
334 | EfficientZero: How It Works | 1a3orn | 1y | 42
124 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
120 | We have achieved Noob Gains in AI | phdead | 7mo | 21
107 | Evaluations project @ ARC is hiring a researcher and a webdev/engineer | Beth Barnes | 3mo | 7
104 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
77 | OpenAI Solves (Some) Formal Math Olympiad Problems | Michaël Trazzi | 10mo | 26
76 | The alignment problem in different capability regimes | Buck | 1y | 12
43 | Remaking EfficientZero (as best I can) | Hoagy | 5mo | 9
32 | It matters when the first sharp left turn happens | Adam Jermyn | 2mo | 9
32 | Misc. questions about EfficientZero | Daniel Kokotajlo | 1y | 17
2 | Epistemic Strategies of Safety-Capabilities Tradeoffs | adamShimi | 1y | 0