Reinforcement Learning (73 posts)
Related tags: Inverse Reinforcement Learning, Wireheading, Reward Functions, Road To AI Safety Excellence
AI Capabilities (28 posts)
Related tags: Definitions, Stag Hunt, Goals, Prompt Engineering, PaLM, EfficientZero
| Karma | Title | Author | Age | Comments |
|---|---|---|---|---|
| 286 | Reward is not the optimization target | TurnTrout | 4mo | 97 |
| 176 | Are wireheads happy? | Scott Alexander | 12y | 107 |
| 80 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18 |
| 75 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41 |
| 71 | RAISE is launching their MVP | | 3y | 1 |
| 70 | Thoughts on "Human-Compatible" | TurnTrout | 3y | 35 |
| 62 | Book Review: Human Compatible | Scott Alexander | 2y | 6 |
| 55 | My take on Michael Littman on "The HCI of HAI" | Alex Flint | 1y | 4 |
| 49 | You cannot be mistaken about (not) wanting to wirehead | Kaj_Sotala | 12y | 79 |
| 47 | Reward model hacking as a challenge for reward learning | Erik Jenner | 8mo | 1 |
| 45 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0 |
| 43 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2 |
| 38 | A definition of wireheading | Anja | 10y | 80 |
| 37 | AI Safety Prerequisites Course: Basic abstract representations of computation | RAISE | 3y | 2 |
| Karma | Title | Author | Age | Comments |
|---|---|---|---|---|
| 352 | EfficientZero: How It Works | 1a3orn | 1y | 42 |
| 271 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38 |
| 129 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52 |
| 109 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12 |
| 91 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7 |
| 65 | Competitive programming with AlphaCode | Algon | 10mo | 37 |
| 45 | The Problem With The Current State of AGI Definitions | Yitz | 6mo | 22 |
| 45 | Remaking EfficientZero (as best I can) | Hoagy | 5mo | 9 |
| 35 | What's the Most Impressive Thing That GPT-4 Could Plausibly Do? | bayesed | 3mo | 24 |
| 33 | Misc. questions about EfficientZero | Daniel Kokotajlo | 1y | 17 |
| 31 | Note on Terminology: "Rationality", not "Rationalism" | Vladimir_Nesov | 11y | 51 |
| 30 | Do Humans Want Things? | lukeprog | 11y | 53 |
| 20 | How might we make better use of AI capabilities research for alignment purposes? | ghostwheel | 3mo | 4 |
| 18 | Compact vs. Wide Models | Vaniver | 4y | 5 |