73 posts
Reinforcement Learning
Inverse Reinforcement Learning
Wireheading
Reward Functions
Road To AI Safety Excellence
28 posts
AI Capabilities
Definitions
Stag Hunt
Goals
Prompt Engineering
PaLM
EfficientZero
10
Note on algorithms with multiple trained components
Steven Byrnes
6h
1
252
Reward is not the optimization target
TurnTrout
4mo
97
40
A Short Dialogue on the Meaning of Reward Functions
Leon Lang
1mo
0
21
generalized wireheading
carado
1mo
7
76
Seriously, what goes wrong with "reward the agent when it makes you smile"?
TurnTrout
4mo
41
8
AGIs may value intrinsic rewards more than extrinsic ones
catubc
1mo
6
15
An investigation into when agents may be incentivized to manipulate our beliefs.
Felix Hofstätter
3mo
0
16
A Survey of Foundational Methods in Inverse Reinforcement Learning
adamk
3mo
0
82
Jitters No Evidence of Stupidity in RL
1a3orn
1y
18
25
Is CIRL a promising agenda?
Chris_Leong
6mo
12
25
Reward model hacking as a challenge for reward learning
Erik Jenner
8mo
1
59
My take on Michael Littman on "The HCI of HAI"
Alex Flint
1y
4
16
RLHF
Ansh Radhakrishnan
7mo
5
77
Book Review: Human Compatible
Scott Alexander
2y
6
81
When AI solves a game, focus on the game's mechanics, not its theme.
Cleo Nardo
27d
7
74
Will we run out of ML data? Evidence from projecting dataset size trends
Pablo Villalobos
1mo
12
276
Is AI Progress Impossible To Predict?
alyssavance
7mo
38
273
EfficientZero: How It Works
1a3orn
1y
42
6
Can GPT-3 Write Contra Dances?
jefftk
16d
0
6
Mastering Stratego (Deepmind)
svemirski
18d
0
134
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised
gwern
1y
52
23
What's the Most Impressive Thing That GPT-4 Could Plausibly Do?
bayesed
3mo
24
34
Remaking EfficientZero (as best I can)
Hoagy
5mo
9
40
The Problem With The Current State of AGI Definitions
Yitz
6mo
22
58
Competitive programming with AlphaCode
Algon
10mo
37
51
Misc. questions about EfficientZero
Daniel Kokotajlo
1y
17
11
How might we make better use of AI capabilities research for alignment purposes?
ghostwheel
3mo
4
7
Uncompetitive programming with GPT-3
Bezzi
10mo
8