Reinforcement Learning (101 posts)
Related tags: AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, Stag Hunt, Road To AI Safety Excellence, Goals, Prompt Engineering, EfficientZero, PaLM

Value Learning (63 posts)
Related tags: The Pointers Problem
Karma | Title | Author | Posted | Comments
10  | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1
81  | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
74  | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
252 | Reward is not the optimization target | TurnTrout | 4mo | 97
40  | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
276 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
21  | generalized wireheading | carado | 1mo | 7
273 | EfficientZero: How It Works | 1a3orn | 1y | 42
76  | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
6   | Can GPT-3 Write Contra Dances? | jefftk | 16d | 0
6   | Mastering Stratego (Deepmind) | svemirski | 18d | 0
8   | AGIs may value intrinsic rewards more than extrinsic ones | catubc | 1mo | 6
134 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
23  | What's the Most Impressive Thing That GPT-4 Could Plausibly Do? | bayesed | 3mo | 24
22  | Character alignment | p.b. | 3mo | 0
42  | Different perspectives on concept extrapolation | Stuart_Armstrong | 8mo | 7
104 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
16  | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1
26  | How an alien theory of mind might be unlearnable | Stuart_Armstrong | 11mo | 35
19  | An Open Philanthropy grant proposal: Causal representation learning of human preferences | PabloAMC | 11mo | 6
14  | Value extrapolation, concept extrapolation, model splintering | Stuart_Armstrong | 9mo | 1
9   | The Pointers Problem - Distilled | NinaR | 6mo | 0
17  | Morally underdefined situations can be deadly | Stuart_Armstrong | 1y | 8
10  | AIs should learn human preferences, not biases | Stuart_Armstrong | 8mo | 1
69  | The E-Coli Test for AI Alignment | johnswentworth | 4y | 24
68  | Preface to the sequence on value learning | Rohin Shah | 4y | 6
65  | Why we need a *theory* of human values | Stuart_Armstrong | 4y | 15
64  | Clarifying "AI Alignment" | paulfchristiano | 4y | 82