Tags (101 posts): Reinforcement Learning, AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, Stag Hunt, Road To AI Safety Excellence, Goals, Prompt Engineering, EfficientZero, PaLM
Tags (63 posts): Value Learning, The Pointers Problem
Karma | Title | Author | Posted | Comments
13 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1
71 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
218 | Reward is not the optimization target | TurnTrout | 4mo | 97
35 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
39 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
281 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
10 | Can GPT-3 Write Contra Dances? | jefftk | 16d | 0
16 | generalized wireheading | carado | 1mo | 7
77 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
194 | EfficientZero: How It Works | 1a3orn | 1y | 42
5 | Mastering Stratego (Deepmind) | svemirski | 18d | 0
139 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
5 | AGIs may value intrinsic rewards more than extrinsic ones | catubc | 1mo | 6
35 | The Problem With The Current State of AGI Definitions | Yitz | 6mo | 22
21 | Character alignment | p.b. | 3mo | 0
48 | Different perspectives on concept extrapolation | Stuart_Armstrong | 8mo | 7
23 | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1
93 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
29 | How an alien theory of mind might be unlearnable | Stuart_Armstrong | 11mo | 35
20 | Value extrapolation, concept extrapolation, model splintering | Stuart_Armstrong | 9mo | 1
20 | Morally underdefined situations can be deadly | Stuart_Armstrong | 1y | 8
13 | An Open Philanthropy grant proposal: Causal representation learning of human preferences | PabloAMC | 11mo | 6
9 | AIs should learn human preferences, not biases | Stuart_Armstrong | 8mo | 1
68 | Clarifying "AI Alignment" | paulfchristiano | 4y | 82
7 | The Pointers Problem - Distilled | NinaR | 6mo | 0
64 | Why we need a *theory* of human values | Stuart_Armstrong | 4y | 15
42 | Since figuring out human values is hard, what about, say, monkey values? | shminux | 2y | 13
58 | The E-Coli Test for AI Alignment | johnswentworth | 4y | 24