Tags (73 posts): Reinforcement Learning, Inverse Reinforcement Learning, Wireheading, Reward Functions, Road To AI Safety Excellence
Tags (28 posts): AI Capabilities, Definitions, Stag Hunt, Goals, Prompt Engineering, PaLM, EfficientZero
Karma | Title | Author | Age | Comments
218 | Reward is not the optimization target | TurnTrout | 4mo | 97
13 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1
77 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
16 | generalized wireheading | carado | 1mo | 7
5 | AGIs may value intrinsic rewards more than extrinsic ones | catubc | 1mo | 6
24 | Is CIRL a promising agenda? | Chris_Leong | 6mo | 12
-10 | Reward IS the Optimization Target | Carn | 2mo | 3
-4 | Reinforcement Learner Wireheading | Nate Showell | 5mo | 2
58 | The Stamp Collector | So8res | 7y | 14
2 | What messy problems do you see Deep Reinforcement Learning applicable to? | Riccardo Volpato | 2y | 0
39 | You cannot be mistaken about (not) wanting to wirehead | Kaj_Sotala | 12y | 79
11 | Reward function learning: the value function | Stuart_Armstrong | 4y | 0
0 | Inverse reinforcement learning on self, pre-ontology-change | Stuart_Armstrong | 7y | 0
6 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
39 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
71 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
11 | What's the Most Impressive Thing That GPT-4 Could Plausibly Do? | bayesed | 3mo | 24
23 | Remaking EfficientZero (as best I can) | Hoagy | 5mo | 9
2 | How might we make better use of AI capabilities research for alignment purposes? | ghostwheel | 3mo | 4
281 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
50 | Do Humans Want Things? | lukeprog | 11y | 53
69 | Misc. questions about EfficientZero | Daniel Kokotajlo | 1y | 17
35 | The Problem With The Current State of AGI Definitions | Yitz | 6mo | 22
1 | Define Rationality | Marshall | 13y | 14
19 | Seeking better name for "Effective Egoism" | DataPacRat | 6y | 30
43 | Note on Terminology: "Rationality", not "Rationalism" | Vladimir_Nesov | 11y | 51
29 | Disambiguating "alignment" and related notions | David Scott Krueger (formerly: capybaralet) | 4y | 21
2 | Uncompetitive programming with GPT-3 | Bezzi | 10mo | 8