180 posts
Research Agendas
Embedded Agency
Suffering
Agency
Animal Welfare
Risks of Astronomical Suffering (S-risks)
Robust Agents
Cause Prioritization
Center on Long-Term Risk (CLR)
80,000 Hours
Crucial Considerations
Veg*nism
164 posts
Value Learning
Reinforcement Learning
AI Capabilities
Inverse Reinforcement Learning
Wireheading
Definitions
Reward Functions
The Pointers Problem
Stag Hunt
Road To AI Safety Excellence
Goals
EfficientZero
34
My AGI safety research—2022 review, ’23 plans
Steven Byrnes
6d
6
300
On how various plans miss the hard bits of the alignment challenge
So8res
5mo
81
23
Should you refrain from having children because of the risk posed by artificial intelligence?
Mientras
3mo
28
190
Some conceptual alignment research projects
Richard_Ngo
3mo
14
7
EA, Veganism and Negative Animal Utilitarianism
Yair Halberstadt
3mo
12
9
Cooperators are more powerful than agents
Ivan Vendrov
2mo
7
9
Interpreting systems as solving POMDPs: a step towards a formal understanding of agency [paper link]
the gears to ascenscion
1mo
2
247
Humans are very reliable agents
alyssavance
6mo
35
45
Gradations of Agency
Daniel Kokotajlo
7mo
6
-10
A Longtermist case against Veganism
Connor Tabarrok
2mo
2
21
Distilled Representations Research Agenda
Hoagy
2mo
2
4
Some thoughts on Animals
nitinkhanna
5mo
6
19
Peter Singer's first published piece on AI
Fai
5mo
5
4
Vegetarianism and depression
Maggy
2mo
2
286
Reward is not the optimization target
TurnTrout
4mo
97
7
Note on algorithms with multiple trained components
Steven Byrnes
6h
1
109
Will we run out of ML data? Evidence from projecting dataset size trends
Pablo Villalobos
1mo
12
91
When AI solves a game, focus on the game's mechanics, not its theme.
Cleo Nardo
27d
7
75
Seriously, what goes wrong with "reward the agent when it makes you smile"?
TurnTrout
4mo
41
26
generalized wireheading
carado
1mo
7
35
What's the Most Impressive Thing That GPT-4 Could Plausibly Do?
bayesed
3mo
24
11
AGIs may value intrinsic rewards more than extrinsic ones
catubc
1mo
6
25
Latent Variables and Model Mis-Specification
jsteinhardt
4y
7
10
Stable Pointers to Value: An Agent Embedded in Its Own Utility Function
abramdemski
5y
9
26
Is CIRL a promising agenda?
Chris_Leong
6mo
12
45
Remaking EfficientZero (as best I can)
Hoagy
5mo
9
115
The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables
johnswentworth
2y
43
8
Reward IS the Optimization Target
Carn
2mo
3