Tags: Truthful AI, AI, Autonomy and Choice
Karma | Title | Author | Posted | Comments
----- | ----- | ------ | ------ | --------
25  | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
45  | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
35  | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
99  | Trying to disambiguate different questions about whether RLHF is "good" | Buck | 6d | 39
136 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77
82  | A shot at the diamond-alignment problem | TurnTrout | 2mo | 53
213 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
39  | In defense of probably wrong mechanistic models | evhub | 14d | 10
64  | Verification Is Not Easier Than Generation In General | johnswentworth | 14d | 23
32  | Concept extrapolation for hypothesis generation | Stuart_Armstrong | 8d | 2
65  | Update to Mysteries of mode collapse: text-davinci-002 not RLHF | janus | 1mo | 8
85  | How could we know that an AGI system will have good consequences? | So8res | 1mo | 24
85  | Automating Auditing: An ambitious concrete technical research proposal | evhub | 1y | 9
77  | Response to Katja Grace's AI x-risk counterarguments | Erik Jenner | 2mo | 18