AI
Autonomy and Choice
Truthful AI
Social Media
49 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
79 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
39 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
85 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39
182 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77
72 | A shot at the diamond-alignment problem | TurnTrout | 2mo | 53
251 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
43 | In defense of probably wrong mechanistic models | evhub | 14d | 10
48 | Verification Is Not Easier Than Generation In General | johnswentworth | 14d | 23
8 | Concept extrapolation for hypothesis generation | Stuart_Armstrong | 8d | 2
73 | Update to Mysteries of mode collapse: text-davinci-002 not RLHF | janus | 1mo | 8
87 | How could we know that an AGI system will have good consequences? | So8res | 1mo | 24
69 | Automating Auditing: An ambitious concrete technical research proposal | evhub | 1y | 9
73 | Response to Katja Grace's AI x-risk counterarguments | Erik Jenner | 2mo | 18