0 posts
Truthful AI
593 posts
AI
Autonomy and Choice
37
Existential AI Safety is NOT separate from near-term applications
scasper
7d
15
62
Towards Hodge-podge Alignment
Cleo Nardo
1d
20
37
The "Minimal Latents" Approach to Natural Abstractions
johnswentworth
22h
6
92
Trying to disambiguate different questions about whether RLHF is “good”
Buck
6d
39
159
Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong
14d
77
77
A shot at the diamond-alignment problem
TurnTrout
2mo
53
232
AI alignment is distinct from its near-term applications
paulfchristiano
7d
5
41
In defense of probably wrong mechanistic models
evhub
14d
10
56
Verification Is Not Easier Than Generation In General
johnswentworth
14d
23
20
Concept extrapolation for hypothesis generation
Stuart_Armstrong
8d
2
69
Update to Mysteries of mode collapse: text-davinci-002 not RLHF
janus
1mo
8
86
How could we know that an AGI system will have good consequences?
So8res
1mo
24
77
Automating Auditing: An ambitious concrete technical research proposal
evhub
1y
9
75
Response to Katja Grace's AI x-risk counterarguments
Erik Jenner
2mo
18