0 posts
Truthful AI
593 posts
AI
Autonomy and Choice
79
Towards Hodge-podge Alignment
Cleo Nardo
1d
20
39
The "Minimal Latents" Approach to Natural Abstractions
johnswentworth
22h
6
251
AI alignment is distinct from its near-term applications
paulfchristiano
7d
5
12
Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.
Charlie Steiner
19h
0
85
Trying to disambiguate different questions about whether RLHF is “good”
Buck
6d
39
182
Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong
14d
77
49
Existential AI Safety is NOT separate from near-term applications
scasper
7d
15
29
High-level hopes for AI alignment
HoldenKarnofsky
5d
3
23
Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.
Charlie Steiner
7d
3
503
(My understanding of) What Everyone in Technical Alignment is Doing and Why
Thomas Larsen
3mo
83
48
Verification Is Not Easier Than Generation In General
johnswentworth
14d
23
43
In defense of probably wrong mechanistic models
evhub
14d
10
65
Why Would AI "Aim" To Defeat Humanity?
HoldenKarnofsky
21d
9
63
Distinguishing test from training
So8res
21d
10