Karma | Title | Author | Posted | Comments
----- | ----- | ------ | ------ | --------
62 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
37 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
21 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0
232 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
92 | Trying to disambiguate different questions about whether RLHF is "good" | Buck | 6d | 39
159 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77
42 | High-level hopes for AI alignment | HoldenKarnofsky | 5d | 3
37 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
30 | Take 10: Fine-tuning with RLHF is aesthetically unsatisfying. | Charlie Steiner | 7d | 3
56 | Verification Is Not Easier Than Generation In General | johnswentworth | 14d | 23
68 | Why Would AI "Aim" To Defeat Humanity? | HoldenKarnofsky | 21d | 9
41 | In defense of probably wrong mechanistic models | evhub | 14d | 10
65 | Distinguishing test from training | So8res | 21d | 10
20 | Concept extrapolation for hypothesis generation | Stuart_Armstrong | 8d | 2