AI (593 posts)

Karma | Title                                                                         | Author           | Posted | Comments
------|-------------------------------------------------------------------------------|------------------|--------|---------
   45 | Towards Hodge-podge Alignment                                                 | Cleo Nardo       | 1d     | 20
   30 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner  | 19h    | 0
   35 | The "Minimal Latents" Approach to Natural Abstractions                        | johnswentworth   | 22h    | 6
  213 | AI alignment is distinct from its near-term applications                      | paulfchristiano  | 7d     | 5
   99 | Trying to disambiguate different questions about whether RLHF is “good”       | Buck             | 6d     | 39
   55 | High-level hopes for AI alignment                                             | HoldenKarnofsky  | 5d     | 3
  136 | Using GPT-Eliezer against ChatGPT Jailbreaking                                | Stuart_Armstrong | 14d    | 77
   37 | Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.                 | Charlie Steiner  | 7d     | 3
   64 | Verification Is Not Easier Than Generation In General                         | johnswentworth   | 14d    | 23
   32 | Concept extrapolation for hypothesis generation                               | Stuart_Armstrong | 8d     | 2
   25 | Existential AI Safety is NOT separate from near-term applications             | scasper          | 7d     | 15
   71 | Why Would AI "Aim" To Defeat Humanity?                                        | HoldenKarnofsky  | 21d    | 9
   67 | Distinguishing test from training                                             | So8res           | 21d    | 10
   39 | In defense of probably wrong mechanistic models                               | evhub            | 14d    | 10