593 posts
AI
Social Media
Autonomy and Choice
Truthful AI
27 posts
Eliciting Latent Knowledge (ELK)
45
Towards Hodge-podge Alignment
Cleo Nardo
1d
20
30
Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.
Charlie Steiner
19h
0
35
The "Minimal Latents" Approach to Natural Abstractions
johnswentworth
22h
6
213
AI alignment is distinct from its near-term applications
paulfchristiano
7d
5
99
Trying to disambiguate different questions about whether RLHF is “good”
Buck
6d
39
55
High-level hopes for AI alignment
HoldenKarnofsky
5d
3
136
Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong
14d
77
37
Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.
Charlie Steiner
7d
3
64
Verification Is Not Easier Than Generation In General
johnswentworth
14d
23
32
Concept extrapolation for hypothesis generation
Stuart_Armstrong
8d
2
25
Existential AI Safety is NOT separate from near-term applications
scasper
7d
15
71
Why Would AI "Aim" To Defeat Humanity?
HoldenKarnofsky
21d
9
67
Distinguishing test from training
So8res
21d
10
39
In defense of probably wrong mechanistic models
evhub
14d
10
73
Can we efficiently explain model behaviors?
paulfchristiano
4d
0
106
Finding gliders in the game of life
paulfchristiano
19d
7
113
Mechanistic anomaly detection and ELK
paulfchristiano
25d
17
85
ARC paper: Formalizing the presumption of independence
Erik Jenner
1mo
2
67
Where I currently disagree with Ryan Greenblatt’s version of the ELK approach
So8res
2mo
7
34
For ELK truth is mostly a distraction
c.trout
1mo
0
219
ARC's first technical report: Eliciting Latent Knowledge
paulfchristiano
1y
88
128
ELK prize results
paulfchristiano
9mo
50
115
Prizes for ELK proposals
paulfchristiano
11mo
156
70
ELK Thought Dump
abramdemski
9mo
18
77
ELK First Round Contest Winners
Mark Xu
10mo
6
60
ELK Computational Complexity: Three Levels of Difficulty
abramdemski
8mo
9
33
Eliciting Latent Knowledge (ELK) - Distillation/Summary
Marius Hobbhahn
6mo
2
64
Counterexamples to some ELK proposals
paulfchristiano
11mo
10