593 posts
AI
27 posts
Eliciting Latent Knowledge (ELK)
344
(My understanding of) What Everyone in Technical Alignment is Doing and Why
Thomas Larsen
3mo
83
325
Discussion with Eliezer Yudkowsky on AGI interventions
Rob Bensinger
1y
257
247
DeepMind: Generally capable agents emerge from open-ended play
Daniel Kokotajlo
1y
53
245
Visible Thoughts Project and Bounty Announcement
So8res
1y
104
237
larger language models may disappoint you [or, an eternally unfinished draft]
nostalgebraist
1y
29
235
The Plan
johnswentworth
1y
77
235
Ngo and Yudkowsky on alignment difficulty
Eliezer Yudkowsky
1y
143
232
AI alignment is distinct from its near-term applications
paulfchristiano
7d
5
212
Safetywashing
Adam Scholl
5mo
17
206
Hiring engineers and researchers to help align GPT-3
paulfchristiano
2y
14
204
Attempted Gears Analysis of AGI Intervention Discussion With Eliezer
Zvi
1y
48
198
Embedded Agents
abramdemski
4y
41
197
Optimality is the tiger, and agents are its teeth
Veedrac
8mo
31
194
An overview of 11 proposals for building safe advanced AI
evhub
2y
36
212
ARC's first technical report: Eliciting Latent Knowledge
paulfchristiano
1y
88
141
Prizes for ELK proposals
paulfchristiano
11mo
156
130
ELK prize results
paulfchristiano
9mo
50
121
Mechanistic anomaly detection and ELK
paulfchristiano
25d
17
91
Finding gliders in the game of life
paulfchristiano
19d
7
88
ARC paper: Formalizing the presumption of independence
Erik Jenner
1mo
2
63
Where I currently disagree with Ryan Greenblatt’s version of the ELK approach
So8res
2mo
7
63
Can we efficiently explain model behaviors?
paulfchristiano
4d
0
63
ELK First Round Contest Winners
Mark Xu
10mo
6
58
ELK Thought Dump
abramdemski
9mo
18
50
Counterexamples to some ELK proposals
paulfchristiano
11mo
10
49
Eliciting Latent Knowledge (ELK) - Distillation/Summary
Marius Hobbhahn
6mo
2
46
ELK Computational Complexity: Three Levels of Difficulty
abramdemski
8mo
9
38
Eliciting Latent Knowledge Via Hypothetical Sensors
John_Maxwell
11mo
2