Tree of Tags

593 posts AI Social Media Autonomy and Choice Truthful AI

344 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
325 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
247 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
245 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
237 | larger language models may disappoint you [or, an eternally unfinished draft] | nostalgebraist | 1y | 29
235 | The Plan | johnswentworth | 1y | 77
235 | Ngo and Yudkowsky on alignment difficulty | Eliezer Yudkowsky | 1y | 143
232 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
212 | Safetywashing | Adam Scholl | 5mo | 17
206 | Hiring engineers and researchers to help align GPT-3 | paulfchristiano | 2y | 14
204 | Attempted Gears Analysis of AGI Intervention Discussion With Eliezer | Zvi | 1y | 48
198 | Embedded Agents | abramdemski | 4y | 41
197 | Optimality is the tiger, and agents are its teeth | Veedrac | 8mo | 31
194 | An overview of 11 proposals for building safe advanced AI | evhub | 2y | 36

27 posts Eliciting Latent Knowledge (ELK)

212 | ARC's first technical report: Eliciting Latent Knowledge | paulfchristiano | 1y | 88
141 | Prizes for ELK proposals | paulfchristiano | 11mo | 156
130 | ELK prize results | paulfchristiano | 9mo | 50
121 | Mechanistic anomaly detection and ELK | paulfchristiano | 25d | 17
91 | Finding gliders in the game of life | paulfchristiano | 19d | 7
88 | ARC paper: Formalizing the presumption of independence | Erik Jenner | 1mo | 2
63 | Where I currently disagree with Ryan Greenblatt’s version of the ELK approach | So8res | 2mo | 7
63 | Can we efficiently explain model behaviors? | paulfchristiano | 4d | 0
63 | ELK First Round Contest Winners | Mark Xu | 10mo | 6
58 | ELK Thought Dump | abramdemski | 9mo | 18
50 | Counterexamples to some ELK proposals | paulfchristiano | 11mo | 10
49 | Eliciting Latent Knowledge (ELK) - Distillation/Summary | Marius Hobbhahn | 6mo | 2
46 | ELK Computational Complexity: Three Levels of Difficulty | abramdemski | 8mo | 9
38 | Eliciting Latent Knowledge Via Hypothetical Sensors | John_Maxwell | 11mo | 2