Tree of Tags

734 posts: AI, Eliciting Latent Knowledge (ELK), Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Interviews, Audio, AXRP, Redwood Research, Transcripts, Formal Proof, Domain Theory

74 posts: Embedded Agency, Reinforcement Learning, Subagents, Reward Functions, EfficientZero, Robust Agents, Wireheading, AI Capabilities, Spurious Counterfactuals, Category Theory, Memetics, Tradeoffs

344 (My understanding of) What Everyone in Technical Alignment is Doing and Why

Thomas Larsen

3mo

83

325 Discussion with Eliezer Yudkowsky on AGI interventions

Rob Bensinger

1y

257

247 DeepMind: Generally capable agents emerge from open-ended play

Daniel Kokotajlo

1y

53

245 Visible Thoughts Project and Bounty Announcement

So8res

1y

104

237 larger language models may disappoint you [or, an eternally unfinished draft]

nostalgebraist

1y

29

235 The Plan

johnswentworth

1y

77

235 Ngo and Yudkowsky on alignment difficulty

Eliezer Yudkowsky

1y

143

232 AI alignment is distinct from its near-term applications

paulfchristiano

7d

5

212 ARC's first technical report: Eliciting Latent Knowledge

paulfchristiano

1y

88

212 Safetywashing

Adam Scholl

5mo

17

206 Hiring engineers and researchers to help align GPT-3

paulfchristiano

2y

14

204 Attempted Gears Analysis of AGI Intervention Discussion With Eliezer

Zvi

1y

48

198 Embedded Agents

abramdemski

4y

41

197 Optimality is the tiger, and agents are its teeth

Veedrac

8mo

31

273 EfficientZero: How It Works

1a3orn

1y

42

252 Reward is not the optimization target

TurnTrout

4mo

97

248 Humans are very reliable agents

alyssavance

6mo

35

161 Why Subagents?

johnswentworth

3y

42

145 Introduction to Cartesian Frames

Scott Garrabrant

2y

29

143 Embedded Agency (full-text version)

Scott Garrabrant

4y

15

134 EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

gwern

1y

52

114 We have achieved Noob Gains in AI

phdead

7mo

21

110 Robust Delegation

abramdemski

4y

10

105 Reward Is Not Enough

Steven Byrnes

1y

18

100 Subsystem Alignment

abramdemski

4y

12

94 Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes

3mo

7

88 Embedded Curiosities

Scott Garrabrant

4y

1

87 The alignment problem in different capability regimes

Buck

1y

12