Tree of Tags

734 posts: AI, Eliciting Latent Knowledge (ELK), Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Interviews, Audio, AXRP, Redwood Research, Transcripts, Formal Proof, Domain Theory

74 posts: Embedded Agency, Reinforcement Learning, Subagents, Reward Functions, EfficientZero, Robust Agents, Wireheading, AI Capabilities, Spurious Counterfactuals, Category Theory, Memetics, Tradeoffs

Karma · Title · Author · Age · Comments
503 · (My understanding of) What Everyone in Technical Alignment is Doing and Why · Thomas Larsen · 3mo · 83
409 · Discussion with Eliezer Yudkowsky on AGI interventions · Rob Bensinger · 1y · 257
265 · The Plan · johnswentworth · 1y · 77
265 · Ngo and Yudkowsky on alignment difficulty · Eliezer Yudkowsky · 1y · 143
265 · An overview of 11 proposals for building safe advanced AI · evhub · 2y · 36
263 · DeepMind: Generally capable agents emerge from open-ended play · Daniel Kokotajlo · 1y · 53
257 · larger language models may disappoint you [or, an eternally unfinished draft] · nostalgebraist · 1y · 29
253 · Embedded Agents · abramdemski · 4y · 41
251 · AI alignment is distinct from its near-term applications · paulfchristiano · 7d · 5
248 · Visible Thoughts Project and Bounty Announcement · So8res · 1y · 104
247 · Optimality is the tiger, and agents are its teeth · Veedrac · 8mo · 31
213 · Safetywashing · Adam Scholl · 5mo · 17
206 · Attempted Gears Analysis of AGI Intervention Discussion With Eliezer · Zvi · 1y · 48
205 · ARC's first technical report: Eliciting Latent Knowledge · paulfchristiano · 1y · 88

Karma · Title · Author · Age · Comments
334 · EfficientZero: How It Works · 1a3orn · 1y · 42
271 · Reward is not the optimization target · TurnTrout · 4mo · 97
237 · Humans are very reliable agents · alyssavance · 6mo · 35
191 · Embedded Agency (full-text version) · Scott Garrabrant · 4y · 15
188 · Why Subagents? · johnswentworth · 3y · 42
135 · Introduction to Cartesian Frames · Scott Garrabrant · 2y · 29
124 · EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised · gwern · 1y · 52
120 · We have achieved Noob Gains in AI · phdead · 7mo · 21
111 · Robust Delegation · abramdemski · 4y · 10
107 · Subsystem Alignment · abramdemski · 4y · 12
107 · Reward Is Not Enough · Steven Byrnes · 1y · 18
107 · Evaluations project @ ARC is hiring a researcher and a webdev/engineer · Beth Barnes · 3mo · 7
104 · Will we run out of ML data? Evidence from projecting dataset size trends · Pablo Villalobos · 1mo · 12
99 · Embedded World-Models · abramdemski · 4y · 16