Tree of Tags

103 posts: Interpretability (ML & AI); Machine Learning (ML); DeepMind; Truth, Semantics, & Meaning; AI Success Models; OpenAI; Lottery Ticket Hypothesis; Anthropic; Conservatism (AI); Honesty; Principal-Agent Problems; Map and Territory

50 posts: GPT; Bounties & Prizes (active); AI-assisted Alignment; Moore's Law; Compute; Nanotechnology; List of Links; AI Safety Public Materials; Computer Science; Tripwire; Quantum Mechanics

An Open Agency Architecture for Safe Transformative AI, by davidad (11h ago; 15 karma, 11 comments)

How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme, by Collin (5d ago; 132 karma, 18 comments)

A challenge for AGI organizations, and a challenge for readers, by Rob Bensinger (19d ago; 307 karma, 30 comments)

The Plan - 2022 Update, by johnswentworth (19d ago; 239 karma, 33 comments)

Re-Examining LayerNorm, by Eric Winsor (19d ago; 142 karma, 8 comments)

Extracting and Evaluating Causal Direction in LLMs' Activations, by Fabien Roger (6d ago; 33 karma, 2 comments)

Paper: Transformers learn in-context by gradient descent, by LawrenceC (4d ago; 20 karma, 11 comments)

Reframing inner alignment, by davidad (9d ago; 35 karma, 13 comments)

[ASoT] Natural abstractions and AlphaZero, by Ulisse Mini (10d ago; 25 karma, 1 comment)

Multi-Component Learning and S-Curves, by Adam Jermyn (20d ago; 53 karma, 24 comments)

A Mechanistic Interpretability Analysis of Grokking, by Neel Nanda (4mo ago; 422 karma, 39 comments)

Clarifying AI X-risk, by zac_kenton (1mo ago; 140 karma, 23 comments)

DeepMind alignment team opinions on AGI ruin arguments, by Vika (4mo ago; 410 karma, 34 comments)

My thoughts on OpenAI's Alignment plan, by Donald Hobson (10d ago; 21 karma, 0 comments)

Predicting GPU performance, by Marius Hobbhahn (6d ago; 70 karma, 24 comments)

[Link] Why I’m optimistic about OpenAI’s alignment approach, by janleike (15d ago; 96 karma, 13 comments)

An exploration of GPT-2's embedding weights, by Adam Scherlis (7d ago; 34 karma, 2 comments)

[ASoT] Finetuning, RL, and GPT's world prior, by Jozdien (18d ago; 62 karma, 8 comments)

By Default, GPTs Think In Plain Sight, by Fabien Roger (1mo ago; 75 karma, 16 comments)

Alignment with argument-networks and assessment-predictions, by Tor Økland Barstad (7d ago; 10 karma, 3 comments)

Research request (alignment strategy): Deep dive on "making AI solve alignment for us", by JanBrauner (19d ago; 20 karma, 3 comments)

[LINK] - ChatGPT discussion, by JanBrauner (19d ago; 14 karma, 7 comments)

New Scaling Laws for Large Language Models, by 1a3orn (8mo ago; 255 karma, 21 comments)

Godzilla Strategies, by johnswentworth (6mo ago; 175 karma, 65 comments)

Beliefs and Disagreements about Automating Alignment Research, by Ian McKenzie (3mo ago; 99 karma, 4 comments)

Prizes for ML Safety Benchmark Ideas, by joshc (1mo ago; 36 karma, 3 comments)

$20K In Bounties for AI Safety Public Materials, by Dan H (4mo ago; 84 karma, 7 comments)

Recall and Regurgitation in GPT2, by Megan Kinniment (2mo ago; 43 karma, 1 comment)