Tree of Tags

80 posts: Oracle AI, Myopia, AI Boxing (Containment), Deceptive Alignment, Deception, Acausal Trade, Self Fulfilling/Refuting Prophecies, Bounties (closed), Parables & Fables, Superrationality, Values handshakes, Computer Security & Cryptography

88 posts: Conjecture (org), Language Models, Refine, Agency, Deconfusion, Scaling Laws, Project Announcement, Encultured AI (org), Tool AI, Definitions, PaLM, Prompt Engineering

324 The Parable of Predict-O-Matic

abramdemski

3y

42

139 Decision theory does not imply that we get to have nice things

So8res

2mo

53

117 Monitoring for deceptive alignment

evhub

3mo

7

89 Trying to Make a Treacherous Mesa-Optimizer

MadHatter

1mo

13

80 How likely is deceptive alignment?

evhub

3mo

21

75 The Credit Assignment Problem

abramdemski

3y

40

67 Proper scoring rules don’t guarantee predicting fixed points

Johannes_Treutlein

4d

2

57 Open Problems with Myopia

Mark Xu

1y

16

54 Contest: $1,000 for good questions to ask to an Oracle AI

Stuart_Armstrong

3y

156

53 Partial Agency

abramdemski

3y

18

53 AI safety via market making

evhub

2y

45

50 LCDT, A Myopic Decision Theory

adamShimi

1y

51

49 Results of $1,000 Oracle contest!

Stuart_Armstrong

2y

2

48 Prize for probable problems

paulfchristiano

4y

63

759 Simulators

janus

3mo

103

494 chinchilla's wild implications

nostalgebraist

4mo

114

254 We Are Conjecture, A New Alignment Research Startup

Connor Leahy

8mo

24

248 Mysteries of mode collapse

janus

1mo

35

223 Conjecture: a retrospective after 8 months of work

Connor Leahy

27d

9

222 The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

beren

22d

27

191 Announcing the Inverse Scaling Prize ($250k Prize Pool)

Ethan Perez

5mo

14

190 Interpreting Neural Networks through the Polytope Lens

Sid Black

2mo

26

170 Refine: An Incubator for Conceptual Alignment Research Bets

adamShimi

8mo

13

165 Language models seem to be much better than humans at next-token prediction

Buck

4mo

56

140 Who models the models that model models? An exploration of GPT-3's in-context model fitting ability

Lovre

6mo

14

130 Beyond Astronomical Waste

Wei_Dai

4y

41

127 Transformer Circuits

evhub

12mo

4

127 Circumventing interpretability: How to defeat mind-readers

Lee Sharkey

5mo

8