80 posts — tags: Oracle AI, Myopia, AI Boxing (Containment), Deceptive Alignment, Deception, Acausal Trade, Self Fulfilling/Refuting Prophecies, Bounties (closed), Parables & Fables, Superrationality, Values handshakes, Computer Security & Cryptography
88 posts — tags: Conjecture (org), Language Models, Refine, Agency, Deconfusion, Scaling Laws, Project Announcement, Encultured AI (org), Tool AI, Definitions, PaLM, Prompt Engineering
Karma | Title | Author | Posted | Comments
291 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
142 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53
118 | Monitoring for deceptive alignment | evhub | 3mo | 7
91 | The Credit Assignment Problem | abramdemski | 3y | 40
87 | Trying to Make a Treacherous Mesa-Optimizer | MadHatter | 1mo | 13
72 | How likely is deceptive alignment? | evhub | 3mo | 21
60 | Prize for probable problems | paulfchristiano | 4y | 63
58 | Partial Agency | abramdemski | 3y | 18
58 | Results of $1,000 Oracle contest! | Stuart_Armstrong | 2y | 2
57 | Open Problems with Myopia | Mark Xu | 1y | 16
57 | Contest: $1,000 for good questions to ask to an Oracle AI | Stuart_Armstrong | 3y | 156
56 | Arguments against myopic training | Richard_Ngo | 2y | 39
55 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2
55 | Why GPT wants to mesa-optimize & how we might change this | John_Maxwell | 2y | 32
Karma | Title | Author | Posted | Comments
472 | Simulators | janus | 3mo | 103
364 | chinchilla's wild implications | nostalgebraist | 4mo | 114
213 | Mysteries of mode collapse | janus | 1mo | 35
186 | We Are Conjecture, A New Alignment Research Startup | Connor Leahy | 8mo | 24
183 | Conjecture: a retrospective after 8 months of work | Connor Leahy | 27d | 9
166 | Announcing the Inverse Scaling Prize ($250k Prize Pool) | Ethan Perez | 5mo | 14
164 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56
159 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27
142 | Transformer Circuits | evhub | 12mo | 4
123 | Refine: An Incubator for Conceptual Alignment Research Bets | adamShimi | 8mo | 13
123 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26
119 | Beyond Astronomical Waste | Wei_Dai | 4y | 41
118 | The case for becoming a black-box investigator of language models | Buck | 7mo | 19
112 | Who models the models that model models? An exploration of GPT-3's in-context model fitting ability | Lovre | 6mo | 14