Tree of Tags

74 posts: Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Formal Proof, Domain Theory, Functional Decision Theory, Counterfactual Mugging, Newcomb's Problem, Futarchy, Ontological Crisis, Meta-Honesty, Intelligence Explosion

40 posts: Interviews, Audio, Redwood Research, AXRP, Transcripts, Adversarial Examples, Adversarial Training, AI Robustness, Autonomous Weapons

98 Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa Kosoy

1y

20

87 Introduction To The Infra-Bayesianism Sequence

Diffractor

2y

64

83 Recent Progress in the Theory of Neural Networks

interstice

3y

9

79 Counterfactual Mugging Poker Game

Scott Garrabrant

4y

2

75 Zoom In: An Introduction to Circuits

evhub

2y

11

57 Infra-Exercises, Part 1

Diffractor

3mo

9

54 Probability as Minimal Map

johnswentworth

3y

10

53 Self-confirming predictions can be arbitrarily bad

Stuart_Armstrong

3y

11

51 Complete Class: Consequentialist Foundations

abramdemski

4y

34

36 Basic Inframeasure Theory

Diffractor

2y

16

34 MIRI/OP exchange about decision theory

Rob Bensinger

1y

7

33 The Promise and Peril of Finite Sets

davidad

1y

4

30 AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy

DanielFilan

1y

12

29 A Brief Intro to Domain Theory

Diffractor

3y

4

174 High-stakes alignment via adversarial training [Redwood Research report]

dmz

7mo

29

154 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC

17d

9

150 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau

1mo

14

136 Takeaways from our robust injury classifier project [Redwood Research]

dmz

3mo

9

116 Redwood Research’s current project

Buck

1y

29

99 Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

KevinRoWang

1mo

5

95 Why I'm excited about Redwood Research's current project

paulfchristiano

1y

6

48 Conversation with Paul Christiano

abergal

3y

6

42 Redwood's Technique-Focused Epistemic Strategy

adamShimi

1y

1

37 AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

DanielFilan

1y

2

35 A conversation about Katja's counterarguments to AI risk

Matthew Barnett

2mo

9

34 AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan

7mo

1

34 Rohin Shah on reasons for AI optimism

abergal

3y

58

34 If I were a well-intentioned AI... I: Image classifier

Stuart_Armstrong

2y

4