Tree of Tags

74 posts: Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Formal Proof, Domain Theory, Functional Decision Theory, Counterfactual Mugging, Newcomb's Problem, Futarchy, Ontological Crisis, Meta-Honesty, Intelligence Explosion

40 posts: Interviews, Audio, Redwood Research, AXRP, Transcripts, Adversarial Examples, Adversarial Training, AI Robustness, Autonomous Weapons

29 Vanessa Kosoy's PreDCA, distilled

Martín Soto

1mo

17

57 Infra-Exercises, Part 1

Diffractor

3mo

9

98 Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa Kosoy

1y

20

27 Hessian and Basin volume

Vivek Hebbar

5mo

9

22 Counterfactuals are Confusing because of an Ontological Shift

Chris_Leong

4mo

35

87 Introduction To The Infra-Bayesianism Sequence

Diffractor

2y

64

33 The Promise and Peril of Finite Sets

davidad

1y

4

34 MIRI/OP exchange about decision theory

Rob Bensinger

1y

7

15 Infra-Miscellanea

Diffractor

8mo

0

75 Zoom In: An Introduction to Circuits

evhub

2y

11

83 Recent Progress in the Theory of Neural Networks

interstice

3y

9

30 AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy

DanielFilan

1y

12

79 Counterfactual Mugging Poker Game

Scott Garrabrant

4y

2

36 Basic Inframeasure Theory

Diffractor

2y

16

154 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC

17d

9

150 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau

1mo

14

99 Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

KevinRoWang

1mo

5

25 Causal scrubbing: results on a paren balance checker

LawrenceC

17d

0

136 Takeaways from our robust injury classifier project [Redwood Research]

dmz

3mo

9

17 Causal scrubbing: Appendix

LawrenceC

17d

0

174 High-stakes alignment via adversarial training [Redwood Research report]

dmz

7mo

29

35 A conversation about Katja's counterarguments to AI risk

Matthew Barnett

2mo

9

30 Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming

Michaël Trazzi

3mo

0

116 Redwood Research’s current project

Buck

1y

29

95 Why I'm excited about Redwood Research's current project

paulfchristiano

1y

6

30 Latent Adversarial Training

Adam Jermyn

5mo

9

34 AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan

7mo

1

27 Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck

6mo

0