Tree of Tags

74 posts: Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Formal Proof, Domain Theory, Functional Decision Theory, Counterfactual Mugging, Newcomb's Problem, Futarchy, Ontological Crisis, Meta-Honesty, Intelligence Explosion

40 posts: Interviews, Audio, Redwood Research, AXRP, Transcripts, Adversarial Examples, Adversarial Training, AI Robustness, Autonomous Weapons

16 Vanessa Kosoy's PreDCA, distilled

Martín Soto

1mo

17

49 Infra-Exercises, Part 1

Diffractor

3mo

9

98 Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa Kosoy

1y

20

33 Hessian and Basin volume

Vivek Hebbar

5mo

9

17 Counterfactuals are Confusing because of an Ontological Shift

Chris_Leong

4mo

35

104 Introduction To The Infra-Bayesianism Sequence

Diffractor

2y

64

37 The Promise and Peril of Finite Sets

davidad

1y

4

47 MIRI/OP exchange about decision theory

Rob Bensinger

1y

7

17 Infra-Miscellanea

Diffractor

8mo

0

84 Zoom In: An Introduction to Circuits

evhub

2y

11

76 Recent Progress in the Theory of Neural Networks

interstice

3y

9

33 AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy

DanielFilan

1y

12

87 Counterfactual Mugging Poker Game

Scott Garrabrant

4y

2

28 The Many Faces of Infra-Beliefs

Diffractor

1y

6

130 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC

17d

9

134 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau

1mo

14

26 Causal scrubbing: results on a paren balance checker

LawrenceC

17d

0

86 Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

KevinRoWang

1mo

5

135 Takeaways from our robust injury classifier project [Redwood Research]

dmz

3mo

9

16 Causal scrubbing: Appendix

LawrenceC

17d

0

43 A conversation about Katja's counterarguments to AI risk

Matthew Barnett

2mo

9

136 High-stakes alignment via adversarial training [Redwood Research report]

dmz

7mo

29

143 Redwood Research’s current project

Buck

1y

29

112 Why I'm excited about Redwood Research's current project

paulfchristiano

1y

6

25 Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming

Michaël Trazzi

3mo

0

33 Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck

6mo

0

32 AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan

7mo

1

16 AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan

4mo

0