74 posts. Tags in this branch:
Infra-Bayesianism
Counterfactuals
Logic & Mathematics
Formal Proof
Domain Theory
Functional Decision Theory
Counterfactual Mugging
Newcomb's Problem
Futarchy
Ontological Crisis
Meta-Honesty
Intelligence Explosion
40 posts. Tags in this branch:
Interviews
Audio
Redwood Research
AXRP
Transcripts
Adversarial Examples
Adversarial Training
AI Robustness
Autonomous Weapons
Karma | Title | Author | Age | Comments
121 | Introduction To The Infra-Bayesianism Sequence | Diffractor | 2y | 64
98 | Infra-Bayesian physicalism: a formal theory of naturalized induction | Vanessa Kosoy | 1y | 20
95 | Counterfactual Mugging Poker Game | Scott Garrabrant | 4y | 2
93 | Zoom In: An Introduction to Circuits | evhub | 2y | 11
69 | Recent Progress in the Theory of Neural Networks | interstice | 3y | 9
60 | MIRI/OP exchange about decision theory | Rob Bensinger | 1y | 7
49 | Complete Class: Consequentialist Foundations | abramdemski | 4y | 34
44 | Probability as Minimal Map | johnswentworth | 3y | 10
41 | Infra-Exercises, Part 1 | Diffractor | 3mo | 9
41 | The Promise and Peril of Finite Sets | davidad | 1y | 4
39 | Self-confirming predictions can be arbitrarily bad | Stuart_Armstrong | 3y | 11
39 | Hessian and Basin volume | Vivek Hebbar | 5mo | 9
38 | Counterfactuals, thick and thin | Nisan | 4y | 11
36 | AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy | DanielFilan | 1y | 12
Karma | Title | Author | Age | Comments
170 | Redwood Research’s current project | Buck | 1y | 29
134 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9
129 | Why I'm excited about Redwood Research's current project | paulfchristiano | 1y | 6
118 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14
106 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9
98 | High-stakes alignment via adversarial training [Redwood Research report] | dmz | 7mo | 29
75 | AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant | DanielFilan | 1y | 2
73 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang | 1mo | 5
54 | Redwood's Technique-Focused Epistemic Strategy | adamShimi | 1y | 1
51 | A conversation about Katja's counterarguments to AI risk | Matthew Barnett | 2mo | 9
50 | AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger | DanielFilan | 1y | 10
47 | AXRP Episode 10 - AI’s Future and Impacts with Katja Grace | DanielFilan | 1y | 2
46 | AXRP Episode 7 - Side Effects with Victoria Krakovna | DanielFilan | 1y | 6
46 | Rohin Shah on reasons for AI optimism | abergal | 3y | 58