Tree of Tags

74 posts: Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Formal Proof, Domain Theory, Functional Decision Theory, Counterfactual Mugging, Newcomb's Problem, Futarchy, Ontological Crisis, Meta-Honesty, Intelligence Explosion

40 posts: Interviews, Audio, Redwood Research, AXRP, Transcripts, Adversarial Examples, Adversarial Training, AI Robustness, Autonomous Weapons

98 Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa Kosoy

1y

20

12 Counterfactuals are Confusing because of an Ontological Shift

Chris_Leong

4mo

35

44 Probability as Minimal Map

johnswentworth

3y

10

39 Hessian and Basin volume

Vivek Hebbar

5mo

9

121 Introduction To The Infra-Bayesianism Sequence

Diffractor

2y

64

41 Infra-Exercises, Part 1

Diffractor

3mo

9

21 A Brief Intro to Domain Theory

Diffractor

3y

4

6 Third-person counterfactuals

Benya_Fallenstein

7y

0

9 The odd counterfactuals of playing chicken

Benya_Fallenstein

7y

0

1 Un-manipulable counterfactuals

Stuart_Armstrong

7y

0

0 Orthogonality: action counterfactuals

Stuart_Armstrong

7y

0

19 Optimal and Causal Counterfactual Worlds

Scott Garrabrant

7y

0

8 Provability Counterfactuals vs Three Axioms of Galles and Pearl

IAFF-User-52

7y

0

4 Logical Counterfactuals Consistent Under Self-Modification

abramdemski

7y

0

106 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC

17d

9

118 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau

1mo

14

18 Latent Adversarial Training

Adam Jermyn

5mo

9

134 Takeaways from our robust injury classifier project [Redwood Research]

dmz

3mo

9

73 Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

KevinRoWang

1mo

5

98 High-stakes alignment via adversarial training [Redwood Research report]

dmz

7mo

29

54 Redwood's Technique-Focused Epistemic Strategy

adamShimi

1y

1

18 AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan

4mo

0

16 AXRP Episode 1 - Adversarial Policies with Adam Gleave

DanielFilan

1y

5

27 [AN #146]: Plausible stories of how we might fail to avert an existential catastrophe

Rohin Shah

1y

1

18 AXRP Episode 2 - Learning Human Biases with Rohin Shah

DanielFilan

1y

0

46 AXRP Episode 7 - Side Effects with Victoria Krakovna

DanielFilan

1y

6

9 AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong

DanielFilan

3mo

1

34 AXRP Episode 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra

DanielFilan

1y

1