74 posts
Infra-Bayesianism
Counterfactuals
Logic & Mathematics
Formal Proof
Domain Theory
Functional Decision Theory
Counterfactual Mugging
Newcomb's Problem
Futarchy
Ontological Crisis
Meta-Honesty
Intelligence Explosion
40 posts
Interviews
Audio
Redwood Research
AXRP
Transcripts
Adversarial Examples
Adversarial Training
AI Robustness
Autonomous Weapons
16
Vanessa Kosoy's PreDCA, distilled
Martín Soto
1mo
17
49
Infra-Exercises, Part 1
Diffractor
3mo
9
98
Infra-Bayesian physicalism: a formal theory of naturalized induction
Vanessa Kosoy
1y
20
33
Hessian and Basin volume
Vivek Hebbar
5mo
9
17
Counterfactuals are Confusing because of an Ontological Shift
Chris_Leong
4mo
35
104
Introduction To The Infra-Bayesianism Sequence
Diffractor
2y
64
37
The Promise and Peril of Finite Sets
davidad
1y
4
47
MIRI/OP exchange about decision theory
Rob Bensinger
1y
7
17
Infra-Miscellanea
Diffractor
8mo
0
84
Zoom In: An Introduction to Circuits
evhub
2y
11
76
Recent Progress in the Theory of Neural Networks
interstice
3y
9
33
AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy
DanielFilan
1y
12
87
Counterfactual Mugging Poker Game
Scott Garrabrant
4y
2
28
The Many Faces of Infra-Beliefs
Diffractor
1y
6
130
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC
17d
9
134
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau
1mo
14
26
Causal scrubbing: results on a paren balance checker
LawrenceC
17d
0
86
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
KevinRoWang
1mo
5
135
Takeaways from our robust injury classifier project [Redwood Research]
dmz
3mo
9
16
Causal scrubbing: Appendix
LawrenceC
17d
0
43
A conversation about Katja's counterarguments to AI risk
Matthew Barnett
2mo
9
136
High-stakes alignment via adversarial training [Redwood Research report]
dmz
7mo
29
143
Redwood Research’s current project
Buck
1y
29
112
Why I'm excited about Redwood Research's current project
paulfchristiano
1y
6
25
Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming
Michaël Trazzi
3mo
0
33
Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing
Buck
6mo
0
32
AXRP Episode 15 - Natural Abstractions with John Wentworth
DanielFilan
7mo
1
16
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
4mo
0