103 posts
Interpretability (ML & AI)
Machine Learning (ML)
DeepMind
Truth, Semantics, & Meaning
AI Success Models
OpenAI
Lottery Ticket Hypothesis
Anthropic
Conservatism (AI)
Honesty
Principal-Agent Problems
Map and Territory
50 posts
GPT
Bounties & Prizes (active)
AI-assisted Alignment
Moore's Law
Compute
Nanotechnology
List of Links
AI Safety Public Materials
Computer Science
Tripwire
Quantum Mechanics
11 | An Open Agency Architecture for Safe Transformative AI | davidad | 11h | 11
114 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
223 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
183 | The Plan - 2022 Update | johnswentworth | 19d | 33
32 | Paper: Transformers learn in-context by gradient descent | LawrenceC | 4d | 11
59 | Reframing inner alignment | davidad | 9d | 13
37 | [ASoT] Natural abstractions and AlphaZero | Ulisse Mini | 10d | 1
56 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
61 | Multi-Component Learning and S-Curves | Adam Jermyn | 20d | 24
11 | Extracting and Evaluating Causal Direction in LLMs' Activations | Fabien Roger | 6d | 2
19 | My thoughts on OpenAI's Alignment plan | Donald Hobson | 10d | 0
69 | Engineering Monosemanticity in Toy Models | Adam Jermyn | 1mo | 6
318 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
104 | Caution when interpreting Deepmind's In-context RL paper | Sam Marks | 1mo | 6
48 | Predicting GPU performance | Marius Hobbhahn | 6d | 24
90 | [Link] Why I’m optimistic about OpenAI’s alignment approach | janleike | 15d | 13
18 | An exploration of GPT-2's embedding weights | Adam Scherlis | 7d | 2
45 | By Default, GPTs Think In Plain Sight | Fabien Roger | 1mo | 16
12 | [LINK] - ChatGPT discussion | JanBrauner | 19d | 7
12 | Research request (alignment strategy): Deep dive on "making AI solve alignment for us" | JanBrauner | 19d | 3
36 | Prizes for ML Safety Benchmark Ideas | joshc | 1mo | 3
4 | Alignment with argument-networks and assessment-predictions | Tor Økland Barstad | 7d | 3
85 | Beliefs and Disagreements about Automating Alignment Research | Ian McKenzie | 3mo | 4
191 | New Scaling Laws for Large Language Models | 1a3orn | 8mo | 21
127 | Godzilla Strategies | johnswentworth | 6mo | 65
68 | NeurIPS ML Safety Workshop 2022 | Dan H | 4mo | 2
52 | $20K In Bounties for AI Safety Public Materials | Dan H | 4mo | 7
23 | Recall and Regurgitation in GPT2 | Megan Kinniment | 2mo | 1