28 posts
GPT
List of Links
14 posts
AI-assisted Alignment
Bounties & Prizes (active)
AI Safety Public Materials
255
New Scaling Laws for Large Language Models
1a3orn
8mo
21
171
interpreting GPT: the logit lens
nostalgebraist
2y
32
170
The case for aligning narrowly superhuman models
Ajeya Cotra
1y
74
155
Developmental Stages of GPTs
orthonormal
2y
74
132
Can you get AGI from a Transformer?
Steven Byrnes
2y
39
117
Alignment As A Bottleneck To Usefulness Of GPT-3
johnswentworth
2y
57
108
MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models"
Rob Bensinger
1y
13
82
Collection of GPT-3 results
Kaj_Sotala
2y
24
75
By Default, GPTs Think In Plain Sight
Fabien Roger
1mo
16
73
To what extent is GPT-3 capable of reasoning?
TurnTrout
2y
74
62
[ASoT] Finetuning, RL, and GPT's world prior
Jozdien
18d
8
59
OpenAI announces GPT-3
gwern
2y
23
58
Will OpenAI's work unintentionally increase existential risks related to AI?
adamShimi
2y
56
57
How "honest" is GPT-3?
abramdemski
2y
18
175
Godzilla Strategies
johnswentworth
6mo
65
126
[$20K in Prizes] AI Safety Arguments Competition
Dan H
7mo
543
99
Beliefs and Disagreements about Automating Alignment Research
Ian McKenzie
3mo
4
96
[Link] Why I’m optimistic about OpenAI’s alignment approach
janleike
15d
13
93
How much chess engine progress is about adapting to bigger computers?
paulfchristiano
1y
23
84
$20K In Bounties for AI Safety Public Materials
Dan H
4mo
7
76
NeurIPS ML Safety Workshop 2022
Dan H
4mo
2
36
Prizes for ML Safety Benchmark Ideas
joshc
1mo
3
26
Distribution Shifts and The Importance of AI Safety
Leon Lang
2mo
2
26
Making it harder for an AGI to "trick" us, with STVs
Tor Økland Barstad
5mo
5
20
Research request (alignment strategy): Deep dive on "making AI solve alignment for us"
JanBrauner
19d
3
12
Getting from an unaligned AGI to an aligned AGI?
Tor Økland Barstad
6mo
7
11
AI-assisted list of ten concrete alignment things to do right now
lcmgcd
3mo
5
10
Alignment with argument-networks and assessment-predictions
Tor Økland Barstad
7d
3