28 posts
GPT
List of Links
14 posts
AI-assisted Alignment
Bounties & Prizes (active)
AI Safety Public Materials
18
An exploration of GPT-2's embedding weights
Adam Scherlis
7d
2
12
[LINK] - ChatGPT discussion
JanBrauner
19d
7
125
Developmental Stages of GPTs
orthonormal
2y
74
0
New(ish) AI control ideas
Stuart_Armstrong
5y
0
96
Collection of GPT-3 results
Kaj_Sotala
2y
24
145
interpreting GPT: the logit lens
nostalgebraist
2y
32
164
MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models"
Rob Bensinger
1y
13
105
Alignment As A Bottleneck To Usefulness Of GPT-3
johnswentworth
2y
57
42
Will OpenAI's work unintentionally increase existential risks related to AI?
adamShimi
2y
56
50
[AN #102]: Meta learning by GPT-3, and a list of full proposals for AI alignment
Rohin Shah
2y
6
44
AI Alignment Writing Day Roundup #2
Ben Pace
3y
2
63
$1000 bounty for OpenAI to show whether GPT3 was "deliberately" pretending to be stupider than it is
jacobjacob
2y
40
42
To what extent are the scaling properties of Transformer networks exceptional?
abramdemski
2y
1
67
To what extent is GPT-3 capable of reasoning?
TurnTrout
2y
74
90
[Link] Why I’m optimistic about OpenAI’s alignment approach
janleike
15d
13
12
Research request (alignment strategy): Deep dive on "making AI solve alignment for us"
JanBrauner
19d
3
8
Distribution Shifts and The Importance of AI Safety
Leon Lang
2mo
2
5
AI-assisted list of ten concrete alignment things to do right now
lcmgcd
3mo
5
68
NeurIPS ML Safety Workshop 2022
Dan H
4mo
2
22
[$20K in Prizes] AI Safety Arguments Competition
Dan H
7mo
543
127
Godzilla Strategies
johnswentworth
6mo
65
135
How much chess engine progress is about adapting to bigger computers?
paulfchristiano
1y
23
85
Beliefs and Disagreements about Automating Alignment Research
Ian McKenzie
3mo
4
6
Getting from an unaligned AGI to an aligned AGI?
Tor Økland Barstad
6mo
7
52
$20K In Bounties for AI Safety Public Materials
Dan H
4mo
7
4
Alignment with argument-networks and assessment-predictions
Tor Økland Barstad
7d
3
36
Prizes for ML Safety Benchmark Ideas
joshc
1mo
3
2
Making it harder for an AGI to "trick" us, with STVs
Tor Økland Barstad
5mo
5