Tree of Tags

20 posts: Corrigibility, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, Petrov Day

9 posts: Instrumental Convergence, Satisficer, LessWrong Event Transcripts

87 Let's See You Write That Corrigibility Tag · Eliezer Yudkowsky · 6mo · 67
85 Announcement: AI alignment prize round 3 winners and next round · cousin_it · 4y · 7
76 A Gym Gridworld Environment for the Treacherous Turn · Michaël Trazzi · 4y · 9
58 Announcement: AI alignment prize round 4 winners · cousin_it · 3y · 41
49 Corrigibility · paulfchristiano · 4y · 7
39 Solve Corrigibility Week · Logan Riggs · 1y · 21
30 Do what we mean vs. do what we say · Rohin Shah · 4y · 14
29 Can corrigibility be learned safely? · Wei_Dai · 4y · 115
22 [Linkpost] Treacherous turns in the wild · Mark Xu · 1y · 6
20 Formalizing Policy-Modification Corrigibility · TurnTrout · 1y · 6
19 [AN #165]: When large models are more likely to lie · Rohin Shah · 1y · 0
17 Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence. · RyanCarey · 4y · 1
15 On corrigibility and its basin · Donald Hobson · 6mo · 3
15 Corrigibility as Constrained Optimisation · Henrik Åslund · 3y · 3

227 Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · Ben Pace · 3y · 60
183 Seeking Power is Often Convergently Instrumental in MDPs · TurnTrout · 3y · 38
60 Environmental Structure Can Cause Instrumental Convergence · TurnTrout · 1y · 44
58 Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability · TurnTrout · 1y · 8
56 You can still fetch the coffee today if you're dead tomorrow · davidad · 11d · 15
33 Clarifying Power-Seeking and Instrumental Convergence · TurnTrout · 3y · 7
32 Empowerment is (almost) All We Need · jacob_cannell · 1mo · 43
22 Power as Easily Exploitable Opportunities · TurnTrout · 2y · 5
14 MDP models are determined by the agent architecture and the environmental dynamics · TurnTrout · 1y · 34