Signal Room / Leaders Watch

AXRPCivilisational risk and strategyFeatured pick

Tom Davidson on AI-enabled Coups

Why this matters

This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.

Summary

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Perspective map

MixedTechnicalMedium confidenceTranscript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forwardMixedOpportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).

StartEnd

Across 133 full-transcript segments: median 0 · mean -5 · spread -31–0 (p10–p90 -11–0) · 9% risk-forward, 91% mixed, 0% opportunity-forward slices.

Slice bands

133 slices · p10–p90 -11–0

Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.

- Emphasizes alignment
- Emphasizes safety
- Full transcript scored in 133 sequential slices (median slice 0).

Editor note

Anchor episode for the AI Safety Map: high signal, durable framing, and immediate relevance to leadership decisions.

ai-safetyaxrpcore-safetytechnical

Play on sAIfe Hands

Episode transcript

YouTube captions (auto or uploaded) · video ltZfc9E6sUQ · stored Apr 2, 2026 · 3,532 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/tom-davidson-on-ai-enabled-coups.json when you have a listen-based summary.

Show full transcript

[Music] Hello everybody. In this episode, I'll be speaking with Tom Davidson. Tom is a senior research fellow at the Forth Institute for AI Strategy. His work is focused on AI takeoff speeds and more recently the threat of humans using AI to stage a coup. To read a transcript of this episode, you can go to axp.net. You can become a patron at patreon.com/exr. You can also give feedback about the episode at axrp.fyi. All right, Tom, welcome to the podcast. >> Pleasure to be here, Danny. >> Yeah, so today we're going to talk about the uh should I say should I call it a paper? Uh so AI enabled coups. How a small group could use AI to seize power by yourself, Lucas Finen, and Rose Hedgehar? Paper is the right term for that. >> Yeah, I think we mostly called it a report, but you know, paper report. Yeah, >> report seems pretty reasonable. Yeah. So, okay. AI enabled coups. So, I guess it's about using AI to do coups in order to just help us help the audience like figure out what's going on. Ignore the AI stuff. I'm lucky enough that I've lived in countries that just have not had a coup in a while and I don't really think about them. >> How do you do a coup? >> Great question. So, um the way that the word coup is normally used, >> there's kind of two two kind of main types of coup. um at least on one way of covering it up. You know, different scholars cover it up in different ways, but you know, one natural way to cover it up is that there are military coups. >> Yep. >> That is a coup performed by um the military as directed by some people within the military. Like most often it's very senior military officials because they already have that authority within the military that would allow them to do that. And so that that's just a case of senior senior general kind of instructs battalions to seize control of um you know various important buildings um threatens um or intimidates people that might try to oppose. Declares that they are now victors over the radio waves. Everyone notices that no one's really opposing them and that you know all the military is you know acting in accordance with that declaration. And so it seems to be they're increasingly um increasingly credible. And then this kind of this this very abrupt transition of power from the old you know the old leaders that that were kind of functioning within that old legal system. There's this sudden abrupt and legal um shift of power. >> So that's a military coup. >> Um then the other type of coup um is that that I would I would kind of point to is referred to as an executive coup. >> And so that is when you have someone who is already typically someone who's already the head of state. So Audi has you know legitimacy as a very powerful political figure but you know to begin with >> there are many checks and balances on on on their power. So especially in democracies you know they'll be heavily constrained and then you know an executive coup um normally is less discreet um but there might be a point at which you think they've really just kind of fully removed that last check and balance. But you know the general process there is the the head of state is is you know undermining the the independence of the judiciary by you know stuffing it with their loyalists. they're kind of again stuffing the the legislative bodies again with with people who are going to support them um and you know you know actually making legal changes to to kind of centralize more and more power in the executive branch in in themselves and then you know at some point in that process you might be like okay at this point you know the old legal order has really been overturned and there's been an executive coup um and so you know Venezuela I think is the best case study of this happening end to end because it did start off um in the mid 20th century is a pretty healthy democracy had been going for many decades but you know by 2020 was was widely considered to just now have been an authoritarian company uh country and you know at some point normally I think people at near the end of that process you know might say okay there was an executive coupe at that point but you know it's much more fuzzy concept in that case >> is that also I' I've sometimes heard the term like self coup or auto goal pay is that the same thing >> exactly yeah >> okay so so I can kind of understand like so an executive Q I guess like I kind of have a picture in my head of like okay you like consolidate power within yourself. You make the laws accountable to you. You make the like like somehow you turn the state more and more into yourself and like all of the like organs of the state that are loyal to the state are now loyal to you because like you just have a bunch of control over the state. >> Um for a military coup like like what do you actually do? Do you have to just like go to the Supreme Court and like point guns at the heads of the justices and like go to all the police stations and like point guns to the heads of the leaders of theirs or or like >> yeah just yeah help me imagine this concretely. >> Yeah great question. So with a military coup there is a higher risk that they kind of you declare a coup but then the rest of society the other organs of of of government and the other um kind of you know important economic players don't want to play ball >> and so you know historically that has been a real risk and there have been cases there's casing um in Ghana where there had been a military coup and then there was such widespread dissatisfaction throughout about various um various parts of society with the way that things were being governed that they ended up handing back to a more democratic um governance regime because you know there's kind of the military personnel ultimately continuous with the rest of society and so if they're just kind of brazingly unpopular and the the country is just doing badly and that's quite quite kind of clear to everyone then it doesn't really work for them to just kind of be like we're forcing you to do everything we say when people increasing you don't want to play ball Um, and so yeah, historically that's that that that that's been an is a big issue. Um, I know we're not meant to get onto the AI part, but I do want to just flag that I think very powerful AI will somewhat change that dynamic because um, you know, in in the case of Ghana, you know, for the country to to to function well, it needed all those other, you know, parts of society to play ball, >> right? Um but if we have sufficiently powerful AI systems, it may be possible to kind of um replace those those those those other kind of players with with with automated replacements. So okay, there's one end where like some general of some army just stands up and says I declare a coup and then sits down. Like that's presumably like that's not a successful coup, right? >> Like like somehow you need to like make people >> you need to like actually take control, right? But like >> countries are really big, right? There's like a lot of stuff going on. There's like more than one building where people are like doing where people are like making laws where where the laws are getting enforced. So like like do you have to just like like are you like storming five buildings? Are you like >> telling people are you like convincing the rest of the military that they should be loyal to you? And are you convincing the rest of the world that if they they like defy you, you'll like go in and like use all the military to kill them or like like is that what I should imagine? >> There's a great book that I I think might get to the heart of what you're you're asking here called um seizing power by Singh. Um and its its thesis is that at the core of the logic of a military coup is trying to create a common expectation among all military forces that the coup will succeed. >> Right? >> So the basic pieces is no one wants to get into bloodb birth where um you know people are killing their their civilians >> and even other military people from the same country. >> Um and so ultimately everyone wants to be on the side of the winner >> if there is a struggle for power. Um, and that's, you know, according to this thesis, which I find fairly compelling, that is kind of the main determinant of what military personnel are going to do when there's a constitutional crisis, when someone's attempting to do a coup. >> So, this book says that what what your task is as someone who's trying to do a military coup is you need to convince initially the military personnel um that this is a fair company, that you already have the support of all the other military personnel. And so there's this kind of interesting um kind of game theory dynamic where there's no stable equilibria. Like you know before the military coup begins the stable equilibrium is people may hate the regime but they think that the regime has the support of the military. So if they went out and tried to you know start undermining law and order they would expect that other people would come and arrest them. And they'd be right because other people even if they also dislike the regime would indeed come and arrest them because that's their job. And if they don't do their job then they expect that their you know their their kind of associates will will And so it's a selfreinforcing equilibrium. And so what you've got to do when you're doing a military coup is shift that equilibrium over to know actually now there's this new equilibrium um where there's this now new um new new new new new kind of group of people in power. And so a lot of the things that you see happen in military coups um are can be understood as trying to achieve that shift in equilibrium, that shift in consensus about who is now in control. And so like a classic example that that Singh gives in in this book is capturing radio stations and then um kind of sending forth proclamations of your victory which are you credible. You don't you don't massively exaggerate. You don't say every single person in this country has supported me always because that would ring ring false. But you say you know we have the backing of the senior military generals. We have executed a coup. You know the old the old the old um government government has been defeated. And then you do things to make those claims seem more credible like seizing control of key um key key kind of um institutional buildings for example. >> Okay. >> Um and in the absence of vocal resistance, the absence of vocal opposition and of um kind of military you know warfare then then just serves to reinforce this new impression because you've said it. It kind of looks like it's true if you look at where the the military things that have just happened and then no one is saying anything else. Um and if you can get that um consensus of opinion within the military um and then you can convince the rest of the society that yes all of the military all of the hard power supports this new this new regime then the rest of society also why would you oppose a new regime which is backed by all the hard power you know you're just going to get yourself in trouble. So at that point the rest of society's incentive is then to you know work within the new regime. So, you know, as you say, things were set up with the old regime. So, there'll be questions about how you, you know, pragmatically reorganize um you know, exactly what everyone's doing and how everything fits together. But, you know, once you've got essentially everyone in society recognizing, yes, this is the new order and and and we can see that that's what everyone's going for, then you that that's the hard work done. And then it's more like you know filling in the details of exactly how the new um you know set of constit you know bodies um will will relate to each other and what you know the the kind of the chain of command will be from from the new leader to the various parts of society. So, so the picture I have is roughly like in order to do a military coup, I've got to like persuade the military that we're doing a coup like like somehow I've got to persuade the military that like and the rest of the society presumably that if anyone like doesn't, you know, defies my new order, they're going to like like we're going to come in and we're going to beat them. And presumably in order to do that, I've like actually got to go in and beat some people who like might be considering defying my new order just to like demonstrate that I can. Um, and maybe that's why I actually stormed the buildings. I don't What do you mean when you say beat someone? I mean, I think yes, it's you need to so some credible sign that you have the support of military forces. I don't know if you you literally need to then go and like shoot down protesters. >> I mean, that that that's one effective way to show that you're willing to beat people down. Um but but >> you don't you don't there's often coups without any bloodshed. >> Yeah. I I just mean demonstrate that you can like um that if someone tries to resist, they won't succeed, right? >> Yeah. Yeah. >> Um so so maybe that involves you killing people. Maybe that involves you like marching in and like like people are visibly too afraid to stop you and this is just a sign that like okay, apparently you can do what you want now. Um >> yeah there's there's there's a great kind of part of this book where it discusses how a common tactic for creating this this new shared understanding is to host a meeting with many with all the top brass of the military and say we are staging a coup of you have agreed and all of you are on board and then you just watch as no one opposes you because everyone thinks you know it's probably like kind of plausible and maybe some of them were on board others were like had said they might be on board and had been ambiguous but once you're all there and you say it and no one opposes you that already sends a quite quite a strong signal. Um and then you know often often you know that that that kind of meeting could be where that that essential shift in equilibrium actually happens. >> Okay. So I think I understand coups at this point. Um the next thing I want to know is how bad are coups? >> Yeah, it's a really interesting question. Um now coups are most common in in countries that are not robust democracies. >> Um in fact they're very rare in robust democracies. >> So a coup in the United States I think would be very very bad >> because we currently have you know a system of governance with with checks and balances and um democracy and I think we'd be losing a lot if we had a coup. when coups have happened historically, they're often starting from much less good governance systems and so they have been like less bad. >> Um, but still, you know, coups coups involve a small group of people just using hard power to force the rest of the country into submission and often they are extremely bad. >> Fair enough. >> Bad from a process perspective in terms of justice, but also bad for how how the country is governed thereafter. >> Gotcha. Okay. I've realized I want to go back into how to do a coup for a final bit. So, in some parts of your report, you mentioned that like you only need a small fraction of the military to be on board with a coup to succeed at least often. >> That seems crazy to me because if you have one/ird of the military and I have two/3 of the military, >> I would sort of naively think that I could beat you. like or I don't know maybe if you have the best one third of the military like that like if you have the one third with nukes like like like what's going on there like how like how much of the military do you really need to get on board in order to do this? >> So yeah, if you have twothirds of the military and there's there's strong common knowledge among those two/3s that they're all on your side, >> then I agree I can't do a coup with my one third cuz I'm outgunned. >> But if instead there's just the whole military which kind of thinks, yep, we're we're currently currently Daniel's in charge. Let's say you're the incumbent. Um, and then I come along and I get my onetenth of the military, storm all the buildings, um, threaten the key commanders of the military not not to say anything, and I create these credible signals that in fact a large part of the military >> supports me, >> even if that remaining two/3s actually backs you, >> if they don't know that they all feel that way and there's, you know, big big uncertainty among their ranks in how they feel and it just seems like, man, it seems like everyone's backing this new leader and that's what they're playing on the radio waves and none of them are are denying this. Um, then I can I can I can flip that equilibrium >> and I flipped it into an equilibrium that people are less happy with that that military personnel are less happy with. But because they're not able to all get together in a group and be like, do we like this new new leader or not? Um because that's that's dangerous because they don't want to be seen as opposing the new legitimate regime. Otherwise, you know, under this new equilibrium that that's not a good good thing for people to know about you. Um so it is because because of those dynamics um it it is possible for a minority to to to do to stage a coup. Gotcha. Okay. So now that we know coups are bad um especially if you do them with uh to the country like the US and now that we know like how we could do a coup if we really wanted to. Um let's go to AI enabled coups. So you've written this report on AI enabled coups. Uh my sense is that that's because you think they're like bad and like worrisome. >> Um and presumably you believe that they are both like somewhat plausible and like >> quite quite bad out of the space of things that could happen. Um probably we should first go on like why you think AI enabled coups might be a plausible thing that could happen. >> Yeah. Um so you know we can start with what where the historical evidence leaves leaves us which is mature democracies are pretty robust to military coups. Um they are not recently looking nearly as robust to executive coups. So there's been um democratic backsliding >> in um the best example recently might be Hungary >> um which has become increasingly autocratic through the kind of gradual removal of checks and balances. There's the example of Venezuela we discussed earlier um and know many com commentators think this is happening to a very large extent in the United States as well. So, you know, I think historically we can say military coups do seem very rare and executive coups seem like rare for sure, but like not not off the cards at all. And many people are worried about them even before bringing AI into the picture. >> Okay, >> now we can bring AI into the picture. And I think you know the first thing AI does is it makes executive coups seem a fair bit more plausible. Mhm. >> Um and two two main reasons it does that. The first is that um a kind of a group of um people in the executive that wanted to do an executive coup might be able to gain a lot of control over very powerful AI in a way that gives them a big strategic advantage over the other forces in society. And so, you know, the dynamics of executive coups as as they play out typically involve a power struggle between the executive trying to centralize control and their various supporters and the checks and balances um that were in the system already trying to oppose them and often you know there's there's a really lot of like push and pull and you know sometimes you know in in the case of Venezuela the head of state was you know literally put in jail for a bit um by their opponents then got out and then got reelected and then you know ended up you know you know really becoming an autocrat. Um so there's all this strategic maneuvering and so the first thing is that if the people trying to do the executive coup can get um a lot of control over powerful AI and can deny access to similarly powerful AI to to their opponents that could just you know give them a big strategic advantage in in that political maneuvering. >> The that's the first dynamic which I think makes it higher. The second dynamic is that today um people who are trying to do an executive coup or have done an have already centralized power need to rely on lots of other humans to like help them out and that constrains their actions in various ways. Um so normally it's hard for them to be completely brazenly power grabbing. they need to come up with um plausible ideologies and justifications that they can then get supporters to rally behind and support particular moves. >> Um but with AI systems that are sufficiently powerful, you can replace those those um those humans with AI systems. So, you know, rather than um having the um the policies of the government implemented by humans that like have some ethical standards and that don't at this point really want to um support, you know, really really awful um surveillance and that kind of have been brought in on a broad ideology. Um you can just replace them with AI systems that will just um follow the instructions of the header state um with with with far fewer quams. Um and so that can that can give that that um that head of state that's trying to do an executive an exe an additional um an additional kind of edge because they're less constrained by like having to work within this co this broad coalition involving lots of other humans. And then you know the most extreme example that we highlight quite a lot in the paper is indeed armed forces where like today in the United States this is very stark. You know the military personnel are very very opposed to um breaking the law. They're very much loyal to the Constitution. They strongly expect that all the other military personnel are going to do the same. And so it's it's a really really tall ask for the head of state in someone like the United States to get active help from um the military in staging an executive coup. And indeed, you know, the you know, Trump has come into some kind of frictions with the military when he's tried to get their help for, you know, um you know, deterring certain protests. But again, you know, this could be a really major shift as we increasingly automate the military with with very autonomous um weapon systems where again the thing we highlighted the most extreme case where you can fully replace, you know, a human soldier with um you know, a military robot. At that point, you know, under current law, it might be completely legal for um those those those robots to be programmed to just, you know, follow the instructions of the commander-in-chief, the head of state. >> And so we we'd move from this this current situation where like if you're trying to do an executive in the United States, you're not going to get much help from military personnel to this new state where like it's kind of up for grabs. how what are going to be the loyalties and the decision processes of this new automated military. And so this just introduces a big new vulnerability for really cementing an executive coup with hard power. >> Gotcha. So So I guess there's it seems like there's two key factors here. Um the like sort of two progress bars on AI capabilities that it seems like you want to keep track of. So the first is like >> roughly um how useful is AI for navigating strategic like maneuvering, right? Like you're like, "Oh, I'm in prison and like but like these people think this and these people think this." Like like to what degree does AI really help you in this situation? And then the other one is like how loyal can you have the AI be for you? And and I think like so in the report um you talk about like you want the loyalty ideally to be both singular to you and also secret, right? Other people don't know about the loyalty and and in addition like I think that seems like it's important um is just like that all of these things have these loyalties, right? Like like if you have a bunch of AIs but they're sort of like about as diverse as people are, it seems like this is probably harder to get off the ground. Um >> well, I push back a little bit. I suddenly them all the better. But if you could get say 10% that are loyal to you secretly and 90% that are just kind of if there's any chaotic constitutional crisis they defer to inaction then that that would be enough because you get your 10% you get your fair comple that 90% don't do anything to block it and and there you are. >> Fair enough. >> So yeah. Okay. That's a that's a very really good point. Um, so in terms of the like strategic maneuvering aspect, like how like like how much evidence do you think we have about or or I guess in either of these aspects, how much evidence do you think we have about like the you know how how useful AI is going to be? Like how how much of these relevant capabilities does it seem to have already? >> It's a good question. There have recently been a few studies that were pretty surprising to me on AI persuasion. Mhm. >> Um I unfortunately don't remember the details but I will just give my high level memory which is that there was one study that had AI posting on Reddit and then compared the number of upwards >> oh yeah >> to um to like human commentators who were also posting it was like persuade me of X and the AI would like do a background research about this person and what kind of their demographic was tailor make like a really emotional you know story about their own life that like really brought it out and like the results were like they were in something some crazy high percentile. I don't know if it was I think it might have been like the top percent or close to that for for persuasiveness and >> I I was quite surprised because um that that had not been a capability that I'd thought that we were kind of targeting um with with with current training techniques. >> Yeah. And one thing that's crazy, so so so this was done on the subreddit r/changemy view. >> Mhm. >> So it was research that was done. My understanding is that it will not end up being published basically because >> so so r/ange my uh view. I it has basically some rule that you're not allowed to like >> have an LLM pretend to be a human and like try to persuade a bunch of people of stuff like that. Like that's not okay according to them. Um >> crazy. >> Yeah. Yeah. So, so unfortunately I like we probably won't learn as much as we might like to about that study. But the other thing is so this was done at I believe the University of Zurich which I'm sure they have fine people but this is not like the world's leading AI lab or the world's leading like you know graduate program in AI. Um so so the fact that like uh these like you know obviously like competent but not top tier competent but not top tier AI institution can do it like maybe that like lends credence to like oh this is like going further than you might think. >> Yeah. And I think there's been one other study and I actually can't remember the details of this but again it it found the AI was was close to like top percentile humans in persuasiveness. Um, so yeah, that that's updated me towards thinking that AI might be very good at this kind of strategic maneuvering aspect because, you know, one one element of that is is is persuading people. I mean, and you know, historically often that's taken the form of persuading people of an ideology which serves your purposes. >> And you know that that seems like the kind of thing that that these studies are are looking at. You know, they are often studying. Okay, know here's political topic X. Can you can you can you like um shift shift my opinion on it? >> Fair enough. >> Um there's then another kind of relevant capability beyond persuasion which is something like strategic planning. >> Um which is essentially you know you're in a situation you want to achieve a goal you know what you know what what plan is best to achieve that strategic objective. Um, and that's something where, you know, it doesn't seem to me, you know, and it's really hard to predict these things, doesn't seem to me like the current training procedure is is really bringing that out. Like persuasion at least, you know, obviously trained on loads of conversations where it can see what's persuasive and what's not. It's not that surprising that it's generalized well there. For strategic planning, it feels more like it would need to have been trained in situations where there's there's there's a scenario and then an action is taken to try and achieve an objective and then there's some really complicated, you know, sociopolitical system and then it washes out and you see what happens. And like it's been trained on the internet which contains loads of history and you can probably extract that kind of stuff from history. But it seems more of a stretch to think that from pre-training it's going to generalize like pick out those lessons because it's so it's so much kind of less direct. So this is the kind of thing where you can imagine someone setting up some kind of fancy RL pipeline down the road where they try and extract all of the relevant um signal that is currently fairly implicit in you know internet data and you know craft and give it to AI and maybe also have AI you know try on various um kind of simp simple artificial environments and then maybe have AI actually try and you know, achieve things in the real world and um you know, learn from that. But um I would expect it to come a bit later um in in the kind of capabilities tree compared to things like coding um and maths where you can get a good automated feedback signal. >> Fair enough. So, so okay, that's a little bit on like why AI might be able to do coups and you know what what might go into that. I'll just quickly say that was all in the executive coup part. The other thing I was I was going to say ages ago is that I think there's this new risk of kind of corporate coup >> where because AI is going to be so powerful and it's currently being developed and controlled and deployed by private actors that there's going to be this new c you know by default there's going to be a big concentration of power in those private actors and I think that that will open up some routes to staging a coup. Now, this is necessarily more of a speculative idea because we just don't have um the same historical precedent here. >> Um you do have some kind of coups staged historically by um private companies. Normally, it's kind of very rich private United States private companies operating in um very poor countries. So, the Banana Republic is is kind of the famous go-to example where the this kind of fruit company >> arranged for um there there to be there to be a military coup which kind of helped with their own interest, but it's it's pretty rare. >> Um and so this this would be a new kind of risk, but um I you know I think the threat models here are are plausible enough to to be to be taken very seriously. >> So it seems like the threat model is something like okay, you have this like company It's making a thing that's like really dangerous, you know, or like or a thing that could be used to be really dangerous, a thing that could be used to like help you take control of power and the people that make the thing use it to take control of power. Um it seems like there's maybe an analogy in that um you know countries like buy weapon systems and the weapon systems are like really like like the US army would be like much much worse if they just had to use like rocks and stuff right >> um or you know or if they had to swim um so like >> you know like North Gman or BNS like that might be the wrong name but but these weapons manufacturers do Do we ever have instances of them being like, "Hey, we've got a ton of like fighter planes. Like, let's do a coup ourselves." >> I'm not aware of any. I think there's a few dynamics at play. One is that you need soldiers to use the weapons >> and those are trained by the military and they have this strong commitment to the rule of law. Another is that there's multiple different um military suppliers um >> right >> multiple different companies and so they would they would need to all be colluding um and AI does change both of those things. So on the on the weapon side I believe we're going to end up in a world of autonomous weapons and so you won't need those additional you know humans in order to to stage the coup. Um and so the companies will literally now be making all parts that that are necessary for for the military force. Um and the second is that that you know the there are dynamics that could point to a very strong market concentration in frontier AI i.e. maybe just one two or three companies that have most powerful AI systems. If they're the ones they're the if if those AIs are the ones that are making all the the the military weapon systems, you know, in the most extreme case if it's just one AGI project whose AIs are making all the military systems and there's now this single point of failure and that project is in an unprecedented position um in that sense. >> Fair enough. I I guess the other thing that seems maybe analogous is if a country like hires a mercenary force to supplement its military, but it seems hard like like for one like if the mercenaries are a small fraction of the military, maybe it's like harder for the mercenaries to create common knowledge that the non- mercenary militaries are on board with the coup, but like like are there examples of mercenary coups? >> Off the top of my head, I'm not aware of any where the Yeah. where it's kind of um where the mercenary is is is like >> oh there's the thing that happened in Russia with the um >> the guy what do you know >> the guy who marched on started marching towards Moscow >> yeah and then he gave up on it >> yeah yeah I think it didn't end well for him um >> yeah but but but that seems like uh >> that seems like almost an example >> military coups like executive cues and companyled coups um it seems like there's like there's some plausibility to the idea that like AI could like increase the ability of um of these situations. Um, I guess the next thing I want to ask is like so there's a there's a wide universe of like things people worry about um with you know scary things that advanced AI could do like how how high up on that list of scary things should AI enabled coups be? Yeah, my current view would be that in terms of importance, it should be maybe second behind AI takeover. >> Interesting. >> And if you then factor in neglectedness, then I think it's actually more important on the margin than than AI takeover. And I think it's more important than for example AI enabled bioatt attacks by terrorists which is another risk from AI that people have focused on and similarly you know kind of AI enabled misuse in terms of cyber I'd also put it as more important than that like do you think it's worse than AI enabled terrorism or more likely or >> disclaimer I haven't thought in depth about this comparison but >> fair enough >> it's easier for me to how air enabled coups would have a completely long-lasting effect. >> So, it's it's certainly possible that an AI enabled terrorist literally makes everyone go extinct, >> but full extinction is quite hard to get from um a bioweapon. um especially given that we'll be using AI to develop defenses as we go and there are many people who want to see everyone die and so we only need to stop those people getting access to these systems. Um and it seems like there there's it's not that hard to do that. Like I mean past a certain point it might be necessary to prevent open source. It also might never be necessary to prevent open source depending on how far ahead closed source is and how quickly we can get the defenses in place. Um and other kind of inputs that are needed to actually do a bio attack. Um whereas with the AI enabled coups, I think there are very many people who who want more power, >> many of those people will by default have a lot of control over AI and might might well be in a position to do this. >> And the kind of default dynamic of AI development, I think, is just going to really concentrate control of of of AI development and deployment in in the hands of a very small number of people. And so if I'm telling the story, I'm just like, well, look, people want power. There's they're gonna by default have loads of power and the opportunity to use it to gain more power, like it's kind of believable that they do it and they seize and they seize control and then once they do, they just hang on to control. It doesn't feel hard to tell a story where this lasts for a very long time. >> Whereas in the case of bio, it feels a little bit more difficult because we have to like not get the defenses in place despite the fact it's in all of our interests and all the powerful people want to do that. um we have to like actually share these these systems which you know we are testing for this risk. We will likely have evidence that there there's significant uplift. We have to make them so widely available that the tiny number of very low resource actors that want to do this are are able to. Um so yeah that that's that's kind of you know roughly where I am in terms of putting it putting as high priority. Uh >> yeah I think that makes sense. Um maybe it makes sense to talk a little bit about the scenarios of like yeah what I don't know types of AI enabled cues and like stuff you could do to prevent them. So um so I guess there's at a high level there's um you've got your uh corporate AIQs, you've got your like executive AIQs and you've got your military AIQs. Um yeah, which one are you like most excited to talk about first? Let's start with the executive. >> If I imagine what this looks like, should I basically be like, okay, you've got an executive. The executive somehow like gets a significant amount of control over AI development. Um the executive like so so in the executive queue like the executive is just using the AI to like persuade people and or figure out strategy in order to like allow the executive to get like gradually more and more power. Um >> and the other thing they're doing is that they're deploying AI throughout society especially in the government and military AI is is more loyal to them. >> So so throughout the society and military to make it more loyal to them. I guess part of what the executive is doing is like trying to stop other AI enabled cues >> potentially. If you know if if if those are seen plausible, >> you know, if there's a risk that the there's going to be a corporate coup, the executive would want to stop that. >> But I I haven't been thinking that as a primary thing that they'll need to do. The primary thing, I think, is to to centralize power themselves. Um >> Right. Right. So, so, so less to like prevent coups and more just to like prevent independent other entities like wielding any power. >> Yeah. So, so, so you have these like three risk factors, right? Um, like singular loyalty, secret loyalty, and exclusive access, right? >> Yep. Um, and so it seems like part of this story is like the executive uses the AI to like do a bunch of stuff. Um, like like do a bunch of tricky stuff and like other people can't stop the executive and it seems like this is like mostly or other people don't can't figure out how to stop them. you know, this seems like it's largely leaning on exclusive access and the bit where the executive like um has everyone else use like AI that the executive likes seems like this is maybe leaning more on like singular loyalty and to some degree secret loyalty. Is that roughly right? >> That that's exactly the mapping. And normally with the executive, I'm not imagining secret loyalties, although it's possible >> because the executive has so much political power to begin with. They could just be like, it's it's completely appropriate for these AI systems to to to be loyal to me. They they could do the more fancy thing of >> um secret loyalties, but there's there's a technical hurdle there, and it just might not be necessary for them. >> So So it seems like these are sort of two routes or or two I don't know if route two routes is the right term, but like two >> Yeah. two things that the executive is doing with the AI. I'm wondering like do you need both of them or if if you're an executive trying to do a coup, could you like survive with just one of these? >> I think you you can do with just the singular royalties in AI deployed throughout society >> version. So, you know, the the story would be there's heightened tensions between the US and China. um we're rushing out to deploy AI to the military and >> we just >> we being the US the US is is doing that and and the head of state is saying obviously military AI follows the commands of commander-in-chief. >> Um that's how it should be. >> That's how you know that's how the command structure works. Um we we've never had, you know, autonomous drones check for whether things illegal before they do follow their their instructions in the past. They just do what they're told. That that default continues. Um and then people will very likely oppose and say, "This is crazy. Wait a minute. Can you just stage a coup?" >> Um but the head of state has their supporters and has a lot of power and has already set a precedent of like really nailing people who push back against them. And so they succeed in pushing us through and they never had access to any kind of super genius strategy AI because the strategy was just quite obvious like well yeah if we just get all the military robots loyal to me obviously I'm now can do whatever I want and so I do think that that that that second path can work by itself. Yeah, it seems like it's and one concerning thing there is so when you say like oh you know like like the AIS are being loyal to the president and they're you know not checking like other laws and stuff. I think it's not a crazy argument that that is like legally how it should work. Like definitely like the president is literally the commander-in-chief as you know like there's a prominent legal theory called the unitary executive theory that like yeah the president just in his own person like has unitary like control over the executive branches of government. Um like oh I guess I don't know if the military is executive but but like yeah I think you know I think like the design of the constitution is very much intended to separate and limit the president's degree of control over the military. like it's it's very clear the military is is is loyal to the constitution. So I think if you were to like take the spirit of the constitution and apply it to a robot army, >> it would be clear that you shouldn't just have the robot army doing whatever the president said without checks and balances. I think though >> that is not how the constitution was designed. It didn't have, you know, caveats for what if we develop a robot army. So, as it is currently designed, you know, reading it line by line, >> yeah, >> I don't I'm not I don't I don't believe I cannot be confident that it would rule out this this L robot army as illegal. >> Fair enough. >> And so, I think I think you're right that you could make legal arguments that this is at least legitimate and you could you Yeah. You you could you could claim it's appropriate given the commander-in-chief, although I I do think you you'd be on shaky ground um gi given the clearer intention of the constitution. Yeah. Um yeah, I guess I I mean maybe one thing going on is just like my understanding is that American juristprudence um especially at the Supreme Court level is like very much um lean towards like what does the text say rather than what do we believe the intention of the text was um which like plausibly heightens this risk in this domain. >> Yeah. So, so, so going back to this story. So, so let's say they only have the like loyal AIS in the military part. >> The president like gets all these like military drones or whatever loyal to the president and then is the story the president then kind of does a military coup of like if any police officers try to stop any of my supporters doing like random violence or whatever like the military drone will shoot the police. Like is that roughly it? probably it's going to be in the resin's interests to not show more force than they need to because it's going to be useful for them to have everyone continuing to support their their leadership and think it's legitimate. So they probably what they do is they kind of increasingly ignore checks and balances on their power, >> right? And then increasingly it becomes clear that like nothing is going to stop this situation >> because at the end of the day if if the protesters come this time you know President Karen just order the drones to go and you know clear out that protest probably not not going to shoot everyone but you know going to kind of make them go home and you wouldn't have been able to do that before the robot army and then increasingly the present is just doing doing what he wants, ignoring the checks and balances, um, integrating AI to replace all of the humans that aren't doing exactly what he says he wants them to do. Um, and you know, if if anyone ever tries to, you know, really um, you know, refu refuse to go along, then then at that point he just um, fires them and has them put in jail or something. Um, and that that's kind of, you know, a show of strength. And then as as no one is able to oppose this because ultimately the hard powers in the president's hands, it just becomes increasingly clear um who's in charge. >> Sure. So if I think about this like broad scenario, one thing that's kind of interesting to me, so a background thing I'm thinking about when I'm reading this report is like the relationship between um AI enabled cues and um AI alignment or misalignment risk, right? >> Yeah. And so if I if I imagine this like somewhat minimal version of the um executive coup where basically the way it works is that you just have a bunch of military stuff and it's like powered by AIs and the AIs are or at least 10% of them or whatever are like loyal to the president. Like the AI technology that enables that is just like alignment, right? Like like the thing like like getting an AI to do what like a person wants. That's like the problem that we call alignment that we're like all hoping to solve. >> Um, >> yeah. >> So, in some ways this seems like like some of the paths I think are like like alignment research really would um prevent them or make them like a little bit more tricky. But this is interesting because it seems like a one that like really is like cutting against a lot of like technical alignment work or or cutting against is maybe the wrong word but like not prevented by technical alignment work. I'm I'm wondering like if you have thoughts about that. >> I think that if the executive if the president knows that AI is misaligned then it's not he's not going to be wise to give it control of the robot army. >> Yeah. If the president believes that the AI is aligned, but in fact it's secretly misaligned, then the president might well give it control of the robot army, align it, think he's aligning it to be loyal to him and then stage a coup and then you know he he will you know he will be laying the groundwork for AI takeover. >> Yeah. But you know in fact you know the threat model of him staging the coup goes through even though he hadn't solved um the alignment problem and you know my understanding is that people are mostly worried about this exact scenario where AI seems aligned but it's not. So I think basically the threat model still goes through in that scenario. The thing the difference that that doing more technical alignment research does is it means that rather than the president maintaining control of you know the world indefinitely or the country indefinitely after the coup. If you if you if you fail to solve technical alignment then in fact the president is going to be replaced by mislined AIs which you may prefer or disrefer depending on you know various philosophical considerations >> but I I wouldn't particularly say that if you solve alignment then you're making this threat model a lot higher >> because well except to the extent that it's then common knowledge that you've solved it I think like >> yeah if you yeah if it was going to be widely known that it's not solved then I agree yes you you are increasing this risk So, so that's how you could do it like if you just had control um of or or if you only did it via um having like singular loyalty kind of throughout like the military, you know, just like one half um you know asking about the two halves, right? Like exclusive access to do really good planning and like loyalty of AIS like distributed throughout. If you just had exclusive access um like how do you think you would be able to do an executive queue just via that path? I think it's a lot less clear. I think the main thing you would do with exclusive access is the most obvious thing you would do is then try and get convert this first path to that second path. So you use your exclusive access to get, you know, AI strategy advice and AI technical analysis about how you could get loyal AI systems deployed throughout um critical systems. And so that might they might advise you to do secret loyalties. they might advise you on uh you know political a particular political strategy for like pushing through the more overtly loyal AI systems. Um I think that's the most obvious route >> if you if you're like could you use exclusive access to staging a coup without going via this other um kind of singly loyal AI um approach. I don't know how important this question is, but I think it's basically unclear. If you buy into the more sci-fiesque claims about what super intelligent AI will be able to achieve, then yes, you could do this because what you could do is you could set up um a group of automated factories um somewhere maybe as you know as part of a kind of um milit military R&D project that you you managed to kind of push through and then you just quickly make very powerful fully automated weapons, you know, nano bots or just you know amazing drones and then even though they weren't really ever integrated into the the official military they just then straight out stage a coup right >> so you you can you then stage a coup without having to integrate AI in any kind of you know formal institution um but it leans much more heavily on you know what you can get through you know super genius AIS and then you like a relatively small amount of physical infrastructure >> yeah I think that Um or yeah, there's this interesting thing that's that's going through the back of my mind as I read this. So like in general when someone is like, "Oh yeah, I'm worried that like in the future we'll have more powerful AI and the powerful AI is going to mean that people can do a bad thing." Like I I think a natural question to ask is like well why don't other people use the powerful AI to stop you from doing the bad thing? Um, and so for the for the first path of like executive coup where um, you know, the president gets all the like military AIs to be singularly loyal to him, presumably that's him or her. Presumably that's like the reason other people don't use AI to stop that is because like this is like at least arguably legal and arguably legitimate and like you know at some point you'd be doing the coup if you like resisted. Um, and I guess exclusive access is another story where like people don't stop you because like they just don't have as good AI um, compared to you. >> Yeah. >> Yeah. I don't know if Yeah, I guess that's more of a comment than a question. Um, unfortunately. Yeah. I mean, I agree. I think it's clear with the asymmetry comes with exclusive access and then with singular loies, the asymmetry is you and not everyone else is deciding, you know, the the behavioral dispositions of these AIs deployed throughout society. And so it's you're kind of leveraging your political power to push through this kind of asymmetric AI AI loyalty and broadly deployed systems. I guess like part of the asymmetry here is like if you if you don't have exclusive access then presumably if people are willing to break the law like they can do some amount of just preventing you from having uh exclusive loyalty by like you know using their like subbing in their AIS or like using their AIS to help them figure out how to like stop you make it appear that things have exclusive loyalty to you but like they don't actually. >> I'm not sure. So you know again going back to the military case >> you could have these this the the hardware of these military robots you could then be like I'm deploying this AI software which is loyal to me no one else can then go and like actually deploy their own AIS on some of those military robots because it's just infrastructure that the government controls >> and similarly you know you could imagine fully automating um some kind of implementation body of government which has some formal authorities and now no one us can again sub in their own AI because the AI could could do analysis, could make recommendations, they wouldn't have the formal legal authorities to like take actions within with within the political system. And so again, they wouldn't be able to sub it in. I think if you're talking about human employees still working Yeah. >> within those um organizations and using AI assistance, then it's more like okay, they could sub it in. >> Yeah. And I guess it's even tricky just because like the president just inherently has a wider scope to do this. So, one thing I think I'm imagining is like there's like the president, there's like I don't know a few branches of the military. The military has various admirals and then under the admirals or whatever, there's like a bunch of like robo soldiers. And I guess I could maybe imagine like, okay, one of these admirals is like convinces gets their robo soldiers to be loyal to the admiral and not to the president. But like you sort of need a bunch of the admirals to do that. Like going to the earlier point of you know creating like you know creating common knowledge. Um, >> yeah, I imagine it would be more people who are involved in the procurement process and the technical side of that process of of setting up the eye software than people who are kind of charged with making the strategic analysis in real time. >> Um, that that might be in a position to do that. But yeah, I agree that there could be people in the military who could do that. >> So, okay, we've talked about um the the paths that an executive could use to do a coup. Um, I think I want to pivot to um the paths that uh an AI company could use to do a coup because at first blush it seems like both of the things you said could kind of work for the AI company, right? Like if the AI company can get all the railway soldiers to be loyal to the AI company. Um, and if the AI company can have exclusive access to its own AIS, like >> that seems like maybe it's putting the AI company in a pretty good place, right? >> Yes. Yes, I think for exclusive access it's easier for the AI company. They are just going to have it by default. Yeah, for the head of state or the executive branch to get exclusive access, they would have to intervene in quite a substantial way, >> right? >> Um with with the development process. Um I think the thing that's harder um for the air companies that in terms of deploying these loyal AI systems throughout society, there's a much higher technical hurdle. >> Yeah. where they have to make them secret um really hard to detect. You know, like let's say they they train GP um 7 to be secretly loyal. >> Y >> um it could be, you know, 5 years later that someone discovers a new a new testing procedure and if if those weights have been stored somewhere, >> um then someone realizes, whoa, like open a you know, trained a literal secret loyalty and then the game's up. So, it's like, you know, potentially like you you really have you you know, you probably want to really cover your tracks um and you know, lock down those weights. Make sure no one can ever run any tests on them that that you don't want to you don't want to run. Um and if if you know, today AI AIS are not that reliable that, you know, their personalities are somewhat haphazard. It's all a bit of a mess. Today, I think it would be very hard to get away with a very hard to detect secret loyalty. And it might just be that it's hard to predict how hard it will be to detect these secret loyalties. So it might be that the company is considering this. It's like we could put in this really subtle back door, but we don't know like what people are going to do on this API testing. We don't know if some of our employees might might test for this in certain ways that you know some of our employees will have access to you know deeper analysis tools. Um there's going to be records of the training data like are we going to be able to delete all of those and if not what if someone goes looking? It could be it there might just be genuine uncertainty about about whether this is possible until quite late in the day. You know I imagine that at some point once you got you know really super intelligent AI systems they they would be able to you know come up come up with a plan here but it might be that you know for most of the intelligence explosion the analysis is just like ah seems like you might get caught. Um whereas for for the executive coup you know the president is doing everything in plain sight. there's no risk of like getting caught because right they are claiming that they have legitimate reason to do all these things as they go and they're defeating their opponents as they go. Um and so there's kind of there's less downside risk. Um so that so so that that that would be like the counter argument. You know I think there there are things on both sides of the ledger in terms of which is more likely but the counterargument is just that the technical hurdle is much higher. >> Yeah. So maybe this is actually um so so I'm trying to give myself some hopeium to like stop myself from worrying about this and maybe one path I have is like okay >> AI enabled coups it seems like it's much easier to do it if you're the AI company because you're you know you have the you have all the AI but you have to have this hurdle where the the loyalty of the AI kind of has to be secret right >> it seems to me that the worry about that is like suppose you're an AI company and you manage to like you succeed in instilling the secret loyalty to yourself. >> Mhm. >> I think that should make you rationally worried that well like if an AI can do like secret loyalty, like having secret goals that it like pursues, this is like very basically just the same thing as far as I can tell as deceptive misalignment where an AI is like pretending to be aligned to you, but it's actually not aligned to you. It actually just wants to do whatever it wants. And so I would think that like I would think that if an AI company succeeds at getting a secretly loyal AI, they would rationally be concerned that the that the AI that they think is secretly loyal to it is actually like deceptively misaligned. That if they use this AI to get a bunch of power, the AI is actually just going to take over and like, you know, it's not going to be the AI corp that rules everything. It's going to be like some random AI desires. Yeah. >> What do you think of my opium? >> Uh yeah, I'm not convinced. I think that you are right that the AI company would would know that the AI has the capability >> to do deceptive misalignment but the AI company might still have strong reason to think that it will not have the propensity. It might have done you know careful you know interpretability throughout training or whatever alignment techniques you you believe might work. It might have you know various empirical evidence to suggest that um it is in fact not um misaligned even if it would be capable of doing that and in fact it it might not even have evidence of the capability. If the company is really just like you know being you know really just going for this then it could have very sophisticated internal tests for um scheming >> which in fact the AI cannot game >> um and it could just and then it could kind of train an AI to be secret loyalty but then that secretly loyal AI is only able essentially to game tests that it's been trained on how to game. So it can gain all the company's test. It's been trained on that. You know, that's not scary. Obviously, you can game a test if you've been told how to pass it. It can gain, >> you know, the unseen tests of auditors outside of the company because they're much less sophisticated. Um, but the company could still believe that the AI is actually not capable enough to game a highly sophisticated test that it has not been taught how to pass. And so actually even on the capability side >> um it wouldn't really have evidence that the scheming capability it needn't have evidence that the schema capability is there. >> Yeah. I guess there's also this um Yeah. So one thing is it could like uh be secret like to people other than the lab but not the or other than the AI company but not the AI company. You could also have like like you might think that like if the like the first AI that you can succeed at making secret really loyal via doing a bunch of like special build training. I guess it doesn't seem it seems somewhat reasonable to think that uh if you um that that that like a thing that you could that the first thing you can make secretly loyal by doing a bunch of like special build training maybe couldn't be secretly loyal by like just the normal course of pre-training. >> Yeah, I think that's very likely. You know, I think that the risk of secret loyalties, I think, is very likely to emerge before the risk of scheming because it just seems way harder to zero how to pass all these tests and know what your strategic situation is and exactly when to to act out versus if you just have, you know, a million super smart AIs figuring all of this out, training you how to do it, doing all this analysis. It's just like, you know, the the difference between what I can figure out in in in a few seconds on the fly and what, you know, a large civilization doing a concerted research effort to figure out, you know, orders of magnitude difference in how how much you can achieve. Um, and so I, you know, I do think like the secret loyalties thing I strongly expect to become technologically feasible at an earlier stage. >> I think it's still Okay, here here's why I'm holding on to myopium. I think there's this like risk aversion. I think it depends a little bit how risk averse you are, right? So suppose you're like, okay, I managed to instill a secret loyalty. Uh you know, let's say like I think I'm like 95% confident that the AI doesn't have its own secret loyalty. Like one in 20 like if you're if you're the head of an AI lab, >> you have like I think they have pretty decent lives. Like I don't know. I've never like chatted with uh Samman or Deisabus or >> Yeah. Um, like I from the outside it seems like they have like relatively cushy lives, right? Like one in 20 chance of like you hasten AI doom by starting a coup. That's like pretty bad, right? So So it seems like you don't it has to not only be true >> that the AI doesn't have secret loyalty. You have to be like pretty confident in it. >> Well, let's say that you know OpenAI trained GP7. It did the capabilities test. It did the kind of alignment, you know, test that has and it was like, we're going to deploy it. We're happy with this system. They've got a certain level of of evidence and like, yeah, let's say it's really capable. It's really good at strategic deception, but, you know, indeed, people in this community worry that they would decide to deploy nonetheless. Maybe the risk is 5%, maybe they think it's, you know, 5%. >> The question is, if they're now considering instilling a secret loyalty, is that going to like significantly materially increase that risk? And I think it's not actually something I thought about. You could argue, well, look, you're going to be actively teaching it all these different types of um strategic deception. That seems like maybe it's increasing this risk, but what the reason why I'm not sold is that I don't see why you'd be actively teaching it to in fact be misaligned. You're obviously giving it capabilities which is scary, but if you've already decided how like you think it is to be misaligned, you've already decided you're happy deploying it, like are you going to now be more worried about it suddenly becoming misaligned as you trade it to be loyal to you? Like that that does that doesn't seem seem seem like it would be the case. >> Yeah. I I think what I'm imagining, which maybe doesn't actually make sense, what I'm imagining is you have an overall plan um and your overall plan has two parts, right? Like part one is like instill these secret loyalties and part two is like have the AI be more widespread and have more ability to gain power then you like by default were planning right the combination of that is like pretty bad. Now if you if you're already going to like >> you know if you were holding fixed how far you would spread the power of the the AI or whatever. Yeah. >> Then I agree instilling your own secret loyalties I think it I think it provides some Beijian evidence. Yeah. I think maybe being it seems plausible to me that being able to do it is like some evidence that like it might have already had the secret loyalty, but >> yeah. >> Yeah, I think it's less bad than like the the two-part plan. So, >> yeah. Yeah. And I I think it's a good point. a handful of this that if someone did want to stage a coup using their AI system, they probably would want to push it out into the world faster, you know, faster than other AI companies because, you know, if if half of the military is now controlled by this other AI company, then it's becomes much less clear you can pull off the coup. So, I think it's a good point that to the extent that you are already worried about misalignment. >> Yeah. um and that and therefore you wouldn't have wanted to push out your AI to the military that that would then um you know continue to stop you from pushing out your AI to the military to stage the coup >> and that that provides some hope but you know my baseline has actually just been we're already worried that the AI companies will just be you know pushing out the AIS all over all over the place because they won't be that worried about misalignment and then once we take that for granted then um then then then the risk of of staging a coup is is very real. I guess where I would agree with what you're saying is that to the extent that these companies are worried about misalignment, I do think that that helps because it will it will give them pause. >> Yeah. Yeah. I I think it's a good chance that like I think it's a good point that if there's only like uh yeah, if you're already deploying and maybe you're like if I'm planning to do a coup, I'm going to deploy 20% more, that's probably not going to be the difference between I think that's a pretty decent point. Mhm. >> I think in the report the illustration and the way we've talked so far is like we're sort of imagining it's like the head of the AI lab that's doing this. >> One thing I kind of wonder about so friend of the podcast Yan Micah is head of alignment at you know anthropic. Um seems like a chill guy. I don't want to cast any aspersions on any of the heads of alignment of any of these labs but wouldn't they be in a really really great position to do this? >> No. But I'd normally thought that it was the people who were involved in post training >> and were involved in let's say the fine-tuning runs that give the AI its particular personality and goals >> that would be best placed um you know the team at OpenAI that's you know ensuring the AI matches the model spec because they're the ones that are kind of implementing the particular values and they could maybe try and you know insert an extra stage at the end that you know that isn't reported um that that inserts the secret loyalties and then I was imagining the alignment people would be the ones that then do the tests for the kind of accidental misalignment. But I mean it I think really it's both that you'd want the ideal coalition would in include both because you have the team instilling the secret loyalties colluding with the team that's meant to be detecting them. But you don't necessarily need the detection team because if the team that's instilling them can access the code of the team of the alignment team then they can teach the AI to pass the test. And so, you know, this is this is another reason to have siloing within um AI organizations so that so that you can't teach it to pass the test in that way. >> Yeah, I'm now realizing that I know less about uh the the internal the exact internal structure of these labs. Um so, yeah, neither. Good thing I'm not good thing I'm not trying to make a queue happen. Um >> but uh yeah, that's a good point. And again, uh, no individual dispersions to anyone I might know who, uh, works works in these teams. Um, I I feel decently happy with how we've covered, um, corporate cues. Um, do is there more you want to say there or shall we move on? Another technical hurdle I'll just flag is that I don't expect GP7 to be given control directly of all the robots and all the drones. um I expect there to be specialized AI systems which are developed that control um those particular you know military weapons >> and so there's this other technical question over even if you had all GP7 secretly loyal to some some group of people there's then this big question of like they have to now do all this military R&D design all these new you know weapons and drones and you know build the AIS the specialized AI systems that will control those drones and weapons and they need to do so in a way that ensures that this small group of people ultimately has control over what this whole very complicated interrelated military infrastructure is is going to be. You know, it's not just going to be carbon copies of the same robot just, you know, it's not going to just be one AI system. There's probably going to be it's going to be very complicated hundreds of different AI systems controlling hundreds of different, you know, bit bits of equipment interact complicated ways. um you know probably we're not literally going to have robot soldiers that replace one for one human soldiers. That's you know that's just a simplification to to to explain explain how how how intense we're imagining this dynamic being. And so there's an open question over okay as you get this kind of ruling military um kind of um the these military systems interacting in complicated ways controlled by various AIs and and other things. How easy is it for GP7 or GP8 to like ensure as they're building all of this stuff that it's all going to be ultimately controllable and loyal or h or hackable by the you know that the AI company or or or by copies of themselves where like you know I I just don't know but it seems like it might be really hard um you know >> the the military contractors might have humans or less powerful AI reviewing these designs and noticing like obvious flaws. you know, the military isn't stupid. You know, they they have serious security practices. They're worried about China, Chinese attempts um to um to kind of seize control. They might be worried that they're indeed Chinese spies in the labs that there might be having influence on these AI companies. So you might well expect there to be you know significant defense efforts from the military fingers crossed um you know looking looking for this kind of thing and maybe just you know really tough technical task to design a very complicated physical infrastructure that is ultimately all controllable by you know a particular AI system or particular copies of an AI system and so just like I think it might be very hard to predict in advance whether that works um and that's another another significant technical hurdle that might just turn out not to be doable um which I think you know should should should give us pause um for concern um rather for hope in terms of whether the the company coup is doable. So there's a few paths towards an AI enabled coupe that um we've talked about. There's like basically the head of the executive do it doing it. There's the lab company doing it. And also there's this free variable about like you know there are variety of countries that could be cued. >> I'm wondering um yeah I'm wondering if you have a sense of the the relative like likelihoods of these things happening. >> It's a great question. Um, in terms of countries, I think that in in the in the fullness of time, current countries that already fairly autocratic, um, you know, like China and Russia, I think, are at very large risk of an executive coup because the executive is just starting in such a strong position to begin with. So you know all of those steps they basically accomplish you know the first half or more >> and then you know it's quite plausible they could use their existing power to push through the deployment of loyal systems throughout society. Um so I think that is that is worryingly likely honestly I it sometimes feels a bit hopeless to me in terms of how we avoid that. >> Um you know you can imagine one country really intervening in another country's affairs. that's not something I like really feel excited about pushing towards. Um the other thing is, you know, is really kind of encouraging the other actors that still have some power in those societies to really be live to these issues and get ahead of the game and maybe they can kind of outmaneuver the the head of state even though the head of state is in a very strong position. So to the degree that part of the reason you're worried about AI enabled coups is that you think that there's like some concentration of AI labs or you know small number of labs um that are powerful. I mean presumably one way of preventing this is like uh if so suppose like you and the AI lab are sympatico and suppose you have a list of like here are the countries that I'm most worried about having a coup. You could say like, "Hey lab, we're just like not selling to those countries." >> Um, which is obviously like >> it's a somewhat geopolitically aggressive move, I guess. >> You might also be able to sell AIS that have guardrails that prevent their use to enable an executive cure. It would be very complicated because, you know, if you're just setting up a surveillance state, there's just lots of, you know, some narrative defined tasks that you want your AIS to do. You could you could you could try and you know differentially allow them to deploy AI systems that won't centralize power as an intermediate. >> Yeah. I I guess the tricky thing about that thing is it's just very like like if you have some countries who do get to use like AI in their militaries and some countries where like either they don't or the AI they get to use is like you know filtered for like not doing a coup and like like maybe other countries don't trust that that's like the only thing they've like monkeyed with. like it seems like uh it seems like it might be a pretty aggressive move which uh >> I don't know how aggressive it's going to be to just not sell a powerful technology. I think that might be the default situation with with a really powerful AI that just for kind of national security reasons you wouldn't want countries that that you're adversarial towards to have access to those those most powerful systems. Fair enough. But to me the worry is it's just a delaying tactic and that yeah in the fullness of time there will be you know China will develop its own powerful AI and and and sell access to autocracies that want it. >> So maybe another question is so I'm not from China. I don't live there. Um I wish the best for the Chinese people. Um, if there's a but if there's a coup in China, an AI enabled coup in China, to what degree is the concern like China is autocratic forever? >> And just to be clear, you know, probably in China would be less called a coup and more well, it would be an executive coup, but it it might just be cementing the system that already exists if you already consider it to be autocratic. >> Also, by the way, I'm asking about China, but I'm I'm not really just specific to China. I'm mostly just thinking a bunch of countries that I don't live in. Um if if there's like a relatively autocratic country, it does it has an AI enabled coupementation of power. >> To what degree is that concerning because that country is like autocratic forever versus to what degree is that concerning because like you know maybe that country like becomes more bellose and like starts trying to take over the world or it's like a promoter of conflict. >> Yeah, I think it depends on exactly what you care about. One one lens you can take is the kind of hard-nosed long-termist lens where you say, "Okay, what we care about is control of the stars over the long term." And so then you'll be kind of thinking, okay, >> would this would this, you know, perhaps less powerful country? Would would the new dictator hang on for power for long enough um for it to be indefinite? And would they be able to get a sizable fraction of of the stars such that there's been a significant loss of value? If it's if it's a not very powerful country, you might from that really like hard-nosed longtime perspective say, well, it's not going to be powerful enough to actually gain any of the stars, probably the United States is just going to, you know, basically be carving up the stars with China or just taking them all from themselves. So, you know, though it's it's a tragedy in terms of the people who live in those countries, you know, from the kind of brutal utilitarian calculus, um, >> it it it matters a lot less. I mean, that that's one lens. Um then you know the other lens would just be be the humanitarian lens that says you know that this is awful for the people in that country and also if that country is able to kind of strike a a deal with with countries like the United States then they might be able to embed themselves permanently even if ultimately United States has much of the hard power. >> Yeah. I I think there's there's this uncertainty I still have about the domestic versus the international impact of doing a coup. >> Yeah. So I'm not sure. So I could imagine one story where like if you do especially an AI enabled coup, you get like all the military like really unified behind you. >> Maybe that just makes your military more effective because they all have one purpose. Um you're you know you you have like access to this like really good planning >> and if you compare to militaries that like you know basically haven't been involved in a coup that are you know like different people with slightly different desires and they're you know they're not as ruthless. You know there's one story where like that military is at a significant advantage. You can also have a story which is like well democracy seems like it's generally good like you know somewhat dispersion of power seems like it generally like makes things run better so maybe this is not a concern. >> I'm I'm wondering if you have thoughts there. >> Yeah. One one related thought I have is that let's say there's not a coup in the United States. I then personally think it's unlikely that the United States would end up completely dominating the rest of the world and kind of seizing all power economically and all strategic control for its own citizens to the exclusion of all others because >> the United States a few reasons. Firstly, United States has many different coalitions with power and many of those coalitions have ideologies that make them committed to um things like democracy, things like trade, um and have kind of positive positive views of of other countries like like say the United Kingdom where I live and they just wouldn't want the United States to kind of dominate the United Kingdom as much as it possibly could. And so that kind of that balance of power within the United States would ex would ensure that the United States uses its power in a way which like does go you know somewhat beyond its borders. Um and the other thing is just that if the United States wanted to completely dominate the rest of the world probably what it would want to do is to like really restrict the AI systems that it sells to the rest of the world and kind of really drive sell those system access to those systems at the highest price it could. Whereas under the default situation where power is distributed within the United States, different companies within the United States will compete to sell AI services to the rest of the world, driving down the cost that the rest of the world is paying. And so under like kind of because of competition within the United States, that that means that actually the United States is going to give the rest of the world a bit of a better deal. And so, you know, under under this default scenario where power is distributed, I think there's less prospect for the United States to really like just take power for itself. Um, even if it's leading on AI, whereas if there is an AI enabled coup and one person becomes dictator with total power, then they might be like, I want to dominate the world. I want all I want all control and I'm just going to force all these companies to you know only sell at this this like extortionate rate and you know the rest of the world has no other source of powerful AI so they'll pay it and then I'm going to you know choose our foreign policy and economic policy to only take into account the welfare and power of the United States in particular and so I do think that if there's an air enabled coup then that could in a particular country then that as you indicated that country might become more bellose um at pursuing his own particular interest and could actually do so >> more effectively. >> And and I guess there's there's also just this factor of like if you're doing a coup, you're probably a bit of a bellose person, you know, like like you're probably like more inclined to that sort of thing than other people. >> Exactly. Exactly. I mean, you raised a good question about are democracies just going to be more efficient? >> Um because you know, the free market's fairly efficient. you're kind of distributing the decision- making. >> I think you know a scary possibility is that you can still gain the benefits of the free market by distributing all the economic decision- making and and having you know markets operating within the country but you still have on all the important decision points AIS that are loyal to to one person and so you can kind of get all those kind of economic benefits of democracy now without actually needing to have a real democracy but I haven't thought much about whether that would go through. >> That's something to think about. And so speaking of democracy and speaking of the United States, so so so initially you said like yeah probably like countries that already have a very strong executive um that that already are less democratic probably are more at risk to having a stronger executive and being even less democratic. Um I live in the United States. I'm a fan of it. How worry like what do you think the like how high do you think the risk is that the United States gets AI enabled coup? I mean, if I had to pluck a number, I' i'd say like 10%, but it's it's very made up. Um, that that's my my rough probability for AI takeover as well. Like I I think it's ballpark similar. >> Okay. And yeah, how can you talk me through like why is it as high as low as 10%. >> By analogy with AI takeover or just um >> in and of itself? >> In and of itself. >> Yeah. So, I think some things are fairly likely to happen. um >> we're know we're likely to see a very small number of companies um developing super intelligent AI systems. M >> we're likely to have a government that if it tried to could gain a lot of control over how those capabilities are used um you know via its kind of default monopoly enforce um it's that that sec apparatus >> um if they don't then by default power is is already and will continue to be very concentrated within the AI companies you there are not in practice many effective checks and balances on the CEOs um in in these companies. Um I also believe that it's quite likely that that that that CEOs will want on the margin to just increase their own power and use their influence over AI to increase their influence more generally. So you can already see with Grock, Elon is doing this in a totally shameless way. it's getting, >> you know, Alton Grock's prompts to make it kind of promote political views that that he likes. Um, >> and I think it's just, you know, a natural urge if you want stuff and, you know, you want a bit more more power and you just have this this way of getting it, which is that you're controlling these hugely powerful influential AI systems. So I do think it's quite likely that on the margin that that these these company leaders will will walk down that path of increasing their own power to some extent. >> But there are also some things which I think are not particularly likely. They may happen but it's like will will at any point a kind of a key you know company executive decide to do something which is you know really egregious like at some point they might need to decide to do a secret loyalty. I think there's a chance that that's just a step too far. Um or there's a chance that by the time that's possible, the world has woken up and just put in some kind of checks and balances that would make that um hard to do. >> Um and then there's the fur the technical question on like, okay, but would this actually work out? You know, we're pointing to some of these difficulties of actually getting these secret loyalties propagated to the milit military infrastructure. Um being really confident the AI isn't actually secretly misaligned. Um, so you know, really zooming out, maybe there's like, you know, a couple of two two or three couple of steps which are like, uh, you know, I wouldn't say it's more than 50%. Um, >> and so, you know, that that gets you down. Let's say there's two steps which are 40% each. Um, or, you know, we're just in this rough range where it's like, you know, it's it's about, you know, it's about 10%. like, you know, as I'm thinking this through, I'm thinking maybe maybe it should be higher because you've got either the the lab route or you've got the um the kind of executive route and maybe you actually just want to add those up. >> Um >> yeah, that that's just a brief indication. >> Yeah. So, okay. I think at this point I'm interested in just talking about like how maybe what people should do about this. >> And probably I'm going to be most interested in thinking about this from like a US perspective because like that's where I live and what I think the most about. Um although I'm I'm also interested in like other places. >> I do think it's the most important case. >> Yeah. So so it seems like the most um so a lot of these stories right are about like a synthesis of AI power and military power. >> Mhm. >> And so it seems like one thing you could do to this for for for this proposal is like or AI power, military power and like executive power, right? All coming together in like a really concerning way. Sometimes people are like the US government should like have this like really big push to have to develop like really powerful AI that it does itself um with like strong pushing AI forward really hard um you know having exclusive access to the to the AI uh and you know it should be like really integrated within the government. It seems like this is probably like pretty bad from the coup perspective. I'm wondering if you have takes there. So I think if you did this really well, it could be good from the coup perspective. If you very carefully designed a project explicitly with reducing this risk in mind, I think you could probably actually reduce coup risk relative to the status quo just because the status quo has is so poor like under the status quo there's there's very little constraining labs. Um, so there's very little guard against the company coup, but there's also, you know, no no like explicit checks and balances that would constrain the ability of the executive to just demand, you know, that the company sell them access to AIS without guard rails that they can deploy throughout the government and military and you know the companies if there's there's a few of them would be in like potentially quite weak negotiation position with the executive over that. So because the the status quo is so bad, >> I think if you if you design a good centralized project, you could you could reduce this risk. Now I think probably the best way to minimize this risk would be to design a kind of a system of um regulation where you continue to have you know multiple kind of constrained regulated projects with with various transparency and safety constraints in place etc. um that that would probably bring the risk down lower still. Um and yeah, that would be better than a centralized project from this perspective. >> One thing that occurs to me um as well is so so again I I still have in the back of my mind okay like uh how how does how do like AI alignment concerns affect this? It seems like uh a lot of the things that people want out of AI alignment could potentially help with this, right? So like transparency, you know, like causing companies to do evaluations of their models. Um having like, you know, having like whistleblower protection schemes. It seems like a lot of these >> probably at least like reduce the chance that like AI labs do stuff in ways that like the rest of the world doesn't know about. Um maybe it increases the risk that like like if you're worried about governments being like too in at you know meddling too much with AI companies to like do tricky things there like maybe that's a concern but yeah I'm wondering like having strong like um AI security institutes or something like how much do you think that helps with Q risk? >> I think all of the stuff you listed helps and and yeah and combination helps a fair bit. Um, and yeah, I do think like just a lot of the lot of the interventions here are like pretty generally good across across both curris and misignment risk. The place where they really potentially bump heads is like where they're decentralizing to just one project versus having careful regulation of >> um multiple projects. >> But beyond them, beyond that, I tend to think that like pretty much there's this pretty strong alignment. There's different areas you focus on. So like you're particularly concerned with like oh how is everyone actually using the compute within you know the air companies and within the government um and like you're relatively less concerned with like looking for like rogue deployments um because you know it's just the kind of legitimate deployments that we might be more worried about now. Um but you know monitoring use of large compute that's that's a way of framing it where that that's both catching the misalignment risk and and and the risk of coups. So I I think I want to talk about things that um are maybe more unique to Q's. So one thing that you don't mention in the report as far as I could tell, but seems interesting to me, a lot of the Q risk seems to come from like you have one you have one AI company. Um it's like relatively dominant. It's relatively in the need lead. It's hard for other people to compete and they're doing like sketchy stuff within that company. Um at least for at least for like company enabled coups >> currently. It's like relatively common for employees. It's It's like not unheard of for employees to like switch around between companies. Like there's a decent amount of it happening. And it seems like this probably helps like partly for um like like it it means that like labs don't have like super like super super intense secrets. But like they probably have some secrets of exactly how they do some things, but like the general structure, you know, the like I'm I feel very confident that uh that the latest Claude and um OpenAI models are based on the transformer architecture, right? Like if they did did something totally different, I just feel like we would have heard of that. >> Um so so it feels like it both protects against like >> a like them doing something like like if Open AI were like, "Hey, all of our employees we're doing a bit of a queue." Yeah. I feel like people would hear about that. And then B, like maybe it protects against like one lab just going super super far ahead of the other labs. Um, so some combination of a like labs like like people moving between labs and b some good old corporate espionage. Um, it seems like this maybe helps. I'm wondering if you have thoughts. Some of the benefits you're describing, I think, just come from the fact that the labs are humans who are integrated throughout broader society and will talk about really notable things. You know, if if a lab tried to tell everyone, look, don't tell don't mention this, but we're doing a coup. >> Um, >> doesn't matter whether employees are switching between labs, like that's >> so then what do we get from the employee switching? I do think you're right. We're getting some kind of evening of the playing field. H >> my view is that you can get a lot of the benefits from by having two separate labs in terms of crew risk reduction >> because if you have two separate labs they can separately train AIs using different stacks and then they can have the AI kind of cross order each other for secret loyalties. You can kind of deploy both systems in the military. So you have like significant additional check and balance there. >> Um where you know if if if you could have both s both both both AI is controlling each robot. So there's you know maybe the open open AI system is controlling the robot but the public system can shut it down and so um with a setup like that you could you know significantly reduce the risk um because you know even if one system is secretly loyal um that doesn't give you strong reason to think that the other system is also secretly loyal and colluding. So I think yeah if you had movement between companies which you know made it less like we just end up with one company instead meant that we get to you know two or three then yeah I think I think >> that would be good. >> So so one thing you mentioned is you think that it's important for um AIS to be aligned to like follow the law >> and to not be like kind of uh loyal to like one individual and just like the law in general. Only concern I have about this and in fact about like enabled coups in general is it feels like it is possible for countries to be too stable, right? Like I I think that it is possible for for the law to be followed too much. Um so well the law being followed too much I think like one version of that is like it is sometimes very unclear what the law involves. Um, a kind of silly version of this is I'm like only 90% sure that the existence of the United States Air Force is constitutional. Um, okay. Because it doesn't the Constitution doesn't actually say that you can have an Air Force, right? Because they didn't think about it. It says you can have an army and a navy. >> Can you have an air force? I don't know. Um I mean that's a bit of a silly example but like like the US constitution it it is like a little bit ambiguous in many places but but at a high level like if I imagine so for existence so for example like the reason there's a United States right is that one part of the United States sort of like broke away for or sorry the United one part of like the United Kingdom broke away from the rest of it right >> and that was I assume illegal. right? Like it was it was illegal. It was like a portion of the United States like breaking the law and being loyal to like one sort of entity within the United Kingdom versus other things. Um and like in general it seems like it's probably good for their for it to be possible for like sometimes bits of states break away illegally and do their own thing. Like how much of preventing coup risk will also prevent like especially via the means of making sure that things are aligned to like the official law will prevent like bits of states breaking away in a way that seems like healthy in the long run. >> I think it's a really interesting question. I think we want to get a balance between locking out the bad stuff, locking out the egregious coups, >> but as you're saying, we don't >> we don't want to lock in too much, >> you know. We don't want as an extreme because we definitely want to lock in like the rules of today can never be changed. um you know so we obviously want to have some process by which we collectively can decide to change the the laws and I think you know that that's by default how it will happen. Um I had previously thought that look if we if we lock in you know the laws of today in a kind of sensible nuanced way then we will leave enough flexibility to collectively decide to change things and you know there could be some process by which it is legitimate for you know a state to break away. >> Um But I think I think you're actually right in practice. It may be that like the naive way of implementing even a nuanced version of the law, it's it's it's possible that that would actually lock in too much and that I haven't thought much about how much really positive stuff has happened historically via law breaking. Um, and do we expect that to continue to be the case even in mature democracies like the United States? like do we want to allow California to just declare that it's um it's it's independent illegally and do we want its AIS to go along with that? Um I I think it's a really good a really good question and it kind of highlights the way in which we may be going down significant path dependencies as we automate the kind of government infrastructure and military because we will you know we will implicitly as once we've automated the whole government and the whole military we will have implicitly baked in answers about whether AIs will support various different maneuvers like we live implicitly bait in an answer about look if California tries to break away and all of its citizens support it and you know most of the broader US supports it but it's actually technically illegal and you know there will be some some decision that that infrastructure of AIS will come to about whether whether it's going to support you know if push comes to shove and there's like going to be you know military intervention what what will the AI military do um you know that's a constitutional crisis and we will be baking in some implicit answer to the question of what will happen that we're like, you know, who who will the AI military support? And I think it just highlights like we should think very carefully before we do this. Um, and there's there's kind of no way to not give an answer like you can't you you can't there's no there's no default because the default, you know, in today's world is just I guess there's a kind of power struggle and random stuff happens. Um, and I think it's a fair point that, you know, maybe it's actually good that you can sometimes do illegal stuff because, you know, it adds more adds adds adds more variety. And so maybe you want to, you know, maybe in the ideal world, we would, you know, have, you know, we say look in constitutional crisis, you know, be wise, consider what's best for the, you know, the broad future and, you know, make the best decision that balances all these interests. And we hope that that would actually be an improvement on the status quo where it's just kind of random and and determined by power. Maybe we can get something that's at least based on some kind of desirable principles um when there are kind of more edge casey constitutional priies. Maybe we don't always make it come down to the letter of the law. Yeah. So there's one version of this which is like um being pro pluralism. There's another version of this which is like or especially like if instead of imagining the US you're imagining like like I think I think there at least conceivably are authoritarian countries where like you actually do want it to be possible for things to break away and yeah I guess there is also this third thing which is like the letter of the law really is like not as clear as you might hope in many cases. Um, it's not totally so, so like I don't know. I was thinking about this before we started. Like one thing you could imagine doing is like being pro pluralism instead of like pro the letter of law. Like it's not I don't know. I didn't spend like 10 minutes thinking about ways in which that could be bad. So probably like there are a bunch of ways that could be bad. I mean another possibility is um you act in accordance with how you predict The Supreme Court judges will >> resolve this question assuming that they're acting in good faith. >> In good faith is in good faith seems tricky and hard to define. >> Yeah. or you know assuming they're trying to do you know the law often has like you know reasonable judgment and >> things like this because if if you don't say in good faith then there's like oh if all the Supreme Court now decide they want to do a coup then the AI knows that then the just so you want to have something there to kind of idealize it. >> Yeah I think there's a yeah there's probably some way you could do it. I mean the thing is about AI is you can give it these fuzzy things like assuming they're trying to be in good faith assume they're being reasonable and just like humans do it's able to work with it even though it's not kind of mathematically specified. Yeah, I think I think there's something to that. Um I guess I so talking more about um ways of stopping coups. I think the so okay one one path is like things you mentioned in the paper you know like try to align to things other than it's definitely just going to be what this one person wants. um try and prevent labled coups by um doing um you know making labs transparent um having like some regulation of labs um I guess preventing executive coups presumably the thing to do there is just like try and elect people who won't do coups >> I think there's like you know building consensus among many different parts of society especially the checks and balances parts that we want AI to follow the law >> to not be used to increase the partisan power of of of the current elected officials. Um, build a consensus that military system shouldn't all report to one person, but should, you know, all report to to to many different humans. And if you can build consensus around that, then that can make it more of an uphill struggle for a head of state that wants to um su. So, like in in the report, a lot of the proposals for how to prevent a coup are very like here's things that we as a society could do. >> Mhm. >> One thing you could potentially do to prevent a coup is also like sabotage type things or like or at least things that like individuals could do um or you know things that are less like globally, you know, globally planned. So, so what I I mean one very minimal version of this is like um for if you imagine there's like some country authoritarian country that you think is like at high risk of an AI enabled coup, you just like not sell AI weapons to them. That's like a moderate version of this. You can also imagine like, okay, even if there's not like a policy for my AI lab to like prevent coups, you can imagine individual workers in an AI lab saying like, okay, I'm just like I'm going to quit or I'm going to like do a I'm going to insert my like don't do coups bit into the code uh slightly surreptitiously. I'm wondering what you think about these like more individualish moves. I definitely support whistleblowing and encourage employees of AI labs to be like okay what's what's going to be my line as as you know if there is movement towards just you know less transparency into what the AI is being aligned to or like even just like it's becoming clear that it's being aligned to the the company or to specific people what is your line in which you're going to whistle blow um you know I think one thing that employees can do is be like I am going to hold myself accountable to to getting positive affirmation that this isn't happening. Now, I'm going to make sure that it's not possible for the company to sneak in a secret loyalty given that I'm aware of what the company systems are like. And I'm going to ensure that the company isn't making >> training AI overtly to be loyal. Um, and so I think, you know, it would be great if there was a culture at companies where like it's just like obviously we wouldn't want this to happen. Obviously, we don't think anyone here would try and do this, but we need to have a, you know, an attitude of vigilance because that's what, you know, um, makes it true that this would never happen. Um, yeah. >> So, I think that's that that that's good. Um, and, you know, one one more positive framing for this is is being like what we need, what one great thing to aim for as a as a company is to make a product which everyone absolutely knows they can trust. Even people who don't trust our our staff and our processes and think we're crooked and think we're going to try and seize power even they should just know they can trust our systems because that's what that's what a good that's what a good product looks like. So you can, you know, you can frame this in terms of building amazing um products that, you know, ultimately you want national security to use. Like if you anticipate that it's going to be public knowledge that sleeper agents are possible, that secret loyalties are possible, then you might anticipate for the government to like proactively use our our IIS in the military and in other critical systems. they're going to want to like really have strong assurances and not just you know trust that they there's nothing fishy going on and that you know that's what aiming for for purely product based reasons. So I do think you know pushing towards that culture um is one thing you can do as an individual. I feel more nervous about like trying to sabotage the training run in a way that prevents secret loyalties but that no one else knows about just because it's such a similar act to introducing secret loyalties. And I just think that all of that stuff should be very taboo and the processes should prevent that. If someone's been listening to this um in particular, I think a lot of my audience are um people who do technical alignment research. Suppose someone's listening to this and they're interested in like okay let's prevent AI enabled coups. Um is there like technical work in particular that you think is like valuable for people to work on here? I think the biggest thing is technical work on detecting sleeper agents and secret loyalties specifically in a setting where there is some kind of adversary that you expect to be purposefully trying to hide that sleeper agent from you. So you know historical sleeper agent work has often um at least this stuff from my topic it's it's um been designed to kind of illustrate the idea of misalignment. Um, so AI kind of acts out when it sees it's 2026 but plays nice before that time. >> Um, but those those those model organisms as as they're sometimes referred to, they're they're not really trying to illustrate the risk that someone is a human actor has has has created a secretly law sleeper agent um that they specifically don't want other humans to to discover. And so I think there's there's potential for a lot of technical work on this kind of offense defense balance question of if you have some attacker that has some kind of access to the the training procedure and you have some kind of defender that has again some level of access. Um again maybe like seeing how the training worked or doing tests afterwards. Can the attacker create a sleeper agent that the defender doesn't detect? And how sophisticated is that sleeper agent? And you know there's so many parameters here that that you know ideally we want to be studying. So the attacker could have just access to like the minimal access would just be they can put things on the internet that that go into pre-training. Um a deeper level of access would be like oh they can do um they can add like add add lines to the finetuning code. Um, but then maybe the defender can see that fine-tuning code or maybe the defender can't. Um, or like maybe the attacker can like add additional training examples which maybe the attacker can see or can't see. And so there's all these different kind of scenarios where like the attacker defender has various different affordances. And ideally what we would have is a good understanding of under what situations does the defender reliably win. Because if we have that understanding, we can then set up our training process >> so that it is secure enough that the defenders indeed reliably win given the actual kind of access that they have and given like the the likely access that we believe an attacker could gain. >> How close do you think this is to just basically what AI control people are doing versus do you think there are like specific aspects of this that are not covered by like typical control research? >> Well, I think it's it goes beyond control. So you know um interpretability >> is something you could look at here. Typically we look at interpretability with a mind accidental misalignment. But you could say can I use interpretability to catch sleeper agents even when the person is specifically trying to hi hide them. >> You know what if what if they know what if they can guess the kinds of interpretability techniques I might be using in advance. You know can I still detect the sea agent then? So you know basically any take any method we have for testing alignment blackbox testing interpretability um you know looking through the training data um you know for many of those methods you'll be able to adjust it to consider the more adversarial setting where someone is specifically trying to create a sleeper agent that you cannot detect and so I think you know it's not just control it's kind of alignment and control >> um and then I don't think it's exactly the same as um you know standard control because you it is a harder setting because >> right as we touched on earlier like the normal setting is where like the AI kind of has to zero shot you know figuring out um how to how to dodge all these tests um and how to sabotage whereas here like um the attacker would have more ability to like help the AI do a good job >> pre-train it how to >> exactly you know how much does it know about the company's control processes. If you're worried about an insider threat actor, then you might be worried, you might want to study, okay, how how much of a better job can they do if they do versus do not know about their um specific control setup. And I I actually think control the control research already already looks at this variable. Um, >> yeah. Well, I imagine it depends. You know, I think I think it's one of these ideas that is like that is caught on and it's maybe a bit hard for >> I think I think it's hard for the world to like refer to exactly one things exactly by the term AI control. So, you know, probably different people are thinking slightly different things and it's worth thinking about like, okay, I want this version. >> Yeah. But I just, you know, I just give it one last, you know, pitch, I suppose, which is to say that there's so many people that have worked on misalignment from the perspective of, you know, um, accidental misalignment. There is just, you know, very few people that have really comparably gone in on like techniques for for for very powerful AI detecting purposeful sleeper agents. I think it's going to be a really important problem. um it's one day going to be commercially important as we've discussed because you know the government will warn guarantees but um it's it's it's it's very neglected and so I I I you know I would really encourage technical people to um to work on this probably the first stage is actually a research a research agenda um that that that sets out the space and and you know looks looks at the current alignment techniques and what kinds of like best best ways of of adapting the research there would be to to think to account for this new threat model. So, okay, that's like technical research people can do if they're interested in this threat model. Um, but not not all of uh my potential listeners to this episode are like technical researchers. I'm wondering like um like stuff other than technical research. Like do you think there are like further questions that need answering here? >> One one thing I'll briefly mention is is the infosc side. Um so that is technical. I don't know how many of researchers um work on this but a lot of infosc is focused on weight expiltration. Um >> but there are infant security problems which which arise from this model which are new. Um so it's much more about how can you secure the training process against unauthorized edit access to the weights. So rather than you know reading read access to the weights to exfiltrate it's it's um you know edit access and that that has some distinctive um features and I think again like a kind of a road mapping exercise that really that that goes through all the different info security things that labs could do and thinks we're all the best bet for making it hard to do significant unauthorized edits. Um seems like you know lowhanging fruit. Um and then I think you know we should labs should be upweing those relative to their previous prioritization. Um and so if if you do info security um I'd really encourage you to pitch pitch that project to your team. Um but you asked about non-technical work. >> Yeah. >> I think one one thing is just doing more detailed threat modeling as as as we've kind of touched upon um in in the conversation. There are lots of steps of these threat models that you could dig into in more detail. Um, you know, how hard would it actually be to train secret loyalties without detection? How hard would it be for those AIs to pass the secret loies on to the next generation and then to military systems? Um, you could look into okay from for the executive coup, you know, what is the legal situation currently with respect to automated um military systems? um who has legal authority over the procurement process and who would by default be looped in to um you know if if powerful AI military systems were being built who would be looped into their kind of their loyalties and um how they're being designed um and then you know again then thinking about mitigations so once you've got a more detailed threat model you can then think about you know what one thing that I I thought of recently is well if we just if all the labs send a copy of their weights to safe enclave, then it makes secret loyalties a much more risky prospect because they can't just delete those weights and then you know whoever they send the weights to can in you a few years time do do tests with much more sophisticated techniques. >> Um so you know it's possible that thinking through the thresholds will bring up other ideas like that. >> Well yeah in that case you've got to be confident that the copy of the weights the lab sent you is like the actual thing that they're running. Um but that's there must be some way to >> Yeah. Well, you can certainly do um you know, hashing the weights. So, you can you can check that the copy you've received is the one they're currently running as long as you can kind of get them to actually hash the the weights that they're really running and then compare them. Um >> yeah, >> but yeah, no, you're right. There there's there's room for maneuvers there. Yeah. I I wonder if this is So sometimes people talk about um yeah, we're going to have chips like like computer chips uh and they're going to have a little thing on them that checks if you're doing like really crazy AI training and reports that um you know just so that we can so that governments can monitor like uh how much AI training people are doing. It seems like a similar thing you might want to do with chips is like yeah are people running the model weights that they say they're running? Um that that seems like it's potentially like valuable for this threat model. Yeah, that's a great idea. I hadn't thought of that. But if you've Yeah, if you've like what you could do, you finish your training process, you you kind of hash the weights, then you you you do all these kind of in-depth alignment tests and then you you know you send the the weights to the safe enclave so that you know then you've got you can do even more tests later and then you have the the chips regularly have regularly check that the weights are are the same as as as what you ended up with. >> I guess also presumably there's some amount of just thinking about structures that would be good. So like I think you mentioned that you know a centralized AI project like if you structured it correctly like maybe it would be good at being like AI enabled Q resistant like I imagine there's probably more thinking someone could do about how you would like actually set that up. Yeah. And for for all the recommendations in the paper, there's a lot more thinking about implementation. you know, we're we're we're giving recommendations on a very high level, you know, transparency about, you know, various different things and um sharing of capabilities with, you know, different parts of society to avoid exclusive access and um you know, AI should follow rules that that mean they can't be used for coups, but you know, all of that, you know, what rules exactly and you know, exactly how are we going to structure this transparency requirement and who which exact body should be capability should should they be shared with Um, so you know what one type of work I'm excited about is working on drafting contracts between governments and labs that specify these requirements concretely. >> Yeah. And similarly for setting up uh a centralized project get getting much more detail about how it would be structured as you say. >> I think I'd like to go move a little bit on to Fore the organization that like put out this. Um, but before I do that, is there any like last things you want to say about AI enabled coups? I'll say one more thing which is that I think it's really helpful in many contexts to be very explicit about the threat model we're concerned with. We've we've talked very explicitly about executive cuckoo and you know you know lab leading lab leaders during coups >> that that's helpful for thinking clearly. >> I don't think it's the most helpful frame in many contexts. um coups sound kind of extreme in many contexts and um it sounds like an adversarial framing. It sounds like you're pointing fingers at individuals um rather than just being like well obviously this should no one should be able to do this and so I do think like there are other more useful frames in many contexts. So rather than like let's prevent secret loyalties, I like the frame of system integrity which just means so that system does what it says on the tin hasn't been tampered with. Um and rather than you know preventing an executive coup, you can talk about checks and balances, rule of law, democratic robustness. >> Yeah, it's a good point. Um so okay and I want to talk a little bit about forethought. So Forethought is this like news organization and like in March or April uh you guys put out a bunch of papers. Um >> so or a bunch of reports. >> What's forought? Where what's what what's going on? >> Uh yeah, it's a research organization. We kind of um aspired to be considered a successor to FHI. So FHI was a macro strategy research organization. So kind of thinking about, >> you know, strategy in, you know, the most zoomed out terms possible. You know, often it was kind of thinking about the very long run future and the different outcomes that might occur. Um, you know, things like the vulnerable world hypothesis and um astronomical waste are kind of, you know, big big picture questions that that big that big picture papers that came out of of of that institute FHI. So we're we're aspiring to kind of be the follow on um successor that is you know tackling you know the really big strategy questions and the kind of way we're currently framing it is we are going over the coming decades very very plausibly to to transition to a world with super intelligent AI systems that is just going to bring a whole host of you know major major changes. Um, AI misalignment risk is one really important risk to be be thinking about over that transition, but there will just be a whole host of other issues. Air enabled coups is is one example and it's the first one that we've, you know, really focused on or at least that I've really focused on, but it's it's not the only one. I mean, I really enjoyed your recent podcast um on kind of AI rights. I think that's going to be another really big issue um that is um very much on our radar. Um and you know that there's going to be many other big issues as well where another one that that we we're excited about is just that at some point we're going to you know start start kind of getting access and using resources in space and there's a you know how those resources are used is going to be a very very important question. you know that is that is basically all the resources and we have no idea how we're going to use them how we're going to divvy them up what the processes will be you know in a sense everything is up for grabs in that decision um so that that's that's another big example and you know I expect there'll be other things where just there's going to be so much change um as as we're going through this there's just going to be a lot of things which which emerge and so our aspiration is to kind of be keeping our eye on the ball of these very high level strategic questions and issues and you try trying to help us figure out what we should do about them. >> Yeah. You mentioned that like the the first thing that you focused on is enabled coups. Um the the things you've mentioned are those like roughly the things that you expect the institute to prioritize or like yeah what might I see out of uh forthought in the next year or so? I think those are our current best guesses. I think as I mentioned so I think space governance you might well see stuff on that. you might well see stuff on AI rights um or you know specific schemes to kind of pay the AIS to work with us if if they are misaligned something that we feel quite excited about and seems still underexplored though it's getting more attention which is great >> I think kind of positive uses of AI for kind of improving epistemics for improving government decision-m for ensuring that democracies don't fall behind autocracies um in an automated um economy though those are some other um issues that that seem seem like we we might not focus on uh like another issue would be like okay if we're choosing these AI's personality you know exactly what what should it be aligned to um which which you know is again a question which is getting more attention but it's you know going to be very very consequential >> another thing to ask a bunch of my listeners are like you know they're Maybe maybe they're coming out of undergrad. Maybe they're like uh considering maybe they're in a space where they they're considering changing careers. >> Is forthought hiring? >> Um yeah, we're planning to do an open hiring round soon. I'm not sure exactly when we'll release it, but I would really encourage people to apply. Um you know, I think there's a lot of talent out there and I expect there's a lot of talent we're completely unaware of. So, you know, even if you don't think that you've got you've got the skills or the knowledge, >> you know, it's there's no great on-ramp to to doing this kind of work at the moment. And, you know, I think there's there's a big danger of people, you know, just ruling themselves out prematurely. So, when we when we do release the open hiring around, please donate application. Final thing I want to ask um suppose someone uh listened to this episode um you know they they found it interesting and they want to hear more about um the work you do. >> What should they how should they go about doing that? >> With me personally um you can follow me on Twitter. If you if you Google Tom Davidson AI X then you know you you'll you'll see my Twitter pop up on Google. So you can follow me, subscribe. Um I post basically all of my research on less wrong because you know that's where big community that cares about similar issues is. So you can you know if you have a less wrong account you can subscribe there. We have um a four forought substack. So if you again you just Google for thought substack then you know the top link you know subscribe um that would be great. Um and then know you can also um follow Wilmer Cascal who's the other senior researcher for follow him on um Twitter and um Leon as well. >> Great. So yeah, links for all of that will be in the description of this episode. Um Tom, thanks very much for coming on. It was great chatting with you. >> Yeah, real pleasure. Thanks so much, Daniel. >> This episode was edited by Kate Brunaut and Amarornace helped with transcription. The opening and closing themes are by Jack Garrett. The episode was recorded at Farlabs. Financial support for the episode was provided by the long-term feature fund along with patrons such as Alexi Maf. To read a transcript, you can visit axrp.net. You can also become a patron at patreon.com/exrdodcast or give a one-off donation at kofi.com/exrd. That's kohfi.com/axrpodcast. Finally, you can leave your thoughts on this episode at axrp.fyi. [Music] [Music]

Related conversations

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -0 · 108 segs

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -4 · 72 segs

AXRP

15 Jun 2025

David Lindner on Myopic Optimization with Non-myopic Approval

This conversation examines core safety through David Lindner on Myopic Optimization with Non-myopic Approval, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -2 · 113 segs

AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -6 · avg -7 · 120 segs

Counterbalance on this topic

Ranked with the mirror rule in the methodology: picks sit closer to the opposite side of your score on the same axis (lens alignment preferred). Each card plots you and the pick together.

Mirror pick 1

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -10.64This pick -10.64Δ 0

This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)