AXRP · Civilisational risk and strategy

Caspar Oesterheld on Program Equilibrium

Why this matters

Auto-discovered candidate. Editorial positioning to be finalized.

Summary

Auto-discovered from AXRP. Editorial summary pending review.

Perspective map

Mixed · Governance · Medium confidence · Transcript-informed


Episode arc by segment

Per-segment chart (not reproduced here): bars in transcript order, early → late; bar height gives spectrum position, colour gives band (risk-forward / mixed / opportunity-forward). Across 157 full-transcript segments: median 0 · 0% risk-forward, 100% mixed, 0% opportunity-forward slices.

Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.

  • Emphasizes safety
  • Emphasizes AI safety
  • Full transcript scored in 157 sequential slices (median slice score 0).

Editor note

Auto-ingested from daily feed check. Review for editorial curation.

ai-safety · axrp


Episode transcript

YouTube captions (auto or uploaded) · video NMEwiZQK_C4 · stored Apr 2, 2026 · 4,232 caption segments

Captions are an imperfect primary source: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/caspar-oesterheld-on-program-equilibrium.json when you have a listen-based summary.

Hello, everybody. In this episode, I'll be speaking with Caspar Oesterheld. Caspar is a PhD student at Carnegie Mellon University, where he serves as the assistant director of the Foundations of Cooperative AI Lab. He researches AI safety with a particular focus on multi-agent issues. There's a transcript of this episode at axrp.net, and links to papers we discuss are available in the description. You can support the podcast at patreon.com/axrpodcast or give me feedback about this episode at axrp.fyi. Okay. Well, Caspar, welcome to AXRP. >> Thanks for having me. >> So today we're going to talk about two papers that you've been on. The first is "Robust Program Equilibrium", where I believe you're the sole author, and the second is "Characterizing Simulation-Based Program Equilibria", by Emery Cooper, yourself, and Vincent Conitzer. So before we go into the details of those papers: they both use the terms "program equilibrium" and "program equilibria". What does that mean? >> Yeah. So this is a concept in game theory, and it's about the equilibria of a particular kind of game, so I'd better describe this kind of game. Imagine you start with any sort of game in the game-theoretic sense, like the prisoner's dilemma, >> which maybe I should describe briefly. So, >> imagine we have two players, and they can choose between raising their own utility by one or raising the other player's utility by three. And they only care about their own utility; like, I don't know, they play against a stranger for some reason, and they don't care about the stranger's utility. >> And they both face this choice. And the traditional game-theoretic analysis of this game by itself is that you should just raise your own utility by $1, and then both players will do this and they'll both go home with $1, >> or one utilon or whatever. >> And of course there's some sort of tragedy here, right? It would be nice if they could somehow agree, in this particular game, to both give the other player $3 and both walk home with $3. >> Yeah. And just to drive home what's going on: if you and I are playing this game, the core issue is that no matter what you do, I'm better off giving myself the one utility, or the $1, rather than giving you three utility, because I don't really care about your utility. >> Yeah. >> So I guess there are two ways to put this. Firstly, no matter what you play, I would rather choose the give-myself-utility option, commonly called "defect", rather than "cooperate". >> Yeah. >> Another way to say this issue is that in the version where we both give each other the $3, I'm better off deviating from that. But if we're both in the only-give-ourselves-$1 situation, neither of us is made better off by deviating; in fact, we're both made worse off. So it's a sticky situation. >> Yeah, that's all correct, of course. Okay. And now this program game setup imagines that we take some game, and instead of playing it in this direct way, where we directly choose between cooperate and defect (raise my utility by $1, or the other player's by $3), we get to choose computer programs.
And then the computer programs will choose for us. >> And importantly, so far this wouldn't really make much of a difference yet. Okay, we choose between a computer program that defects, or a computer program that cooperates, or a computer program that runs in circles ten times and then cooperates or defects; it doesn't really matter. >> Yeah. >> But the crucial addition is that the programs get access to each other's source code at runtime. So I submit my computer program, you submit your computer program, and then my computer program gets as input the code of your computer program, and based on that it can decide whether to cooperate or defect (or, in any other game, choose among its actions). So it can, I don't know, look at your computer program and ask: does it look cooperative? And depending on that, cooperate or defect. Or it can ask: is the fifth character in your computer program an "a"?, and then cooperate if it is and otherwise defect. I mean, there's no reason to submit that type of program, but this is the kind of thing that they would be allowed to do. >> Yeah. Well, there sometimes is this very syntactic analysis. A while ago I was part of, basically, a tournament that did this kind of prisoner's dilemma thing with these open-source programs. And one strategy that a lot of people used was: if I see a string in the opponent's code that, on its own, means "I will cooperate with you", then cooperate with that person; otherwise defect against that person. >> Mhm. >> Which, if you think about it hard, doesn't actually quite make sense. But there are very syntactic things that in fact seem kind of valuable, especially if you're not able to do that much computation on the other person's computer program; just simple syntactic hacks can be better than nothing, I think. >> Yeah. Was this Alex Mennen's tournament on LessWrong, or was this a different one? >> No, this was the Manifold one. >> Ah, okay. >> Yeah. So you had to write a JavaScript program, it had to be fewer than however many characters, and there was also a market on which program would win, and you could submit up to three entries. So, actually, kind of annoyingly to me, one thing I only realized afterwards is that the thing you really should have done is write two programs that cooperated with your program and defected against everyone else's; or just cooperated with the program you thought was most likely to >> win, and then you bet on that program, >> right? >> Or you could even submit three programs, have them all cooperate with the thing that you hoped would win and defect against everyone else, and then bet on it. Anyway. In that setting especially, there was a timeout provision where if the code ran for too long, your bot would be disqualified, >> and you also had to write a really short program. So >> some people actually managed to write pretty smart programs, but if you weren't able to do that, I think relatively simple syntactic analysis was better than nothing.
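To make the setup concrete, here is a toy Python sketch of such a program game (an editorial illustration, not code from the episode or the papers): each submitted program is a Python function that receives the opponent's source code and returns "C" or "D", and payoffs follow the episode's numbers ($1 for raising your own utility, $3 received when the opponent cooperates). Run it as a script, since inspect.getsource needs the defining file.

```python
# Toy program-game harness: programs see each other's source code at "runtime".
import inspect

def payoff(my_move, their_move):
    # Defecting adds $1 to my own total; the opponent cooperating adds $3 to it.
    return (1 if my_move == "D" else 0) + (3 if their_move == "C" else 0)

def defect_bot(opponent_source):
    return "D"

def fifth_char_bot(opponent_source):
    # The deliberately silly syntactic check from the conversation: cooperate
    # iff the fifth character of the opponent's source is an "a".
    return "C" if len(opponent_source) > 4 and opponent_source[4] == "a" else "D"

def play(bot1, bot2):
    # Each program receives the *other* program's source code as input.
    src1, src2 = inspect.getsource(bot1), inspect.getsource(bot2)
    move1, move2 = bot1(src2), bot2(src1)
    return payoff(move1, move2), payoff(move2, move1)

print(play(defect_bot, fifth_char_bot))  # (1, 1): both end up defecting here
```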
Yeah, I think there was this earlier tournament, in 2014 or something like that, when >> there was less known about this kind of setting, and a bunch of the programs there were also based on these simple syntactic things, but that's in part because everyone was mostly thinking about simple syntactic things, right? >> It was all a little bit nonsense. I don't know, you would check whether the opponent program had a particular word in it or something like that, and then I think the winning program had the particular words in it, but it would still just defect. >> So in some sense all those dynamics are a little bit nonsense, or they're not really tracking the strategic nature of the situation. >> Fair enough. So going back: you were saying you have your opponent's program and you can see if the fifth character is an "a", and so on. >> What short program might one actually submit? So the setting was first proposed in, I think, 1984 or something like that, and then rediscovered or reinvented, I think, three times or so in various papers. And all of these initial papers find the following very simple program for this prisoner's dilemma type situation. >> Yeah. >> It just goes as follows: if the opponent program is equal to myself, to this program, >> Yeah. >> then cooperate, >> and otherwise defect. >> Yeah. >> This program is a Nash equilibrium against itself, and it cooperates against itself. So if both players submit this program, neither is incentivized to deviate from playing it. If you submit this program that checks that the two programs are the same, cooperating if they are and defecting otherwise, then the best thing I can do is also submit this program. If I submit anything else, you're going to defect, so I'm going to get at most one (if I also defect), whereas I get three if I also cooperate. So all of these original papers proposing the setting find this program, which allows stable cooperation in this setting. >> Right. And I think that, so, my impression, and maybe this is totally wrong, is that for a while there's been some sense that if you're rational and you're playing the prisoner's dilemma against yourself, you should be able to cooperate with yourself. Wasn't there some guy writing in Scientific American about superrationality, and didn't he hold a contest basically on this premise? >> Yeah. Hofstadter, I think. >> Right. Right. >> I think also in the '80s or something. Yeah.
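The equality-checking program Caspar describes, in the same toy encoding as the sketch above (again an editorial illustration; "clique_bot" is a common community nickname for it, not a name from the episode):

```python
import inspect

def clique_bot(opponent_source):
    # Cooperate iff the opponent's source is character-for-character identical
    # to mine; otherwise defect.
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

# Against an exact copy it cooperates:
print(clique_bot(inspect.getsource(clique_bot)))        # "C"
# ...but any harmless syntactic difference, even a trailing space, breaks it:
print(clique_bot(inspect.getsource(clique_bot) + " "))  # "D"
```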
Yeah. I mean, I've done a lot of work on this kind of reasoning as well. For humans it's a little bit hard to think about, right? You don't often face very similar opponents, or it's a little bit unclear how similar other people are. I don't know: is your brother, or whoever, someone who is related to you and was brought up in a similar way, very similar to you? It's kind of hard to say. But for computer programs, it's very easy to imagine: of course, you can just have, I don't know, two copies of GPT-4 or something like that play a game against each other. It's a very normal occurrence in some sense. Maybe not them acting in the real world at this point, but having multiple copies of a computer program is quite normal. And yeah, there's this related but to some extent independent literature on these sorts of ideas, that you should cooperate against copies. >> Yeah. >> Basically. >> But yeah, basically I'm wondering if this idea of cooperating against copies is what inspired these very simple programs. >> Yeah, that is a good question. I basically don't know to what extent that is the case. I know that some of the later papers on program equilibrium, I remember some of these specifically citing this superrationality concept, >> but I don't remember whether these early papers do. I think McAfee is one of the people who wrote about this in the '80s; I don't know whether they discuss superrationality. >> Yeah. And it's kind of tricky, because if you actually look at the computer programs, they're not doing expected utility maximization, right? Or they're not computing expected utility maximization. They're just, you know, "if identical to me, cooperate, else defect", just hardcoded in. Anyway, maybe this is a distraction, but indeed these were the first programs considered in the program equilibrium literature. >> Yeah. So they sound great, right? >> Yeah.
So, I mean, they're great in that in the prisoner's dilemma you can get an equilibrium in which you get cooperation, which otherwise you can't, or which you can't achieve with various naive other programs that you might write. But I think in practice, and it's not so obvious what the practice of this scheme looks like, but if you think of any kind of practical application of it, it's sort of a problem that the settings are somewhat complex, and now you need two people to write programs independently, and then these programs somehow need to be the same. Or, I mean, there are slightly more general versions of these where they check some other syntactic properties, >> but basically you require that you coordinate in some way on a particular source code to write. Which maybe in some cases you can do, right? Sometimes maybe we can just talk beforehand: if we play this prisoner's dilemma, we can explicitly say, okay, here's the program that I want to submit, please submit the same program, and you can say, okay, let's go. But in cases where we really write these programs independently, maybe at different points in time, and these programs do more complicated things than play the prisoner's dilemma, it's very difficult to coordinate, without explicitly talking to each other, on writing programs that will cooperate with each other. Because even in the prisoner's dilemma, you might imagine that, I don't know, I might have an extra space somewhere. Or maybe you write the program as "if the two programs are equal, cooperate; otherwise defect", and I write "if the two programs are different, defect; else cooperate", and all these very minor changes would already break these schemes. >> So, okay. There's a lot to ask about there. I think my first question is: we have this notion of program equilibrium, so what are we actually looking for? Are we trying to find Nash equilibria of programs? Are we trying to find evolutionarily stable strategies? There are tons of solution concepts, and maybe we just want to play around with the space, but what's the actual thing here, you know? >> Yeah. So the solution concept that people talk about most is just Nash equilibrium. >> If you look at any of these papers and you look at the results, they'll prove things like: these kinds of programs form a Nash equilibrium of the program game. >> Yeah. >> Or, I mean, the term "program equilibrium" literally just means a Nash equilibrium of the game in which the players submit these programs. >> Yeah. >> That is almost always the game-theoretic solution concept that people use.
Then usually a bunch of other things are a little bit more implicit. It's kind of clear that people are interested in finding good Nash equilibria, right? In some sense the whole point of the setup is: we start out with the prisoner's dilemma, and, sadly, everyone's going to defect against everyone else and we're not getting cooperation; and now we come in with this new idea of submitting programs that get access to each other's source code, and with this we get these cooperative equilibria. So that is usually... yeah, I mean, it's often quite explicit in the text that you're asking: can we find good equilibria in some sense, ones that are Pareto-optimal in the space of possible outcomes of the game or something like that? >> Yeah. >> And then, additionally, a lot of the work after these early papers that do the syntactic-comparison-based program equilibrium is about this kind of intuitive notion of robustness: you want equilibria that aren't sensitive to where the other program puts the spaces and the semicolons, these syntactic details. But it is kind of interesting that this usually isn't formalized. So, with the second paper that we'll talk about, we presented it at AAAI, and one game theorist came to our poster and said that to him it was strange that there's no formalization, in terms of solution concepts in particular, of this robustness notion. We'll talk about the programs that we're claiming, or that we are arguing, are more robust than this syntactic-comparison-based program; there's some intuitive sense in which they are, and we can give concrete arguments, but it's not formalized in the solution concept. Like, okay, one of my favorites is called "robust program equilibrium", but robust program equilibrium is not actually a solution concept in the sense that >> Nash equilibrium is, or trembling-hand equilibrium is. The robustness is more some intuitive notion that I think a lot of people find compelling, but in some sense it's not formalized. >> Yeah. And it's funny, because I see this as roughly within both the cooperative AI tradition and the agent foundations tradition, and I think these traditions are sort of related to each other. And in particular, in this setting... like, in decision theory, I think there's also some notion of fairness of a decision situation. So sometimes people talk about: suppose you have a concrete instantiation of a decision theory, meaning a way somebody thinks about making decisions. There are always ways of making that concrete instantiation look bad, by saying: suppose you have, like, Caspar decision theory, which we'll call CDT for short. And then you can be in a decision situation, right?
Where some really smart person figures out what decision theory you're running, punches you if you're running CDT, and then gives you a million dollars if you're not. And there's a sense that this is unfair, but it's also not totally obvious... in that setting as well, I think there's just no notion of what the fair thing is, which is kind of rough, because you'd like to be able to say, "Yeah, my decision theory does really well in all the fair scenarios." And it seems like it would be nice if someone figured out a relevant notion here. Are people trying to do that? Are you trying to do that? >> Yeah. So I think there is some thinking in both cases, and the notion that people talk about most is probably similar in both. So, >> in this decision theory case, I think the thing that probably most people agree on is that the decision situation should somehow be a function of your behavior. >> It shouldn't check >> "do you run CDT?", and if you do, you get punched in the face. It should be: if in this situation you choose this, then you get some low reward. The situation should somehow be behavior-based. >> Yeah. >> Which I think still isn't enough, but this goes into the weeds of this literature. Maybe we can link some papers in the show notes. But the condition that we give in the second paper... or maybe even in both of the papers that we're going to discuss there's some explicit discussion of this notion of behaviorism, >> which also says that >> in the program equilibrium setting, it's nice to have a program that only depends on the other program's behavior rather than its syntax. And all of these approaches to robustness, like trying to do proofs about the opponent program, about what the opponent program does (trying to prove whether the opponent will cooperate, or something like that): all of these notions that people intuitively find more robust are at least more behaviorist than this syntactic-comparison-based idea. >> Yeah. Although it's tricky, because, and sorry if this is going into weeds that you want to postpone, but this behaviorism-based thing: if you think about the "if you're equal to me, cooperate, else defect" program, it's behaviorally different from the "if you're unequal to me, defect, else cooperate" program, right? It does different things in different situations, >> and therefore, >> once you can define an impartial thing, then maybe you can say: well, if you act identically on impartial programs, then you count as impartial. But maybe that's just a recursive definition, and we only need one simple program as a base case. >> I think we do actually have a recursive definition of simulationist programs that is, a little bit, trying to address some of these issues.
But yeah, it does go into the weeds of what exactly this definition should be. >> Yeah. Okay, let's go back a little bit to the desiderata for program equilibria. So they're computer programs, right? So presumably, and this is addressed a bit in the second paper, things like runtime, you know, computational efficiency: that seems like a relevant desideratum. >> Yes, I agree. >> And then I imagine various... yeah, having a broad range of programs that you can work well with. And it seems like there might be some notion of just "don't fail too badly": if you fail, fail not so badly rather than really badly. This is slightly different from the notion of robustness in your paper, and I don't know if there's a good formalism for this. Do you have thoughts here? >> Yeah. In some intuitive sense, what one wants is that if I slightly change my program, maybe even in a way that is sort of substantial... in the prisoner's dilemma it's a little bit unclear: if I defect slightly more, if I don't cooperate 100% of the time but, say, 95%, it's unclear to what extent you should be robust to that; should you defect against me all of the time? And I guess in other games, where maybe there are different kinds of cooperation or something like that, you would want that if I cooperate in slightly the wrong way, the outcome should still be good. I think in some sense there's something here that's conceptually quite clear, right? If you deviate in some reasonable, harmless way, >> it should still be fine: we shouldn't defect against each other; we should still get a decent utility. But the details are less clear: what exactly are the allowed deviations? It probably depends a lot on the game. And then there are a lot of these things that in game theory are just kind of unclear, like: if I defect 5% more, how much should you punish me for that? And I think that's why a lot of these things aren't really formalized in these papers. >> Okay. So now that we know what program equilibrium is, why does it matter? >> Yeah, so there are lots of different possible answers to this question. I think the most straightforward one is that we can view program games, and program equilibrium, as a model of how games could be played when different parties design and deploy AI systems. So this whole thing of having source code that the other party can look at, and maybe run, or check character five of, and stuff like that: >> Yeah. >> this is something that is somewhat specific to computer programs.
We can talk about whether there are human analogues, but when, I don't know, when we play a game against each other, it's sort of hard to imagine an equivalent of this. I have maybe some vague model of how your brain works or something like that, but there's no source code, and I can't really run you. Whereas if we both write computer programs, this can just literally happen, right? I can just literally say: this is the source code that I'm deploying. I don't know, I have my charity or something like that, and I'm using some AI system to manage how much to donate to different charities; I can just say, look, this is the source code that I'm using for managing what this charity does. And here I think program equilibrium, or program games, are a quite literal, direct model of how these interactions could go. Of course, you could also deploy the AI system and say: well, we're not saying anything about how this works. In which case, obviously, you don't get these program equilibrium type dynamics. But it's a way that they could go, and that people might want to use, because it allows for cooperation. So I think the most direct interpretation is that it models a way that games could be played in the future, when more decisions are made by delegating to AI systems. And as people in this kind of community, who think, and to some extent worry, about a future where lots of decisions are made by AI, this is an important thing for us to think about. Meanwhile, to most game theorists it's a weird setting, because, well, humans can't read each other's source code, right? And so it's understudied, by our lights, I guess. >> Yeah. >> Yeah, because it's currently not a super important way that games are played. >> Which is interesting, because, so, I guess we don't often have games played with mutual source code transparency, but there really are computer programs that play economic games against each other in economically valuable settings, right? A lot of trading in the stock market is done by computer programs. A lot of bidding for advertisement space is done by computer programs. And algorithmic mechanism design... so, mechanism design being sort of the inverse game theory of "if you want some sort of outcome, how could you figure out the game to make that happen?", and algorithmic mechanism design being like that, but, you know, everyone's a computer. It seems like there's decent uptake of that, and of algorithmic game theory generally. So I'm kind of surprised that the mutual transparency setting is not of more interest to the broader community. >> Yeah, I think I agree.
I think a lot of these settings... okay, so the trading case, I guess, is a case where decisions are made on both sides by algorithms, but usually, because it's kind of a zero-sum game, you don't want to reveal to your competitors how your trading bot works. >> And then, yeah, there's a lot of this mechanism design where you have an algorithm. I guess those are usually cases of unilateral transparency, right? Like, I auction something off, and I'm saying: okay, I'm using this algorithm to determine who gets, I don't know, this broadband frequency, or whatever these things are that are being auctioned off. So those are cases of unilateral transparency, and that is studied much more, in part because... I mean, this has also been studied much more in traditional game theory. In some sense you can view it as a sort of Stackelberg equilibrium; you can view all of mechanism design as being a bit like finding Stackelberg equilibria. And I think Stackelberg's analyses of games even precede Nash equilibrium, so >> interesting >> that is very old. >> Yeah. Where a Stackelberg equilibrium is: one person does a thing and then the next person does a thing, so the next person is optimizing given what the first person does, and the first person has to optimize what's really good for them, given that when they do something, the other person will optimize what's good for themselves based on what the first person did. >> Yeah. >> So people look at Stackelberg equilibria in these sorts of games, and it's a common thing, and it's an interesting point that you can think of it as one-way transparency. >> Yeah. >> I think one thing one could think about is how much humans are in these mutual transparency settings. >> So, yeah, I always kind of say, for individual humans, if the two of us play a prisoner's dilemma, >> Yeah. >> I have some model of you, but I can't really read your source code. I don't know, it seems sort of speculative. But there's this paper, which I really like, by Andrew Critch, Michael Dennis, and Stuart Russell, all from CHAI, where of course you graduated from. And this is about program equilibrium as well, and the motivating setting that they use is institution design. >> Yeah. >> And the idea there is that institutions, you can view them as sort of rational players or something like that; I mean, they make decisions and they play games with each other, right? The, I don't know, US government plays games with the German government, or whatever. But institutions have some amount of transparency, right? They have laws that they need to follow; they have constitutions. Also, because they're composed of lots of individuals, in principle one could ask... the, I don't know, German government could check the social media profiles of all the people working for the US government and learn something about how these people interact with each other, or something like that.
So there's some very concrete transparency there. And in particular, some things really are algorithmic-type commitments, right? Like, I don't know, "we don't negotiate with terrorists" or something like that. It's something that's in the source code of a country, in some sense, specifying how it's going to choose in particular interactions. So I think that is a case where interactions between human organizations have this sort of transparency. And I guess I think that's some evidence that we could get similar things with AI. At the same time, it's also kind of interesting that this hasn't motivated people to study this program equilibrium style setting, which I think is probably because... as a computer scientist, it's natural to think, you know, the constitution is basically just an algorithm, right? It's also a little bit this computer science thing of explaining the world to everyone else by using computer programs for everything: the mind is a program, and the Constitution is just a program, and we've got it all covered with our computer science stuff. Which maybe some people also don't like so much. But I think it's a helpful metaphor, still. >> Fair enough. Okay. So some people do study program equilibria. And so, just to set up the setting for your papers: before the appearance to the world of "Robust Program Equilibrium", what did we know about program equilibria, beyond these simple programs that cooperate if your source code equals mine? >> Yeah, so we have some characterizations of the kinds of equilibria in general that are allowed by these syntactic-comparison-based programs. Not sure how much to go into that at this point; maybe we'll get into it later. >> I mean, my understanding is... I think I can do this quickly. My understanding is basically: any equilibrium that's better off for all the players than unilaterally doing what they want, you can get with program equilibrium. Or maybe it's that you have to have punishments as well, but, you know, something roughly like this, right? You can have programs saying: you have to play this equilibrium, and if you don't, then I'll punish you. Just write a computer program saying: if you're equal to me, and therefore play this equilibrium, then I'll play this equilibrium; if you're not, then I'll take the punishment action. >> Yes. Yeah, that's basically right. >> Is it only basically right? >> No, I think it's basically right; I think it's fully right. It's just "basically" in the way that all natural-language descriptions are. Yeah, you can get anything that is better for everyone than what they get if everyone punishes them, right? Which might be quite bad, right?
For example, in the prisoner's dilemma, >> we had this nice story of how you can get mutual cooperation, but you can also get, >> I don't know, one player cooperates 60% of the time, the other player cooperates 100% of the time. >> Yeah. >> And the reason why the 100%-of-the-time cooperator doesn't cooperate less is that the 60% cooperator kind of says: well, if we're not both submitting the program that plays this equilibrium, I'm going to always defect, right? So in the prisoner's dilemma you can get anything that is at least as good as mutual defection for both players. So in some sense almost everything can happen. I mean, it can't happen that one player cooperates all the time and the other player defects all the time, right? Because then the cooperator would always want to defect. But yeah, that's the basic picture of what's going on here. So that has been known, and then post-Tennenholtz, which is one of these papers, I think the paper that coined the term "program equilibrium", and that gave this syntactic-comparison-based program and this folk theorem, as it's called, of what kinds of things can happen in equilibrium: after that, most papers have focused on this "how do we make this more robust" idea. And in particular, what existed prior to the "Robust Program Equilibrium" paper are these papers on making things more robust by having the programs try to prove things about each other. >> Yeah. >> So here's maybe the simplest example of this, one that you don't need to know crazy logic for. >> Yeah. >> In the prisoner's dilemma, you could write a program that tries to search for proofs of the claim "if this program cooperates, the other program will also cooperate". >> Okay. >> So your program is now very large, right? It has this proof search system; it can somehow find proofs about programs. >> Yeah. >> But basically you can still describe it relatively simply as: I try to find a proof that if I cooperate, the opponent cooperates; if I find one, I cooperate, otherwise I defect. >> And it's not that difficult to see that this kind of program can cooperate against itself, right? Because if it faces itself, it's relatively easy to prove that if I cooperate, the opponent will cooperate, because the statement is just an implication where both sides of the implication arrow say exactly the same thing. >> Yeah. >> And at the same time, this is more robust, because it will be robust to, you know, just changing the spaces and so on. It's relatively easy to prove that if this program outputs "cooperate", then this other program, which is the same except that it has the spaces in different places, or switches things around in some way that doesn't really matter, will also output "cooperate". So this is a basic proof-based approach that will work.
Yeah, so I think the first paper on this is by Barasz et al. I think there are two versions of it with different first authors, which is a little bit confusing: on one of them Barasz is the first author, and on the other it's LaVictoire, I think. And I think he's American, so probably a less French pronunciation is correct, but >> I actually think he does say "LaVictoire". >> Oh, okay. >> I'm not 100% certain. Write in, Patrick, and tell us. >> Yeah. So those papers first proposed these proof-based approaches, and they actually do something that's more clever, where it's much harder to see why it might work. >> Yeah. >> So I described a version where the thing you try to prove is "if I cooperate, the opponent will cooperate". >> Yeah. >> They instead just have programs that try to prove that the opponent will cooperate. So you just do: if I can prove that my opponent cooperates, I cooperate; else defect. And it's much less intuitive that this works, right? Intuitively, you would think: surely this is some weird infinite loop, right? If this faces itself, I'm going to think: what does the opponent do? Well, to prove anything about them... okay, they'll try to prove something about me, right? And you run into this infinite circle. You would think that this is basically the same as one very naive program that you might write, which is just: run the opponent program; if it cooperates, cooperate, otherwise defect. And that really does just run in circles. And you would think that just doing proofs instead of running the opponent program has the same issue. It turns out that you can find these proofs, which follows from a somewhat obscure result in logic called Löb's theorem, which is a little bit related to Gödel's second incompleteness theorem. Yeah, I think Löb's theorem is relatively easy to prove, but kind of hard to... I don't know, you kind of need to just write down the proof, and then it's relatively simple, but it's hard to give an intuition for it, I think. >> I guess it's also one of these things that's hard to state unless you're careful and you remember it. So, okay, I've tried to write it down. Take a proposition P. >> Mhm. >> Löb's theorem says that if you can prove that, if you could prove P, then P would be true, then you can prove P. >> Mhm. >> So if you can prove that the provability of a statement implies its truth, then you can prove the thing. And the reason this is non-trivial or something is that it turns out you can't always prove that if you could prove a thing, it would be true, because you can't prove that your proof system works all the time, and you can construct funky self-referential things that work out. So, unless I have messed up, that is Löb's theorem. >> Yes.
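For reference, here is the statement just given, in standard provability-logic notation (this formal rendering is an editorial addition, not from the audio); write □P for "P is provable", say in Peano arithmetic:

```latex
% Löb's theorem: provable self-trust about P collapses into a proof of P.
\vdash (\Box P \to P) \;\Longrightarrow\; \vdash P

% Application to the proof-based bot facing a copy of itself: let C say
% "this bot cooperates in this matchup". The bot cooperates exactly when it
% finds a proof of C, so C \leftrightarrow \Box C is provable; in particular
% \vdash \Box C \to C, and Löb's theorem then gives \vdash C. The proof
% search succeeds, and both copies cooperate.
```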
And so my recollection is that the way it works in this program is: basically, you're checking whether the other program would cooperate. Imagine we're both these "defect unless there's a proof of cooperation" programs, right? I want to check whether you would cooperate given me, and "you cooperate given me" is the same as "I cooperate given you". So, okay, here's a thing that I definitely can prove: if you can prove that (if I can prove that I cooperate, then you cooperate)... but crucially, the "I" and the "you" are actually just the same, because we're the same program. So if it's provable that (if it's provable that we cooperate, then we cooperate), then Löb's theorem tells us that we can conclude that it is provable that we cooperate, >> Yes, >> and therefore we in fact cooperate. And my understanding is, what do we actually do? I think we prove Löb's theorem and then apply it to our own situation, and then we both prove that we both cooperate, and then we cooperate. That's my recollection of how it's supposed to go. >> At least, that would be one way. >> Yeah. >> I suppose there might be even shorter proofs. >> Yeah. But yeah, that is basically correct. Good recollection of the papers. >> Yeah. Well, there were a few years in Berkeley where every couple of weeks somebody would explain Löb's theorem to you and talk about Löbian cooperation, and so eventually you remembered it. >> Okay. Nice. Yeah, it's a very nice idea. I actually don't know how they made this connection. I mean, Löb's theorem is relatively obscure, I think in part because it doesn't prove that much more than Gödel's second incompleteness theorem. Right? Gödel's second incompleteness theorem says a logical system can't prove its own consistency, and here it's sort of the same thing: >> you can't prove "if I can prove something, it's true" without just being able to prove the thing. >> Yeah. >> And I think that's probably one reason why Löb's theorem isn't very widely known. And for this application, it feels like a result that happens to be exactly the thing you need, right? Once you have it written down, this cooperation property follows almost immediately. But yeah, how did they make the connection? >> I think I know this, or I have a theory about this.
So, before they were talking about Löbian cooperation, there was this Löbian obstacle to self-trust, right? Where they were like: ah, the problem is, you might want to say, oh, I'm going to create a successor program to me, and if I can prove that the successor program is going to do well, then... or, I don't know, all the programs are going to be like "if I can prove a thing is good, then I'll do it", and can I prove that a program that I write is going to be able to do stuff? And it's a little bit rough, because if I can prove that you could prove that a thing is good, then I could probably prove that the thing was good myself, and so why am I writing the successor? >> So maybe this just caused Löb's theorem to be on everyone's mind. I don't know. I have this theory, but I don't think I've heard it confirmed by any of the authors. >> Okay, it's a good theory, I think. >> Okay, so we had this Löbian cooperation idea floating around, and this is one thing that was known before these papers we're about to discuss. Is there anything else that's important? >> So there is, I think, a little bit more extension of this Löbian idea. Okay, so one weird thing here, right, is that we have these programs: "if I can prove this, then I cooperate". Of course, whether I can prove something is not decidable, right? It's not like an algorithm that tries for 10 hours and then gives up; that's not what provability would normally mean. >> Yeah. >> And so there's a paper by Andrew Critch, from I think 2019, that shows that this Löb's theorem approach still works if you consider these sort of bounded proofs: you try with a given amount of effort; specifically, you try all proofs up to a given length, I think, is the constraint. It shows that some version of Löb's theorem still holds, and it's still enough to get this Löbian cooperation, if the two players consider proofs up to a long enough length. >> Yeah. >> And it doesn't have to be the same length. >> Yeah, it doesn't have to be the same length; >> it just has to be at least the length from that paper. >> Yeah. Right. >> Which is great, right? >> Very fun result. So there's Löbian cooperation, and there's parametric bounded Löbian cooperation. Anything else of note? >> Yeah, okay, I think one other thing that is kind of interesting... I mean, this is not really an important fact, but I think it's an important thing to understand, >> is that for the Löbian bots, it matters that you try to find a proof that the other player cooperates, rather than >> a proof that the other player defects. >> And the same is true for the defect case, I guess. Sorry: for this implication case that I described, suppose you try to check "is there a proof that if I defect, the opponent will defect?". I mean, I'm not sure why you would do that.
Well, you might do it based on... >> or, I mean, you can imagine similar things, like: okay, if I try to punish you, will you... or: if I defect, will you naively cooperate with me, like a sucker? >> If so, then I'm just definitely going to defect, right? >> Right, though then I guess you would check for some other property. >> Or you would check: if I defect, will you defect? If so, then I'll cooperate. Maybe that would be the program. >> Yeah, maybe that is even the more sensible program, and I'm actually not sure whether that one cooperates against itself. So in this... >> wait, it must cooperate, right? Because, suppose... >> okay, let's think. >> Suppose we're the same program. Then it's basically: if ("defect is provable" if and only if "defect is provable"), then cooperate, else defect. But "'defect is provable' if and only if 'defect is provable'": it's the same expression on both sides; you don't even have to go into the details. >> Right, I agree, this will cooperate. This is not an equilibrium, though, right? If the opponent just submits a defect bot, you're going to cooperate against it, right? >> Yes. It's not an equilibrium. It is a program. >> It's not an equilibrium. >> Yeah. >> Okay. >> I got us off track, I fear. >> Oh, yeah. But you were saying that you kind of want to be proving the good case, not the bad case. >> Yeah. So maybe let's do the version from the paper: if I can prove that you cooperate, I cooperate; otherwise I defect. Right? If you think about it, in this program it doesn't really matter that mutual cooperation is the good thing >> and mutual defection is the bad thing. Ultimately we just have two labels, cooperate and defect; we could call them A and B instead. And it's just: if I can prove that you output label A, I also output label A; otherwise I output label B. And regardless of what these labels are, this will result in both players outputting label A. And if label A happens to be defect rather than cooperate, these programs will defect against each other. So it matters that you, >> yeah, try the good thing first, or something like that. >> Yeah. I guess maybe the most intuitive way of thinking about it, and I haven't thought about this a ton, so it may not be accurate, is that you're setting up a self-fulfilling prophecy. Or, if the other person happens to be you, then you're setting up a self-fulfilling prophecy, and you want to set up the good self-fulfilling prophecy, not the bad self-fulfilling prophecy. >> I think this is true in this setting. My impression is that there are also decision theory situations where you really care about the order in which you try to prove things about the environment, and I forget whether "self-fulfilling prophecy" is the way to think about those situations as well, even though they're conceptually related. We can perhaps leave that to the listeners, if it's too hard to figure out right now. Okay.
So, now that we've seen this sad world that's confusing and chaotic, perhaps we can get the light of your papers. >> Okay. I should say, I really like the proof-based stuff. We can talk a little bit about what the upsides and downsides are. I mean, yeah, it is confusing. I would think that one issue with it is that, in practice, you know, what programs can one really prove things about, right? >> Yeah. I mean, my intuition is that the point of that work is supposed to be... it seems like it's supposed to be modeling cases where you have good beliefs about each other that may or may not be exactly proofs, and you hope that something like Löb's theorem holds in this more relaxed setting, which it may or may not. I don't exactly know, but yeah, I agree. >> Yeah, I also kind of view it this way, which is a more metaphorical way. >> Yeah, there's some distance between the mathematical model and the actual way it would work. >> But I want to hear about your paper. >> Right. Okay, so now let's get to my paper. My paper is on whether we can get these cooperative equilibria not by trying to prove things about each other >> but just by simulating each other. >> Yeah. >> And I already mentioned that there's this super naive but kind of intuitive approach, where you would like to run the opponent with myself as input, see if they cooperate, and if they do, cooperate, otherwise defect. This is the very obvious intuition, maybe from tit-for-tat in repeated games: that you want to reward the other player for cooperating, and get a good equilibrium that way. And the problem with this, of course, is that it doesn't halt if both players do it. >> Yep. >> And I guess this would work if you played this sequentially, right? We talked about the Stackelberg stuff earlier. If I submit a program first, and then you submit a program second, then it would work for me to submit a program that says: run your program; cooperate if it cooperates, defect if it defects. And then you would be incentivized to cooperate. >> Yeah. >> But if both players play simultaneously: infinite loop. So it kind of doesn't work. >> If we had reflective oracles, then it could work, depending on the reflective oracle. But that's a whole other bag of worms. >> Yeah, reflective oracles... I probably shouldn't get into it, but it's another model that's maybe a little bit in between the proof-based stuff and the simulation stuff. >> Yeah.
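To see the non-halting concretely, here is the naive simulation bot in a toy encoding (an editorial sketch: programs are source strings whose move() receives both its own source and the opponent's, which just avoids writing quines; in the formal setting a program can always quote its own source):

```python
# The naive "simulate the opponent and copy them" program, and its infinite
# loop. In Python the never-halting mutual simulation shows up as a
# RecursionError.
import sys

def run(program_source, own_source, opponent_source):
    # "Run" a submitted program (a source string defining move) on the inputs.
    env = {"run": run}
    exec(program_source, env)
    return env["move"](own_source, opponent_source)

NAIVE_SIM_BOT = """
def move(my_source, opponent_source):
    # Run the opponent against me and copy whatever it plays.
    return run(opponent_source, opponent_source, my_source)
"""

sys.setrecursionlimit(100)  # keep the inevitable failure quick
try:
    run(NAIVE_SIM_BOT, NAIVE_SIM_BOT, NAIVE_SIM_BOT)
except RecursionError:
    print("two naive simulators simulate each other forever")
```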
>> At any rate... >> Yeah, so it turns out there's a very simple fix to this issue: instead of always running the opponent and cooperating if and only if they cooperate, you avoid the infinite loop by cooperating with some probability epsilon up front, and only if that epsilon-probability clause doesn't trigger do you run the other program. So your program flips a very biased coin, where epsilon is a small number, and checks whether some low-probability event happens. If it does, you just cooperate without even looking at the opponent's program. Otherwise, you do simulate the opponent's program and copy whatever it does: cooperate if it cooperates, defect if it defects. >> Yeah. >> The idea is that it's the same intuition as before, simulate the opponent and do this instantaneous tit-for-tat, except that now you don't run into the running-forever issue. It might take a while, but eventually you're going to hit these epsilon clauses. If we both submit this program, there's some chance I immediately cooperate, but most likely I'm going to call your program, which might then also immediately cooperate, but most likely is going to call my program again, and so on. At each point we have probability epsilon of halting, and with probability one we eventually halt. >> Yep. And this is a special case of a general construction you have in the paper, right? >> Yeah, this one is for the prisoner's dilemma in particular, where the two actions happen to be cooperate and defect.
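Here is a minimal sketch of that two-player special case, the epsilon-grounded FairBot, under the same illustrative conventions as above (and ignoring Python's recursion limit):

```python
import random

COOPERATE, DEFECT = "C", "D"
EPSILON = 0.01  # small halting probability; the exact value is a free choice

def epsilon_grounded_fairbot(opponent_program, my_program):
    # Epsilon clause: with small probability, cooperate without looking at
    # the opponent at all. This is what guarantees halting, since every
    # level of nested simulation has a chance of bottoming out here.
    if random.random() < EPSILON:
        return COOPERATE
    # Otherwise, simulate the opponent playing against me and mirror them.
    their_move = opponent_program(my_program, opponent_program)
    return COOPERATE if their_move == COOPERATE else DEFECT
```

Two copies of this program halt with probability one, and since every halting point outputs cooperate and the mirroring propagates that back up, they cooperate with probability one.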
In general, there are two things you can specify: what happens with the epsilon probability, and how you react to the action you get out of simulating the other player. The paper draws a connection between these epsilon-grounded πBots, as they're called, and repeated games where you can only see the opponent's last move. The epsilon clause, where you don't look at your opponent, is kind of like playing the first round, where you haven't seen anything of your opponent yet. >> Yeah. >> In the prisoner's dilemma there's the well-known tit-for-tat strategy: cooperate in the beginning, and then at each point look at the opponent's last move and copy it, cooperate for cooperate. But in general you could have these myopic strategies for repeated games where you do something in the beginning, and then at each point you look at the opponent's last move and react to it in some way. Maybe do something that's equally cooperative, or something very slightly more cooperative, to kind of slowly get towards cooperative outcomes. You can have these strategies for repeated games, and you can turn any of them into programs for the program game (see the sketch below). >> One thing I noticed about this space of strategies: these are strategies that only look at your opponent's last action, right? >> Yes. >> So there's this other thing you can do, called win-stay lose-shift, where if you cooperated against me, I just do whatever I did last time, and if you defected against me, I do the opposite of what I did last time. And it seems like, and this is another thing your next paper is going to fix, in this construction I can't do that. >> Yes, it's really very restrictive. Ultimately, most of the time you're going to see one action of the opponent, you have to react to that somehow, and that's it. >> But it's a nice idea, and it's basically this connection: if you have a good iterated strategy, then you can write a good computer program to play this mutually transparent program game. How much do we know about good iterated strategies? >> That's a good question. For the iterated prisoner's dilemma there's a lot; there are a lot of tournaments for it. I'm not sure how much there is for other games, like the iterated stag hunt or something like that. Maybe for a lot of the other ones it's too easy. There is some literature, and you can check the paper: there are various notions people have looked at, like the exploitability of a strategy, which is roughly how much more utility the other player can get than you. If I play, say, tit-for-tat and the opponent always defects, I'm going to get slightly lower utility than them, because in the first round I cooperate and they defect, and in all subsequent rounds both players defect. So tit-for-tat is very slightly exploitable, but not very much. These notions have been studied, and in my paper I transfer them: if you take a strategy for any repeated game that has some amount of exploitability, the analogous epsilon-grounded πBot has the same amount of exploitability. >> Yeah. >> But I'm not sure; I think this is also an interesting question in general.
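A sketch of that general construction, turning any one-move-memory repeated-game strategy into a program, might look like this (again with illustrative names and conventions):

```python
import random

COOPERATE, DEFECT = "C", "D"
EPSILON = 0.01

def make_epsilon_grounded_pibot(first_round_move, reaction):
    """Build a program from a myopic repeated-game strategy pi:
    `first_round_move` is what pi plays with no history, and `reaction`
    maps the opponent's last observed move to pi's response."""
    def pibot(opponent_program, my_program):
        if random.random() < EPSILON:
            return first_round_move   # the "first round" clause
        their_move = opponent_program(my_program, opponent_program)
        return reaction(their_move)   # pi's response to their last move
    return pibot

# Tit-for-tat as the strategy recovers the epsilon-grounded FairBot:
fairbot = make_epsilon_grounded_pibot(COOPERATE, lambda move: move)
```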
Like, how much qualitatively different stuff is there in this purely epsilon-grounded πBot space? If all you can do is look at one action of the opponent and react to it, how much more can you do than things that are kind of like tit-for-tat? Like I mentioned, in more complex games maybe you want to be slightly more cooperative, so that after a bunch of simulations you eventually become very cooperative, or something like that. >> Yeah. Okay, I have a theory. In my head I'm thinking: what's the general version of this? And I can think of two ways you can generalize. Here's what I imagine you should do in general. You have a game. First you think about what the good equilibrium of this game is, and then: what do I want to do if the other person doesn't play ball? There seem to be two things I could do if the other person doesn't join me in the good equilibrium. Firstly, I could do something to try to punish them. Secondly, I could do something that makes me okay, good enough, no matter what they do. I don't exactly know how you formalize these, but my guess is that you can formalize something like them, and that they will look different. So you can imagine saying: with epsilon probability I do my part of the good equilibrium, and the rest of the time I simulate what the other person does. If they play the good equilibrium, I play the good equilibrium. If they don't, then depending on what I decided earlier, I either punish them or do the thing that's fine for me. Or you can imagine randomizing between those; maybe there's some best-of-both-worlds thing with randomizing. I don't exactly know. Do you have a take on that? >> There's at least one other thing you can do, which is to try to be slightly more cooperative than them, in the hope that, as in the repeated game, the other person figures out that this is what's going on, that you're always going to be a little more cooperative than them, and that this leads you to the good equilibrium, or at least to a better equilibrium than what you get if you just punish. Punishing usually means doing something you wouldn't really want to do, just to incentivize the other player. Or better than the version where I just do something that makes me okay while you go do whatever. >> Is the be-more-cooperative-than-the-other-person thing... I feel like that's already part of the strategy. Here's what I could do: with epsilon probability, play the good equilibrium; then simulate what the opponent does.
If they're in the good equilibrium, I join the good equilibrium. If they don't join the good equilibrium, then with epsilon probability I play my part of the good equilibrium anyway, and otherwise I take my other action. That extra epsilon probability of being slightly more cooperative, you could have just folded into the initial probability, right? >> Right. I guess the difference is that you can be epsilon more cooperative in a deterministic way. With the epsilon-probability thing, some of the time you play the equilibrium you'd like to play; with this alternative proposal, you always become slightly more cooperative. I'm not sure how these play out. I would imagine that for characterizing what the equilibria are, all you actually need is the punishment version, but I'd imagine that if you want to play some kind of robust strategy, you'd sometimes make this move in a slightly more cooperative direction. Another thing to think about is that there are games with lots of ways to cooperate, which vary in how they distribute the gains from trade. Then there's a question of what happens if your opponent plays something that's kind of cooperative, but in a way that's a little biased towards them. Maybe you'd view the response as just a form of punishment: I'll stay somewhat cooperative, but I'll punish them enough to make this not worthwhile for them. >> Yeah, if there are different cooperative actions that are more or less cooperative, that definitely makes sense. But at the very least, I think there are at least two strategies in this space, although, to be fair, I don't know whether both of them are equilibria. There are a few things about this strategy I'm interested in talking about. The first thing that seems kind of strange: suppose we're both playing the same kind of tit-for-tat-but-in-our-heads strategy. The time it takes us to eventually output something is order one over epsilon, right? On average, because each round we finish with probability epsilon, so it takes about one-over-epsilon rounds for that to happen. >> Yeah, I think that's roughly right. It's a geometric series; it comes out to roughly one over epsilon.
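For reference, the geometric-series calculation being gestured at here: if every level of nested simulation independently halts with probability ε, the halting depth D is geometric, so

```latex
\Pr[D = k] = (1-\varepsilon)^{k-1}\,\varepsilon,
\qquad
\mathbb{E}[D] = \sum_{k \ge 1} k\,(1-\varepsilon)^{k-1}\varepsilon = \frac{1}{\varepsilon},
```

and the recursion halts with probability one, since the chance of surviving k levels, (1-ε)^k, goes to zero.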
>> So that strikes me as a little bit wasteful, in that the cool thing about the Löbian version was that the time it took me to figure out how to cooperate with myself was just the time it took to do the proof via Löb's theorem; it was this constant thing, more or less. Whereas with the epsilon version, the smaller epsilon is, the longer it seems to take, and we're just going back and forth and back and forth. I have this intuition that there's something wasteful there, and I'm wondering if you agree. >> I think that's basically right. I share the intuition; especially if you have a very low epsilon, there's a lot of doing the same back-and-forth thing for a long time without getting anything out of it. I suppose one thing is that you could try to speed this up. Say I run your program: instead of running it in the naive way, I could do some analysis first, run it the way a compiler would. A compiler for a computer program might be able to do some optimizations. So maybe I could analyze your program and my program and tell: okay, what's going to happen here is that we're going to do a bunch of nothing until this epsilon clause triggers. So instead of actually calling each other, we just need to sample the depth of simulations according to this geometric distribution, the distribution you get from halting with probability epsilon at each step. >> Yeah. >> So you could do this analysis. Especially if you expect your opponent to be an epsilon-grounded FairBot, you might explicitly put something into your compiler to check whether the opponent is this epsilon-grounded FairBot. If so, we don't need to actually call each other; we just sample the depth, then sample from whoever halts at that point, from their base distribution, their blind distribution, and then propagate this through the function that each player uses for taking a sample of the opponent's action and generating a new action.
And if this is all very simple, you can show that in principle your compiler could do all of this. For the epsilon-grounded FairBot in particular (the FairBot is the version for the prisoner's dilemma), your compiler could directly see what's going to happen: we're going to sample from this geometric distribution, cooperate will be sampled, and then a bunch of identity functions will be applied to it. So the result is just cooperate, without needing to do any actual recursive calls, anything with a stack and so on; you probably don't need any of that. >> Yeah. I do think there's something intuitively very compelling about: if I can prove that the good thing happens, do the proof-based thing; if I can't prove anything, do the simulation stuff. Though I imagine you'd want to do some checks on whether that works on the proof-based side, depending on the strategy you want to implement. >> The thing I'm proposing is not to have this proof fallback. It's that you always do the epsilon-grounded FairBot thing, or the epsilon-grounded πBot, but instead of calling the opponent's program in the naive way, where you actually run everything, you throw it into this clever compiler that analyzes things in some way. Maybe the compiler can do some specific optimizations, but it's not a fully general proof searcher. >> I mean, it is checking for some proofs, right? >> Yeah, it's checking for some specific kinds of proofs. Which I guess is how modern compilers work: they understand specific kinds of optimizations and can make those, but they don't have a fully general proof search. >> I guess I was imagining something slightly different. Sorry, when you said that, I was half listening and half thinking about a different thing, which is that you could imagine an epsilon-grounded FairBot that says: first, if your source code is equal to mine, cooperate; else, if your source code is the version of epsilon-grounded FairBot that doesn't first do this check, cooperate; else, with probability epsilon cooperate, and with probability one minus epsilon do what the other program does. That particular version probably doesn't get you that much, because the other person adds some spaces to their program and then I'm stuck; but you could do some proof stuff there instead. I guess there are a few possibilities here. >> Yeah. It does seem like something's possible.
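A sketch of that source-checking variant, with the brittleness just noted (an exact textual match fails if the opponent so much as adds whitespace); `run` stands in for a hypothetical interpreter that executes one program against another, and the source constants are placeholders:

```python
import random

COOPERATE, DEFECT = "C", "D"
EPSILON = 0.01

MY_SOURCE = "..."             # this program's own source text (placeholder)
PLAIN_FAIRBOT_SOURCE = "..."  # source of the no-check epsilon FairBot

def run(program_source, opponent_source):
    raise NotImplementedError  # hypothetical interpreter call

def checking_fairbot(opponent_source, my_source):
    # Fast paths: on an exact source match we already know every halting
    # point of the mutual recursion outputs cooperate, so skip it entirely.
    if opponent_source in (MY_SOURCE, PLAIN_FAIRBOT_SOURCE):
        return COOPERATE
    # Otherwise fall back to ordinary epsilon-grounded FairBot behaviour.
    if random.random() < EPSILON:
        return COOPERATE
    return run(opponent_source, my_source)
```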
>> For what it's worth, in general I think these different ways of achieving more robust program equilibrium are compatible with each other: if I play the epsilon-grounded FairBot and you play the Löbian bot, they're going to cooperate with each other. >> Are you sure? >> I'm pretty sure, yeah. >> Okay, you've probably thought about this. >> Yeah, I have a paper about this. Well, not a real paper; it's more of a note. Say you're the Löbian bot, so you try to prove... maybe let's take the simplest version, so we don't have to go down the Löb's theorem path again: can I prove that if I cooperate, you cooperate? If you're the Löbian bot and I'm the epsilon-grounded FairBot, you can prove that if you cooperate, I will cooperate. >> Sorry, can you say that without using "you" and "I"? >> Okay. Am I allowed to say "I submit a program"? >> You are. >> So I submit a program that is just the epsilon-grounded FairBot: with epsilon probability cooperate, otherwise simulate you and do what you do. And your program is: if it's provable that if this program cooperates, the other program cooperates, then cooperate; otherwise defect. >> Okay, so let's think about your program, the proof-based one. >> The proof-based one, yeah. It will try to prove that if it cooperates, my program, the epsilon-grounded FairBot, will cooperate. >> So the proof-based program is trying to prove "if the proof-based program cooperates, then the sampling program cooperates", and it will be able to prove that. I think the other implication is slightly trickier, but maybe you only care about the first one. >> Sorry, what is the other implication? >> That if the sampling-based program cooperates, then the proof-based one will cooperate. Actually, maybe that's not so bad, because... >> But do you actually need it? You can prove that the proof-based program will succeed in proving this implication, and will therefore cooperate. And that's how one can see that the epsilon-grounded FairBot will also cooperate: with epsilon probability it cooperates anyway, and with the remaining probability it does whatever the proof-based thing does, which we've already established is to cooperate. >> Yeah. >> Sorry, does this leave anything open? >> The thing I was curious about... I think I was just thinking about a silly version of the program where the proof-based thing is checking "can I prove that if my opponent cooperates, then I will cooperate". But I think you wouldn't actually write that, because it doesn't make any sense. >> Yeah, that seems harder. Maybe if we thought about it for two minutes we'd figure it out, but I think one wouldn't submit this program. >> Yeah. I next want to ask a different question about this tit-for-tat-based bot. This bot is going to cooperate against cooperate bot, right, the bot that always plays cooperate? That seems pretty sad to me. I'm wondering: how sad do you think it is? >> Yeah, I'm not sure how sad. Okay, I have two answers to this.
The first is that I think it's not so obvious how sad it is, and the second is that I think this is a relatively difficult problem to fix. On how sad it is: I agree it depends a little on what you expect your opponent to be. Imagine you're this program, you've been written by Daniel, and you run around the world facing opponents, and most of the opponents are just inanimate objects that weren't created by anyone for strategic purposes. Now you face the classic rock with "cooperate" written on it, and it just happens to be a rock that says cooperate. Obviously you don't really want to cooperate against that. But here's another possibility. Say we play this program equilibrium game literally, and you submit your program, and you know the opponent's program was written by me, by Caspar, who has probably thought about some strategic stuff. Then it could be that I just wrote a cooperate bot and you can get away with defecting against it. But maybe there's something funny going on. For example, here's a pretty similar scheme for achieving cooperation in the program equilibrium game, based not on the programs themselves mixing but on the players mixing over which programs to submit. So I might... >> Mixing meaning randomizing. >> Yeah, randomizing, very good. I might randomize between: the program that just cooperates, so cooperate bot; the program that cooperates if and only if the opponent cooperates against cooperate bot, a sort of second-order cooperate bot; and you can imagine how this goes on. Each of my programs checks that you cooperated against the one one lower in the hierarchy. In some sense this is similar to the epsilon-grounded FairBot, in that at each stage you might look at my program and think, okay, maybe I could just defect; but the problem is you might be inside a simulation run by the programs that are higher in the list. So if I submit this distribution, you would still want to cooperate against my cooperate bot. That is one reason to want to cooperate against cooperate bot.
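A sketch of that mixing scheme (the hierarchy depth and the uniform draw here are illustrative choices, not the scheme's actual parameters, and the incentive analysis depends on the distribution chosen):

```python
import random

COOPERATE, DEFECT = "C", "D"

def cooperate_bot(opponent_program, my_program):
    return COOPERATE

def make_next_level(previous_level_bot):
    """Level-k program: cooperate iff the opponent would cooperate
    against the level-(k-1) program."""
    def level_bot(opponent_program, my_program):
        their_move = opponent_program(previous_level_bot, opponent_program)
        return COOPERATE if their_move == COOPERATE else DEFECT
    return level_bot

# Level 0 is cooperate bot; level k checks behaviour against level k-1.
hierarchy = [cooperate_bot]
for _ in range(5):
    hierarchy.append(make_next_level(hierarchy[-1]))

def submit_program():
    # The *player* randomizes over programs: an opponent that defects
    # against the bare cooperate bot gets punished whenever a higher
    # level was actually submitted, which can make defection unprofitable.
    return random.choice(hierarchy)
```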
>> Yeah, I guess it's tricky, because it suddenly means it really matters which things in my environment I'm modeling as agents and which as non-agents. In my actual environment there are many more non-agents than agents. And then, take this water bottle: not only do I have to model it as a non-agent, it seems like maybe I also have to model what other things it could have done if physics were different, right? And it feels like if I have this sort of attitude towards the world, a bunch of bad things are going to happen. Also, in a strategic setting with other agents that are trying to be strategic, I think you do actually want to be able to say things like: hey, if I defected, would you cooperate anyway? In that case, I'll just defect. But if your cooperation is dependent on my cooperation, then I'm going to cooperate. Which is hard to do with this construction, because I'm checking two things, and that explodes into a big tree. But this seems to me like something you do want to do in the program equilibrium world. So I guess those are two things; I'm wondering what your takes are. >> I agree it would be nice to know how to do this: for this given opponent program, would my defecting make the opponent defect? A program that exploits cooperate bot and cooperates against itself in some robust way, I agree that would be desirable. We can say more about to what extent it's feasible. I do think that, in some sense, one does just have to form beliefs about what the water bottle could have been, and things like that. With the water bottle, and it's a weird example, you'd have to think about whether you have reason to believe there's someone simulating what you do against the water bottle and, depending on that, doing something. In the strategic setting, where you know the opponent program was submitted by Caspar, or by someone who knows a little about this literature, you have a very high credence that if you face a cooperate bot, probably something funny is going on, that there are simulations being run of your program that check what it does against various opponents. So you have to optimize for that case much more than for the case where your opponent is just a cooperate bot. Whereas with the water bottle, you don't really have this; why would someone simulate what the water bottle could have been? >> I mean, people really did design this water bottle by thinking about how people would use it, right? So I have a few thoughts there. Naively: did people change how this water bottle would work depending on how other people would interact with it? That's just true. They didn't get the water bottle itself to do that, though, so maybe that's the thing I'm supposed to check for. It's also true, and maybe this isn't a great example, that if you go to real iterated, mutually transparent prisoner's dilemmas...
...people do actually just write dumb programs in those. It's possible that's because they're played for ten bucks or something, and that's why people aren't really trying; but in fact, some people are bad at writing these programs. >> Yeah. >> And you want to exploit those programs, right? I also have this issue: it seems like my overall program strategy is then, first, check whether I'm in a situation where I think the other program was designed to care about what I'm going to do; if so, cooperate; otherwise defect. Maybe this is not so bad in the simulation setting. In the proof-based setting, this would be pretty bad, because now it's much harder to prove nice things about me. In the simulation setting, it might just be fine, as long as we're really keeping everything the same. So maybe this is an advantage of the simulation setting; I don't really know. >> Sorry, I'm not sure I fully follow. >> Okay. I took your proposal to be that you should figure out whether you're in a strategic setting where the other person is basically definitely not going to submit a cooperate bot. >> Mhm. >> I'm imagining myself as the computer program, and maybe this is different from what you were saying, but I was imagining the program is: check whether the other computer program was plausibly strategically designed; if so, do epsilon-grounded FairBot; otherwise, do defect bot. And there's a concern that, for example, different people write their programs to do this check in different ways, and one of them ends up being wrong. Maybe this is not a huge issue; I don't know. It feels like it adds complexity in a way that's a little bit sad. >> I could imagine that for the proof-based ones, the challenge is that they need to be able to prove about each other that they assess whether they're in a strategic situation consistently, or something like that. >> Yeah. Also, the more complicated your program is, the harder it is for other people to prove things about you. One thing you kind of want to do, if you're a proof-based program in a world of proof-based programs, is be relatively easy to prove things about. >> Well, depending on how nice you think the other programs are, I guess. >> Yeah. In the tournament case you brought up: in practice, in the tournament, for various reasons, you should mostly try to exploit these cooperate bots, these programs written by people who thought about it for ten minutes or who just don't understand the setting.
In part because you wouldn't expect people to submit this cooperate-bot-hierarchy thing; there are just other things to do, so there's a much higher prior on those simpler kinds of programs. But you could imagine a version of the tournament setting where you're told who wrote the opponent's program, and then your program distinguishes. If someone with publications on program equilibrium wrote the opponent's program, you think: okay, all kinds of funny stuff might be going on here; I might currently be simulated by something that tries to analyze me in some weird way, so I need to think about that. Versus an opponent written by someone who just doesn't know very much about the setting, where maybe most of the prior probability mass is on them having screwed up somehow, and that's why their program is basically a cooperate bot. In these tournaments I'd imagine something like 30% of programs are just things that fundamentally don't work, that don't do anything useful, that just check whether the opponent has a particular string in its source code or something like that, and meanwhile very little probability mass is on these sophisticated schemes that check whether the opponent cooperates against cooperate bot in a way that's useful. >> Yeah. >> So we've talked a little about whether, and to what extent, it's desirable to exploit cooperate bots. There's then also the question of how exactly to do it. >> Hm. >> Here's one more thing on this question of whether you need to know whether the opponent is part of the environment or strategic. You can think about the repeated prisoner's dilemma. There, tit-for-tat, everyone agrees, is a reasonable strategy. >> Yeah. >> And tit-for-tat also cooperates against cooperate bot. >> Right. >> And I'd think it's analogous there: tit-for-tat is a reasonable strategy if you think your opponent is quite strategic, and the more skeptical you are, the more you should just be defect bot. Against your water bottle, maybe you can be defect bot. And then there's some in-between area where you should do tit-for-tat, but maybe in round 20 you should try defecting to see what's going on; and if they then defect, you can be pretty sure they're strategic. >> Yeah. I guess it seems to me like the thing you want to do is have randomized defection, see whether the opponent punishes you, and otherwise do tit-for-tat, but also be a little more forgiving than you otherwise would be, in case other people are doing the same strategy. >> Yeah.
I guess one difference between the settings is that you can try out different things more. >> Yeah, which leads nicely to the other point: how exactly would you do this exploiting of cooperate bots? I do think one fundamental difficulty in the program equilibrium setting for exploiting cooperate bots is that, aside from little tricks, it's difficult to tell whether the opponent is a cooperate bot in the relevant sense. Intuitively, what you want to know is: if I defected against my opponent, would they still cooperate? If so, you want to defect. But this is some weird counterfactual, where you're conditioning on something that might be false, with all the usual problems that brings, so you can get all kinds of weird complications. In comparison, in the tit-for-tat case, it's not clear exactly what you'd do, but in some sense you can try things out against a given opponent, sometimes defecting, sometimes cooperating, and see what happens. There's less of that in the program game case: you're one program, there's some action that you play. You can maybe think "if I played this other action...", but you run into these typical logical obstacles. >> Although it feels like it might not be so bad. Imagine I have this thing where I'm saying: suppose I defected; would you cooperate against a version of me that defected? If so, then I'm going to defect. In that case, it seems like my defection is going to show up exactly in the cases where you would cooperate, and therefore that counterfactual is not going to be logically impossible, right? >> Yeah, that's a good point. So I guess that's a very natural extension of these proof-based bots: you first try to prove "if I defect, the opponent will cooperate". >> Yeah. >> This will defect against cooperate bots, which is good. The question is what it does against itself. It will still cooperate against itself, right? >> Yeah, because if I'm asking "will you cooperate if I defect", the answer is no if I'm playing myself, because I always have to do the same thing as myself, because I'm me. >> Yeah, maybe this just works. I bet there must be some paper that's checked this. >> I'm now also trying to remember, because one of these proof-based papers does consider this PrudentBot. >> Oh yeah. >> Which does something much hackier. It tries to prove, and there are some logic details here... Okay, there's one issue with the program you just described that I just remembered, but let's go to PrudentBot first. PrudentBot just checks whether you would cooperate against defect bot.
>> And if you cooperate against defect bot, I can defect against you. >> Right. >> To me this is a little bit... I mean, it's natural to assume that if the opponent cooperates against defect bot, they're just non-strategic; they haven't figured out what's going on, and you can defect against them. But in some sense this is quite different from "does my defection make the opponent defect". >> Yeah, it's both the wrong counterfactual and a little bit less strategic, right? >> Yes. The things I'm aware of people having talked about are more like this: they check relatively basic conditions, which you can view as checking for specific kinds of cooperate bots. Another thing you can do with the epsilon-grounded FairBots is add a condition at the beginning: if the opponent is just a cooperate bot, or if the opponent never looks at the opponent's source code at all, then defect against them. You can add these sorts of things, and from the perspective of winning a tournament, you should think about a lot of these conditions and try to exploit them, to defect against as many of these players as possible. But it never feels satisfying; it feels like a trick, some hacky thing. Whereas the thing you proposed seems more principled. Okay, on that thing, I can imagine one issue: when this program faces itself, it first needs to prove... I think one problem is always that, to analyze opponent programs, you sometimes need to prove that some provability condition doesn't trigger. And the problem is that, just from the fact that you think this condition is false, you can't infer that it's not provable, because of incompleteness. So I could imagine that I can't prove that your program doesn't just falsely prove that it can safely defect against me. >> Yeah.
Because when I prove things, I don't know whether Peano arithmetic, or whatever proof system we use, is consistent. So there's always the possibility that every provability condition triggers, which means I don't know whether your first condition triggers. Actually, this also arises for PrudentBot. If I am PrudentBot, then as part of my analysis of your program, I try to prove something about what you would do against defect bot. Say you have the more basic Löbian FairBot type structure; then in my analysis of your program, I need to conclude that your clause, "if I can prove that the opponent cooperates", won't trigger. To prove that you don't cooperate against defect bot, I need to conclude that you won't falsely prove that defect bot will cooperate against you. >> Okay. >> And this I can't prove in PA, in Peano arithmetic, or in whatever the same proof system is that you use. So what they actually do for PrudentBot, and they call it PA+1, is that I consider Peano arithmetic, or whatever proof system I use, plus the assumption that that proof system is consistent. That gives rise to a new proof system, which can prove that your if-condition is not going to trigger. So this is some general obstacle. >> Right, and then we've got to coordinate on which proof systems we use, because if I accidentally use a too-strong proof system, then you have difficulty proving things about me.
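Putting the pieces together, a sketch of PrudentBot as described here, with `provable_in` again standing in for an idealized bounded proof search in a named system, and the source constants as placeholders:

```python
COOPERATE, DEFECT = "C", "D"
DEFECT_BOT_SOURCE = "..."  # source of the always-defect program (placeholder)

def provable_in(system, claim):
    """Hypothetical provability oracle for the named proof system,
    e.g. "PA" or "PA+1" (PA plus the assertion that PA is consistent)."""
    raise NotImplementedError

def prudent_bot(opponent_source, my_source):
    # Cooperate only with opponents that provably cooperate with me...
    cooperates_with_me = provable_in(
        "PA", f"run({opponent_source!r}, {my_source!r}) == 'C'")
    # ...and provably (in the stronger system) defect against defect bot.
    # The stronger system is needed because ruling out that the opponent's
    # own provability clause falsely triggers requires PA's consistency,
    # which PA cannot prove about itself.
    defects_vs_defect_bot = provable_in(
        "PA+1", f"run({opponent_source!r}, {DEFECT_BOT_SOURCE!r}) == 'D'")
    if cooperates_with_me and defects_vs_defect_bot:
        return COOPERATE
    return DEFECT
```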
>> And I guess there's also this thing about "well, if I defected, would you still cooperate with me?". In the proof-based setting, I can say: if my program, on input your program, output defect, would your program, on input my program, output cooperate? I can just write down that conditional. But it feels a little hard to do this in a simulation-based setting, which I think there are reasons to want: sometimes you just can't prove things about other people and you have to simulate them. And that's kind of nice because it moves a bit beyond strict computer programs; it's also nice because maybe it's hard to prove things about neural networks, which is one of the motivations, right? But I don't even know what the condition is supposed to be in that setting. Maybe if we're stochastic programs, I could condition on "this stochastic program outputs defect", but it's not even clear that that's the right thing, because you're looking at my program, not at the output of my program. >> Yeah, though you can have programs that do things like: if the opponent cooperates with probability at least such-and-such. I think one can make those kinds of things well-defined, at least. >> Yeah, but somehow what I want to say is: if you cooperate with high probability against a version of me that defects... You know what I mean? Either you're simulating just a different program, or you're simulating me, and I don't know how to specify "you're simulating a version of me that defects". >> Yeah, I agree. >> In some special cases, maybe I could run you, and if I know what location in memory you're storing my output at, I could intervene on that memory location. But (a) this is very hacky, and (b) I'm not convinced it's even the right way to do it. >> Yeah. I guess there are various settings where you constrain the way programs access each other, which would allow more of these counterfactuals. For example, you could consider pure simulation games, where you don't get access to the other player's source code but you can run it. In those cases, some of these counterfactuals become a bit more straightforwardly well-defined: you can just replace every instance of the opponent's calls to you with some fixed action. There are some papers that consider this more pure simulation-based setting as well, but obviously it wouldn't allow for the proof-based stuff. >> Okay, at this point I want to tee up your next paper. In particular, in this paper there are two types of strategies that you can't turn into programs in the program equilibrium setting. We already discussed win-stay lose-shift, where I have to look at what you did in the last round and also at what I did in the last round. There's also this strategy in the iterated prisoner's dilemma called grim trigger: if you've ever defected in the past, I'll start defecting against you, and if you've always cooperated, I'll cooperate. Neither of these can you have in your epsilon-grounded FairBot construction. Why is that? >> Basically, the main constraint on these epsilon-grounded FairBots, or πBots, is that they just can't run that many simulations. You can run one simulation with high probability, and maybe with low probability you can start two simulations, something like that. But as soon as you simulate the opponent and yourself, or multiple things, with high probability you run into the infinite-loop issues again that this epsilon condition was meant to avoid.
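The blow-up can be quantified with a quick branching-process estimate (a back-of-the-envelope version of the point made next): if each simulation spawns two child simulations unless its epsilon clause fires, the expected number of simulations k levels deep is

```latex
\mathbb{E}[\text{simulations at depth } k] = \bigl(2(1-\varepsilon)\bigr)^{k},
```

which diverges as k grows whenever ε < 1/2; and an ε of 1/2 or more would mean barely ever simulating at all.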
Another case: if you have more than two players, things become weird. Say you have three players. Intuitively, you'd want to simulate both opponents, and then if they both cooperate, you cooperate; and if one of them defects, maybe you want to play some special punishment action against them, depending on what the game is. But you can't simulate both opponents. >> Right, because if every time you're called you start two new simulations, or even two-minus-epsilon in expectation, you get this tree of simulations that just keeps expanding. Occasionally some simulation path dies off, but the tree multiplies faster than its branches halt. Basically, when you grow, you're doubling, and you only cut off a factor of epsilon; epsilon is smaller than a half, so you grow more than you shrink, and it's really bad. And if epsilon is greater than a half, then you're not really simulating much, are you? So how do we fix it? >> Okay. We have this newer paper, where I'm fortunate to be the second author; the first author is Emery Cooper, and Vincent Conitzer, my PhD advisor, is also on the paper. It fixes exactly these issues, and I think it's a clever, interesting idea. To explain it, we need to imagine that the way programs randomize works in a particular way; the architecture of the programming language has to be a particular way. In a normal programming language, you call random.random() or some such function and get a random number out of it. But another way to model randomization is to imagine that at the beginning of time, when your program is first called, it gets as input an infinite string of random variables, rolled out once at the beginning. It could be bits, for example, and all you're going to do is use the entries of this input. In some sense, this is a way of modeling randomization with a deterministic program: randomization is running a deterministic program on an input that is random, since part of your input is this random string. So specifically, let's imagine you get this random string as input, and each entry is a random number between zero and one. The way these infinite-simulation issues get fixed is that when I run, for example, my two opponents and myself, I pass them all the same random input string, and that way I coordinate at what point they halt. So, very specifically, here's how it works.
Let's first consider a version where the issue is just that you have multiple opponents, but you're still happy to look just at the last round, or maybe win-stay lose-shift, where you also look at your own previous action. What you do is look at your random input string, and if the first number is below epsilon, you just immediately halt in the usual way, by outputting something. >> Yeah. >> Otherwise, you remove the first entry from the infinite input string, and then you call all of your simulations, say both opponents and yourself, on the input string with that first entry removed. >> Yeah. >> Now, how does this help? Well, the opponents might do the same: they all also check whether the first entry is smaller than epsilon, then remove the first entry and call recursively. The trick is that, because they all have the same input string, they all halt at the same point: all your simulations are going to halt once they reach the first item in the input string that is smaller than epsilon. >> Yeah. >> So that allows for simulating multiple opponents, and you can simulate yourself as well, of course. You can also simulate multiple past time steps: instead of passing the input string with just the first entry removed, you can check what they did, in some intuitive sense, two time steps ago, by removing the first two random variables from your input string and passing that in. So this is the basic scheme for making sure these simulations all halt despite there being a bunch of them.
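A sketch of the scheme for three players, with a cooperate-iff-everyone-cooperated reaction rule (the calling convention, the reaction rule, and the use of a plain list for the conceptually infinite tape are all illustrative choices):

```python
COOPERATE, DEFECT = "C", "D"
EPSILON = 0.01

def tape_grounded_bot(opponents, me, tape):
    """`tape` is the shared random sequence every player received;
    `opponents` are the other players' programs."""
    # Halting clause: if the first tape entry is below epsilon, output a
    # fixed "first round" action without simulating anyone.
    if tape[0] < EPSILON:
        return COOPERATE
    # Otherwise, strip the first entry and simulate every opponent on the
    # SAME shortened tape. Since everyone consumes the tape in lockstep,
    # all nested simulations bottom out at the same tape position (the
    # first entry below epsilon), so the branching tree of calls halts.
    shorter = tape[1:]
    their_moves = [
        opp([p for p in opponents if p is not opp] + [me], opp, shorter)
        for opp in opponents
    ]
    # React to the sampled profile: here, cooperate iff everyone did.
    return COOPERATE if all(m == COOPERATE for m in their_moves) else DEFECT

# Example: with tape [0.7, 0.3, 0.004, ...], three copies of this program
# all recurse twice, all halt at the third entry, and all cooperate.
```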
>> My understanding is that you have two constructions in particular, a correlated one and an uncorrelated one. Can you give us a feel for the difference between those? >> Yeah, there are differences in the setting. The correlated one is one where you get a shared random input sequence. You could imagine that some central party generates a sequence of random numbers and gives it to all the players, so they all have this same random sequence, and then maybe they additionally have a private one as well. In this shared-randomness setting, basically all the results are much nicer: we get nice results in the shared-randomness setting, and mostly more complicated, weird results in the non-correlated case; in some cases we also just can't characterize what's going on. In the correlated case, we specifically propose using the correlated randomness for these recursive calls. So when I call my two opponents and myself on the last round, I take the shared sequence of random numbers, remove the first entry, and call the opponents with the remainder, rather than using the private sequence. In the case where there's no shared randomness, we just use the private randomness instead. So in some sense it's almost the same program; there are some subtleties, but it's sort of the same program, and the main difference is which randomness you feed them. >> You're giving the other players your private randomness, right? >> Yes. I don't have access to their randomness; I have to give them mine. And maybe it's not that hard to see that you get somewhat chaotic outputs this way: my prediction of what the opponent will do is, in some sense, quite different from what they're actually going to do, because they might have very different input. >> Right, which in some ways is more realistic: I get to sample from the distribution of what you do, but I don't get to know exactly what you will actually do. Actually, maybe this is just me restating that I believe in private randomness more than I believe in public randomness. So here's a thing I believe about this scheme that strikes me as kind of sad. Basically, you're going to use this scheme to come up with things like these epsilon-grounded FairBots, and they're going to cooperate with each other. But reading the paper, it seemed like what kind of had to happen is that all the agents involved use the same sort of time-step scheme. At least in the construction, everyone has this shared sequence of public randomness, they're all waiting until the random number is less than epsilon, and at that point they terminate. In the real world, we do have public sources of randomness, but there are a lot of them; it's not obvious which one to use, and it's not obvious how to turn them into "is it less than epsilon". So it seems really sad if the good properties of this have to come from coordination on the scheme of: we're going to do time steps, and we're going to do them like this. But I'm not sure how much coordination is really required for this to work out well. >> That is a good question. I do think this is a kind of price one pays relative to the original epsilon-grounded πBots, which obviously don't have these issues. And I think it's a little complicated exactly how robust this is.
So the results that we have: we have this folk theorem about what equilibria can be achieved in the shared-randomness case by these kinds of programs, and it's basically the same as for repeated games, and also the same as for these syntactic-comparison-based ones: everything that's better for everyone than their minimax payoff, the payoff they'd get if everyone else punished them. >> Yeah. >> With some of these characterizations... okay, the fact that it's an equilibrium obviously means that it's robust to all kinds of deviations, but getting the equilibrium payoff requires coordination on these random things. Another thing, and maybe this has already been kind of implicit, or explicit, I guess, in the language I've used with these time steps: there's a close relation between this and repeated games. Now it really is just full repeated-game strategies, and this whole relation to repeated games hinges on everyone using basically exactly the same time-step scheme. If everyone uses the same epsilon, and they do the cut-off when the same source of randomness is below this epsilon, then in some sense it's all exactly like playing a repeated game with a probability epsilon of terminating at each point; there's a very nice correspondence. So some of the results really do fully hinge on exact coordination on a lot of these things. >> Yeah. >> But there's also some robustness still. For example, the programs still halt if someone chooses a slightly different epsilon. >> Yeah. >> If someone chooses a different epsilon, the relationship to repeated games sort of goes away. It's hard to think of a version of playing a repeated game where everyone has their own separate cut-off probability; maybe you can somehow make sense of it, but it does become kind of different from that. But let's say I choose an epsilon that's slightly lower. Well, we're still going to halt at the point where we find an entry in this random sequence that's below everyone's epsilon, right? So people choosing slightly different epsilons makes it harder for us to say what's going on; we can't view it as a repeated game anymore. >> Yeah. >> But it still works. It's not like everything immediately breaks in terms of nothing halting or anything like that. >> Yeah. Or even if I'm using one public random sequence and you're using another, even if they're uncorrelated, it seems like as long as I eventually halt and you eventually halt, it's not going to be too bad. >> Yeah. In particular, we're going to halt at the point where both of our sequences have had the halting signal, right? >> Yeah.
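Editor note: the minimax threat point invoked here can be written out. This is a rough paraphrase of the spoken characterization; the paper's formal statement carries additional conditions.

```latex
% Player i's minimax (punishment) payoff in the base game:
% the best i can guarantee when everyone else coordinates to punish.
\[
  v_i \;=\; \min_{a_{-i}} \; \max_{a_i} \; u_i(a_i, a_{-i})
\]
% Folk-theorem-style claim, as stated in the conversation:
% a feasible payoff profile (u_1, ..., u_n) is sustainable in
% program equilibrium with shared randomness iff it is at least
% the threat point for every player:
\[
  u_i \;\ge\; v_i \quad \text{for all players } i.
\]
```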
And it also seems like... I guess it depends a little bit on what our policies are, but as long as I'm not too picky, as long as I'm not super specific about what exact sequence of cooperates and defects I'm sensitive to, maybe it'll just be fine even if we're not super tightly coordinated. >> Yeah. I guess here again it's like trying to import our intuitions from repeated games, where there's a game-theoretic literature, and where we also have experience from daily life. If you play a repeated game in practice, you're not going to play an equilibrium; you're going to do something where you try to go for some compromise, maybe the other player goes for some other compromise, and then you kind of try to punish them a little bit, or something like that. And I would imagine there's a lot of this going on in this setting as well. >> Yeah. Okay, I think I'm maybe a little bit less concerned now about the degree of coordination required. There are two other things about this paper that seem pretty interesting. The first is just what the limitations are on the equilibria you can reach, and my understanding is that you can characterize them decently in the correlated case, but it's pretty hard to characterize them in the uncorrelated case. Can you explain to me and my listeners what's going on here? >> Yeah. So in the correlated case it really is quite simple. As always, there are some subtleties: you need to specify, for example, what exactly you're going to do if you simulate some other player and they use their private source of randomness, which they're not supposed to do, in some sense. Well, you need to somehow punish them, and the people above you need to figure out that this is what's going on. So there are some subtleties of that sort, but basically there is just a very close relationship between these programs and the repeated-game case. It is basically just playing the repeated game, and even deviation strategies you can view as playing the repeated game, by saying: if they get a random string as input that has ten entries left until the below-epsilon entry, you can view this as them playing a particular strategy at time step ten. >> Yeah. Hang on, what do they do if they access randomness? My recollection, which might be wrong, was that you punish people for accessing other people's private randomness, but I thought they could still access their own private randomness. >> I think you do have to punish people for using their own private randomness. >> Okay.
And then the complicated thing is that I might simulate you, and you might simulate a third party, and the third party uses their private randomness, and you as a result punish them, and then I need to figure out that you're just punishing them because they used their private randomness, and that you're not punishing me. >> Yeah, so that condition seems kind of hard to coordinate on, right? Because naively you might think, well, it's my private randomness; it's my choice, you know. >> Oh, the condition to punish private randomness? >> Yeah. >> Yeah, I think this is a reasonable point. Maybe one should think about ways to make this more robust. I guess one has to think about what exactly the game is and how much harm the private randomness can do, right? In some cases it doesn't really help you to use your own private randomness, and then maybe I don't need to punish you for it. But if there are, say, some twenty resources and you can steal them, and you're going to randomize which one you steal from, and the only way for us to defend against this is by catching you at the specific resource, then maybe we do just need to think: as soon as there's some randomness going on, it's a little bit fishy. >> Yeah. >> But you could imagine games where you want to allow some people to randomize privately, or to use that private randomness for, I don't know, choosing their password. Maybe this is a fun example: say at time step three you need to choose a password. The way our scheme would address this is that we all get to see your password, or in some sense we all get to predict how you choose your password. >> Yeah. >> It's also still important to keep in mind that these past time steps are things that don't actually happen. We predict what you would have chosen at time step three if time step three were the real time step. But nonetheless, you might think: okay, if you have to choose your password with public randomness, then we all know your password, and doesn't this mean we'd all want to log into your computer and steal your stuff? >> Yeah. >> And the way the scheme would address this, I guess, is just that someone could do that, but they would then be punished for it. >> Or maybe they do do it, and it's just, well, that's the equilibrium we picked. Sorry. >> Right, it could also be part of the equilibrium. Yeah, that's also true. >> Yeah. So, okay: in the correlated case, you basically have a folk theorem, and there's something about the things you can punish people for deviating from; that's basically the set of equilibria you can reach. >> Yeah. Roughly. >> Okay. And then I got to the bit of your paper that is about the equilibria you can reach in the uncorrelated game. And I'm going to be honest.
So earlier we had a recording where we were going to talk about these papers, but I got really bad sleep the night before I was supposed to read them, and so I didn't really understand this Characterizing Simulation-Based Program Equilibria paper; it was beyond me. This time, I had a good night's sleep, I was rested, I was prepared, and I read this paper, and then once I got to the limitations on the equilibria of the uncorrelated one, that's where I gave up. >> Yeah. >> The theorems did not make... I understood each of the symbols, but I didn't get what was going on. >> Yeah. >> Is there a brief summary of what's going on, or is it just, well, we had to do some math and that turns out to be the condition you end up needing? >> Yeah. At least for a very audio-focused format, I think one probably can't go that much into the details. But I think one can explain a little bit why one doesn't get a full folk theorem in the uncorrelated case. There are some relatively intuitively accessible reasons for that. >> Yeah. Okay, let's start there. >> Okay, let's start there. So the problem in the uncorrelated case is basically... let's take a three-player case. >> Yep. >> We are two players, and there's a third player, Alice, and we want to implement some equilibrium, right? And now there's a question: can Alice profitably deviate from this equilibrium? >> Yeah. >> And the issue is that Alice can use her private randomization in some ways. The problem is basically that our catching her deviation is uncorrelated with her actually deviating in the actual situation. And additionally, whether I detect her deviating is uncorrelated with whether you detect her deviating. >> Okay. >> And this all makes punishing, especially punishing low-probability deviations, very difficult. So for example, suppose with some small probability, which she determines with her private randomness, Alice defects in some way. >> Yeah. >> Right. Then in the real world, for her actual action, the one that will determine her utility, there's this small probability that she'll defect, and then there's some probability that our simulations of her, which we're running a bunch of, will detect this. But because when I simulate Alice, I simulate her with a completely different random string than the string Alice has in the real world, in some sense I can't really tell whether she's actually going to deviate. And then you're also going to simulate Alice with your private randomness. >> Yeah. >> Which means that whether Alice defects in your simulation is also uncorrelated with whether she defects in my simulation. >> Wait. First of all, I thought that even in the correlated case, whether she defects in simulation is different from whether she deviates in reality, because we cut off the front: we get rid of the first few random numbers and then run on the rest, right? >> Yes, that is true.
I guess the... >> That thing where we disagree seems important and different. >> Yeah. So maybe... yeah, I'm also not sure about the other point now, but I think the other one is more straightforward: it might be that to punish her deviating, we both need to do a particular thing, and we just can't. I mean, it's also a little bit complicated, right, because you might think, well, we can simulate Alice for a lot of time steps. You might think that even if she defects with low probability, we're simulating her a bunch in some way. So there are some complications here: she needs to deviate in some relatively clever way to make sure that we can't detect it with high probability. Yeah, it is all a little bit complicated, but the point is that we can't correlate our punishment, or we can't even correlate whether we punish. And so if the only way to get her not to defect is for both of us to do a particular action at the same time, that's difficult to arrange. >> Okay, here's a story I'm coming up with, based on some mishmash of what you were just saying and what I remember from the paper. >> Mhm. >> We're in a three-player game, right? So punishing actions, firstly, might require a joint action by us, and therefore that's one reason we need to be correlated on what Alice actually did, at least in simulation. Another issue is: suppose I do something that's not in the good equilibrium, and you see me doing that. You need to know whether I did that because I'm punishing Alice, or whether I was the first person to defect. If I'm the first person to defect, then you should try punishing me; but if I'm just punishing Alice, then you shouldn't punish me. And so if we see different versions of Alice in our heads... if you see me going away from the equilibrium, you need to know whether that's because in my head I saw Alice defecting, or because in my head I thought, I want to defect because I'm evil, or whatever. I don't know if that's right. >> Yeah, I'm not sure that is an issue, because when I see you defecting, it's because I simulate you with my randomness as input. >> Yeah. >> And then you, with my randomness as input, see Alice defecting one level down. >> Okay. >> Remember that I'm simulating all of these past time steps as determined by my randomness. >> Okay. >> So I think I can see whether the reason you defect in my simulation is that you saw Alice defect. >> Wait, if we're using the same randomness, then why isn't it the case that we both see Alice defect at the same time, with our same randomness? >> Well, this is all my simulation of you, right? Rather than the real you. >> Oh. So the real us might not coordinate on punishment. >> Yeah. Yeah.
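Editor note: a toy Monte Carlo makes the coordination failure vivid. This is not the paper's construction; it just assumes each punisher's "detection" of Alice is an independent draw, uncorrelated with her real action, as described above. All names and parameter values are illustrative.

```python
import random

def detection_demo(p=0.05, trials=100_000, seed=1):
    """Alice defects with probability p using her private coin. Each of
    two punishers independently 'simulates' Alice with their own private
    coin, so each flags a defection with probability p, independently of
    her real action and of each other."""
    rng = random.Random(seed)
    real = both = real_and_both = 0
    for _ in range(trials):
        alice = rng.random() < p        # her actual action
        i_see = rng.random() < p        # my simulation of her
        you_see = rng.random() < p      # your simulation of her
        real += alice
        both += i_see and you_see
        real_and_both += alice and i_see and you_see
    print(f"P(she actually defects)         ~ {real / trials:.4f}")        # ~ p
    print(f"P(we both flag her)             ~ {both / trials:.4f}")        # ~ p**2
    print(f"P(joint punishment coincides with a real defection) ~ "
          f"{real_and_both / trials:.6f}")                                  # ~ p**3

detection_demo()
```

The smaller p is, the worse the joint-detection rate (p squared) compares to the deviation rate (p), which is one intuition for why low-probability, cleverly randomized deviations are the problematic case.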
So yeah, this is another thing: even with the very basic epsilon-grounded πBots, you can imagine that in their heads they're playing this tit-for-tat, going back and forth: one person does something based on their randomness, and the other person sees this and responds in some particular way. >> Yeah. >> But if you don't have shared randomness, all of this is complete fiction, right? In my simulation I see you defecting on Alice, and then I also defect on Alice, and we're happily defecting on Alice, and in the simulation we're thinking we're doing so well, we're getting Alice to regret what she does, and so on. But the problem is that you run a completely different simulation. So in your simulation of what Alice and I do, you might see everyone cooperating, and everyone thinks, oh, everything's great, we're all cooperating with each other. And then we've done the simulation, and now we're playing the actual game, and I defect, thinking, oh yeah, we're on the same team against Alice. And you think, oh nice, we're all cooperating, and you cooperate. And then we land in this completely weird outcome that doesn't really happen in the simulation, that's sort of unrelated to what happens in it. >> Right. So Alice basically says, "Hey, I can get away with doing nasty stuff, because they won't both be able to tell that I'm doing the nasty stuff, and therefore I won't properly be punished in the real world." And so these gnarly theorems: should I basically read them as, the preconditions are some math thing, and the math thing basically determines that this kind of thing can't happen, and those are the equilibria you can reach. Is that it? >> Yeah. I think one thing that drives a lot of these characterizations is that Alice can defect with low probability. Usually the more problematic case is that she defects in a particular kind of clever way with low probability, which means that we're very unlikely to both detect it at once. >> Yeah. >> I think that is driving these results a lot. But yeah, to some extent, as you said earlier, there's some math going on; I think to some extent that's true. Of course, one always prefers results that are very clean and simple, like the folk theorem, where you have this very simple condition for what things are equilibria, and yeah, our characterizations are mostly these kinds of complicated formulas. But one thing I like is that for some of these characterizations, one can still hold on to this interpretation of there being time steps, where you simulate what people do at previous time steps. >> Yeah. >> And things like that. It's sort of very intuitive that this works for >> Yeah.
the case where everyone plays nicely with each other and everything is correlated, and in some sense we're playing this mental repeated game where we all use the same randomness, so we're all playing the same repeated game, and really the only thing that's sampled is which round is the real round. There it's clear that the time-step story works. And it's nice that there are some results where you can still use this time-step picture. So that's one nice thing about the results, but yeah, it is unfortunately much more complicated. >> Fair enough. And then another part of the paper that's kind of cool, and that you foregrounded earlier, is that it has this definition of simulationist programs. Earlier you mentioned a definition of fair programs or something, which I guess was maybe you referring to this definition? >> Yes. So, in some sense the paper has three parts, right? There's the correlated case, with these generalized epsilon-grounded πBots that pass on the shared random sequence. Then the uncorrelated case, with the epsilon-grounded FairBots. And then we also have a section that analyzes more general simulationist programs, which, intuitively, are programs where all they do is run the opponent with themselves and the other players as input. >> Yeah. >> It has this definition, and for those we have a characterization as well. For example, one result we also show is that, in general, simulationist programs are more powerful at achieving equilibria in the uncorrelated case than the epsilon-grounded πBots. I'm not quite sure how much to go into detail there, but one intuition you can have is that with the epsilon-grounded πBots, to some extent everyone has to do the same thing, whereas you could have settings where only I need to do simulations, and then if only I simulate your program, I can run, like, 10,000 simulations or something like that. And this is something that the epsilon-grounded πBots obviously can't do: you can't just independently sample a thousand responses from the other player. >> Yeah. >> Yeah, and we do there have this definition of simulationist programs. I'm not sure I remember the details off the top of my head, but... >> It's some recursive thing: a simulationist program calls its opponent on a simulationist program, which maybe includes itself, and I forget whether it explicitly has, like, epsilon-grounded πBots as a base case or something. >> Yeah. >> Maybe simulating nobody is the base case. >> Yeah. >> Or just ignoring the other person's input. >> Yeah, I think so. That's also coming back to me. Yeah, I think it's something like that.
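Editor note: a toy grammar can capture the flavor of this recursive definition in the two-player case. This loosely follows the spoken description only; the paper's formal definition differs in its details, and the constraint motivating it is discussed just below. All type and function names here are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Const:
    """Base case: ignore the opponent entirely and play a fixed action
    ('simulating nobody' / ignoring the other player's input)."""
    action: str

@dataclass
class Simulate:
    """Recursive case: the only thing we may do with the opponent's code
    is run it, and only on inputs that are themselves simulationist --
    which is what blocks smuggling syntactic tricks back in via inputs."""
    probe: "SimProgram"             # what we feed the opponent
    respond: Callable[[str], str]   # reply as a function of what we observed

SimProgram = Union[Const, Simulate]

def run(me: SimProgram, opponent: SimProgram) -> str:
    if isinstance(me, Const):
        return me.action
    observed = run(opponent, me.probe)
    return me.respond(observed)

# Imitate whatever the opponent does against an unconditional cooperator:
imitator = Simulate(probe=Const("C"), respond=lambda a: a)
print(run(imitator, Const("C")))   # 'C'
print(run(imitator, Const("D")))   # 'D'
print(run(imitator, imitator))     # 'C' -- mutual cooperation
```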
But the tricky part is that you might intuitively think that a simulationist program is just one that calls the other program with some other program as input. But if you don't constrain the programs that you give the other player as input, you can smuggle this non-behaviorism back in, by having things like "what does my opponent do against these syntactic-comparison bots?" or something like that. >> Right. >> There's a good appendix on why we do it this way. >> I saw this appendix, and you read it, and it's like, ah, that's pretty comprehensible. It's not one of those cases where the appendix is all horrible. >> Glad to hear that you liked the appendix. Some of the appendices are also just very technical, working out the details. >> I skipped those appendices, but there are some good appendices in this one. >> Nice. >> All right, the next thing I want to ask is: what's next in program equilibrium? What else do we need to know? What should enterprising listeners try to work on? Or, earlier I asked you about the state of the art before you published Robust Program Equilibrium: is there any work coming out at around the same time that's also worth talking about and knowing the results of? >> So I think there are a bunch of different directions. I think we still leave open various technical questions, and there are also some technical questions still open for the Löbian programs that it would be natural to answer. So, maybe sticking closely to our paper first, there are some very concrete open questions, even listed in the paper. For example, I think it's not clear whether... okay, I'm not entirely sure, but I think in the two-player simulationist-program case, it's not clear whether, for example, all Pareto-optimal, better-than-minimax utility profiles can be achieved in simulationist program equilibria. >> Okay. >> Maybe this is not quite the right question, but you can check the paper. Also, we have some characterizations for these uncorrelated cases, but I think for the general simulationist case we don't have a full characterization. So if you want to go further down the path of this paper, there are a bunch of directions there, somewhat small holes to fill in.
Then I think another very natural thing is that, for the Löbian bots, there isn't a result showing that you can get the full folk theorem if you have access to shared randomness, which I'm pretty sure is the case. Probably with some mixing of this epsilon-grounded stuff and the Löbian proof-based stuff, I would imagine you can get basically a full folk theorem, but there's no paper proving it. Maybe one day I'll do it myself. But I think that's another very natural question to ask. Then, in my mind, going a bit further outside of what we've discussed so far: in practice, I would imagine that usually one doesn't see the opponent's full source code. And maybe it's even undesirable to see the whole source code, for various reasons: you don't want to release all your secrets. Also, we talked about these folk theorems, where everything that is better than this punishment outcome can be achieved, and I think game theorists often view this as a positive result, whereas I have very mixed feelings about it, because it's kind of a "well, anything can happen", and in particular a lot of really bad outcomes can happen: outcomes that are barely better than the best thing I can achieve if everyone punishes me maximally, which is not very good, right? There are lots of very bad things that people can do to me, so there are lots of equilibria where I get very low utility. >> Yeah. And in particular, if there are tons of these equilibria, the more equilibria there are, the less chance there is that we coordinate on one, right? >> Yeah. I guess one positive thing is that in the correlated case you have this convex space of equilibria, so at least you need to find yourself in this convex space rather than between, like, six discrete points, and maybe that makes things easier. But yeah, I basically agree with this. On my first appearance on AXRP (this is my second), we discussed this equilibrium selection problem, which I think is very important and motivates a bunch of my work. So maybe if you have less information about the other player, then you get fewer equilibria. Maybe in the extreme case, if you get only very little information about the other player, maybe you only get one additional equilibrium relative to the equilibria of the underlying game. >> Yeah. >> And we even discussed this similarity-based cooperation paper on the previous episode. That is basically such a setting: it's basically a program equilibrium setting where you don't get the full opponent source code, but you get some signal of, in particular, how similar the opponent is to you, and there are some results about how you get only good equilibria this way.
So I think in general that's a natural direction to go in. You can also do more practical things there: the similarity-based cooperation paper has some experiments, and you can do experiments with language models. In some sense this is sort of true of them, right: if my program is "I prompt a particular language model", then you know my prompt, but you don't know all the weights of my language model, or maybe you can't do very much with all the weights of my language model. >> Yeah. >> That is a sort of partial-information program equilibrium. So I think that is another natural direction. And then also, you drew these connections to decision theory, right? Which is: if you are the program, and you have to reason about how you're being simulated, and people are looking at your code and so on, how should you act, in some kind of rational-choice sense? >> Yeah. >> That's the problem of decision theory, and in some ways you could view this program equilibrium setting as addressing these issues by taking the outside perspective: instead of asking myself, "what should I, as a program who's being predicted and simulated and so on, do?", I ask myself, "I'm this human player who's outside the game and who can write and submit code: what is the best code to submit?" And in some sense that makes the question less philosophical. And I'm very interested in these more philosophical issues, and I feel like the connections here aren't fully settled: what exactly does this principal perspective, this outside perspective, correspond to from the perspective of the agent? Like you said, this equilibrium where everyone checks that they're equal to the other player: that's an equilibrium where the programs themselves aren't rational. They don't do expected utility maximization; they just do what their source code says, right? So this is much more philosophical, much more open-ended than these more technical questions about what equilibria you can achieve, but yeah, I'm still very interested in those things as well. >> So the final question I want to ask is: if people are interested in this work, in particular in your work, how should they find more? >> Okay, so I just have an academic website. Fortunately, my name is relatively rare, so if you Google my name, you'll find my academic website. >> Yeah. >> You can also check my Google Scholar, which I guess has a complete list of my work. I also have a blog where I occasionally post things somewhat related to these kinds of issues, which is casparoesterheld.com. >> Yep. >> Which in principle should allow subscribing to email notifications. And I also have an account on X, formerly Twitter, which is C_Oesterheld. Yeah, I think those are probably all the things. >> Great. Cool. So there'll be links to those in the transcript.
Caspar, thanks very much for coming on the podcast. >> Thanks so much for having me. >> This episode was edited by Kate Brunotts, and Amber Dawn Ace helped with transcription. The opening and closing themes are by Jack Garrett. This episode was recorded at FAR.Labs. Financial support for this episode was provided by the Long-Term Future Fund, along with patrons such as Alexey Malafeev. You can become a patron yourself at patreon.com/axrpodcast, or give a one-off donation at ko-fi.com/axrpodcast. That's ko-fi.com/axrpodcast. Finally, if you have any feedback about the podcast, you can fill out a super short survey at axrp.fyi. [Music]

Related conversations

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.


AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.


AXRP

11 Apr 2024

AI Control with Buck Shlegeris and Ryan Greenblatt

This conversation examines technical alignment through AI Control with Buck Shlegeris and Ryan Greenblatt, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.


Future of Life Institute Podcast

7 Jan 2026

How to Avoid Two AI Catastrophes: Domination and Chaos (with Nora Ammann)

This conversation examines core safety through How to Avoid Two AI Catastrophes: Domination and Chaos (with Nora Ammann), surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

