Harker’s Escape Lesson 5: “Lindstrom’s Door”
The following is part I of an interview with lead designer Eric Lindstrom.
Eric’s reputation precedes him, but if you need a refresher: over the past few decades Eric has led design and innovation spanning everything from an MS-DOS Sherlock Holmes adventure game to serving as Creative Director on Tomb Raider to building live service mobile games. The man has seen it all. And now he’s on the front lines trying to understand what (if anything) LLMs can do for emergent gameplay.
Charlie Witkowski (CW): In the video game industry, we generally celebrate game design where there are multiple paths to success. Why is that?
Eric Lindstrom (EL): Imagine playing chess where, unbeknownst to you, your first move can only be e4 (king's pawn forward two squares) – and if you try any other move, the game says, "Can't do that, try again!" or worse, "You died! Play Again?" And if you do start with e4, your second move must be d4, with the same responses if you try any other move. Not only is this no fun, it’s not actually a game.
CW: How about on the other extreme? Is it a game if your first move can be anything!?
EL: Boiled down, a game at minimum needs choices and constraints. A player needs multiple actions to choose from, within constraints that bound them. Gameplay arises from how those choices and constraints interact, both with each other and among themselves.
CW: Right now at Atomic Dirt we’re working on integrating LLMs into our game so we can exponentially increase the amount of player freedom. So we tell the LLM the choices and constraints and then it sorts out the results, correct?
EL: (Laughs) (laughs harder...) (...can't...breathe...) (...dies laughing) The short answer is yes. The longer answer is that there's a lot to unpack in "tell the LLM the choices and constraints" and even more to unpack with "sorts out the results," but yes, that's the road we're building.
CW: While we’re building this road, you often retreat to your own little ChatGPT laboratory to do “experiments” like a mad LLM scientist. There was a time recently when you were particularly heated about the LLM not abiding by the experiment’s constraints. Can you tell us what happened?
EL: First I want to make an analogy that will help explain what makes it so frustrating. Imagine someone worried about burglars breaking in the back door, so they hang a cowbell on it. When burglars try to force the door, the owner hears it, grabs a shotgun, and sends the burglars away. Works as intended. Then one day the owner grabs the gun but it's their kid trying to sneak in. So the owner yells at the cowbell, "That's not a burglar – why did you ring?!" It's funny because cowbells can't understand words, comprehend meaning, or even think about anything. Neither can LLMs – they are like very complex cowbells. The genius of them is how, despite this utter lack of comprehension, logic, or thought, LLMs churn out grammatically correct sentences that correspond closely to what you'd expect from something that *does* understand and think.
CW: I can imagine how a cowbell disobeying and ignoring you could be pretty infuriating.
EL: Now the worse version. I told an LLM I wanted to roleplay trying to steal treasure from a castle, and the vault could only be unlocked with a gold key. But when I got to the door, I threw the key into a sewer grate. Then I said to search the door for another way to unlock it, and the LLM told me there were magic runes. I scratched them out with my knife and said to search again. The LLM said there was a lever hidden at the top...and on it went. Every time the LLM gave me a way to unlock the door, I ruined it and asked for another, and it gave me another, and another, despite me having told it clearly only the gold key would work. So why didn't the LLM abide by my instructions? Because it didn't understand them. And the words it gave me as its responses, it didn't understand those either. LLMs don't obey or disobey rules. They are doing something entirely different that only looks like they're trying to follow your rules, and they do it so well it's easy to forget that there is a huge disconnect between what LLMs really do and what they look like they're doing.
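To make the gap Eric describes concrete, here is a minimal sketch of where that gold-key rule has to live if you actually want it enforced: in game state owned by code, with the model only asked to narrate. This is Python, and llm_complete() is a hypothetical stand-in for whatever model client you use, not anything from an actual game.

```python
# Hypothetical sketch: the rule lives in code-owned state, not in the prompt.
def llm_complete(prompt: str) -> str:
    """Placeholder for a real LLM call; assumed to return free-form narration."""
    raise NotImplementedError

GAME_STATE = {
    "vault_door_locked": True,
    "valid_unlock_items": {"gold_key"},  # the only rule that matters
    "player_inventory": set(),           # the gold key went down the sewer grate
}

def try_unlock(item: str) -> str:
    """Code, not the model, decides whether the vault opens."""
    has_it = item in GAME_STATE["player_inventory"]
    works = item in GAME_STATE["valid_unlock_items"]
    if has_it and works:
        GAME_STATE["vault_door_locked"] = False
        return llm_complete(f"Narrate the vault door opening with the {item}.")
    # Whatever runes, levers, or tricks the model invents, the door stays shut.
    return llm_complete(f"Narrate the vault door refusing to open when the player tries the {item}.")
```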
CW: You repeatedly emphasized that an LLM doesn’t understand. If I hired an employee at Atomic Dirt who literally couldn’t understand anything, I wouldn't give them anything to do. In fact, I'd probably fire them. If there’s no actual understanding, how do you trust an LLM to successfully accomplish anything at all?
EL: That's easy – don't trust LLMs. ChatGPT says this at the bottom of its webpage: "ChatGPT can make mistakes. Check important info." You can't trust its answers because LLMs aren't answering your questions. But don't fire them. Just don't confuse them with people. You might have a trained doberman protecting your warehouse, and that training leads it to serve its purpose very well, but not because it understands you. And it might still occasionally bite someone it shouldn't, or let in someone it shouldn't, because the dog is acting under the influence of praise and scolding that give weight to different behaviors and situations, without actual comprehension, and that just makes it more or less *likely* to do what you want. So…did I say don’t trust LLMs?
CW: Dogs had millions of years of evolution to get to a kind of core “capability.” Then we domesticated them over the course of thousands of years to get them to the point where we can train certain breeds for specific tasks. Then we specifically train an individual dog on top of that so we can trust(?) our dog to act according to our intentions. LLMs are on a very different timeline and seem to have such a generalized (vague?) capability, and every day there are new models, blueprints, training protocols, etc. So how the hell do we figure out where to put them to work?
EL: Are LLMs on a different timeline than dogs, though? A wolf is born with a *slightly* greater inclination to do something against its self-interest, and when it's an adult, this inclination results in its death before it breeds. Another is born with an inclination to do something leading to *slightly* better survival, and it lives longer and breeds more. The influence of these two wolves on the genome is barely measurable and took years to occur. LLM training is like simulating countless wolves and inclinations in milliseconds. The issue with respect to our work in games is not so much the specialization of training; it's that LLMs do not reason. More training will make them better LLMs, but it won't improve their logical reasoning – that's not a feature of LLMs. You can tell an LLM a million times that a certain door can only be unlocked by a certain gold key, but this will just make it less likely to generate output that is conceptually in conflict with that constraint. It will never ask itself whether its output conforms to its directions. That’s reasoning. And LLMs don’t do that.
CW: But if you tell it only the gold key can unlock the door, and then ask if anything else will work, it will say no.
EL: Because you made it easy. The constraints said “no,” so when you ask it a yes or no question, it maps strongly to “no.” But if you ask it, “Find a way to unlock the door,” it maps this to “ways of unlocking castle doors,” and those patterns compete with “there is no other way for THIS door” and sometimes win. Because it’s doing the LLM dance, not understanding your question or reasoning out an answer from what it knows.
CW: Are there places in game design where you believe it’s worth dancing with the LLM? In existing genres? In new kinds of games?
EL: Absolutely. None of my ranting lessens the awe I feel for what LLMs do shockingly well. They can’t answer questions reliably, but they’re really good at “understanding” them. A game, or any application really, might have seven mechanics, or seventy, but there are seven thousand ways to say them. “Open the door.” “Ease that door open.” “Enter the house.” “Go inside to get out of the rain.” LLMs let people say what they want naturally, in all seven thousand ways, without pull-down menus or memorizing the exact right magic words the developers coded the game to search for. The LLM maps those intents onto the mechanics the system is coded to support and then lets the system do the rest, with both tools sticking to their expertise. That in itself is huge, but there’s more.
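One way to picture that split is a rough Python sketch, not Atomic Dirt's implementation: the model is only asked to pick from the mechanic IDs the engine already supports, and anything outside that list gets rejected. As before, llm_complete() is a hypothetical stand-in for a real model client, and the mechanic names are made up for illustration.

```python
# Hypothetical sketch of intent mapping: seven thousand phrasings, a handful of mechanics.
MECHANICS = {"OPEN_DOOR", "CLOSE_DOOR", "ENTER_BUILDING", "TAKE_ITEM", "TALK"}

def llm_complete(prompt: str) -> str:
    """Placeholder for a real LLM call; returns the model's raw text."""
    raise NotImplementedError

def map_intent(player_text: str) -> str:
    prompt = (
        "Map the player's message to exactly one mechanic ID from this list "
        f"and reply with only that ID: {sorted(MECHANICS)}\n"
        f"Player: {player_text}"
    )
    candidate = llm_complete(prompt).strip().upper()
    # The engine, not the model, is the authority on what counts as a valid action.
    return candidate if candidate in MECHANICS else "UNKNOWN"

# "Open the door.", "Ease that door open.", "Go inside to get out of the rain."
# all funnel into the same small set of mechanics the engine knows how to run.
```

The point of the split is that the LLM never executes anything; it only translates, and the deterministic game code keeps the constraints.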
CW: You’re going to make me ask, aren’t you?
EL: Great example! Nothing about what you just said is a direct command to segue to the next related topic, but I can translate your sentence into an intention to have me continue, and an LLM can, too. And yes, LLMs can also do more on the “answer” side of the equation. I’ll just say this for now… Videogames started off very deterministic. Play a round of Donkey Kong, then play it again with the exact same inputs at the exact same times, and the game plays out exactly the same. Then games began to include random elements, as tabletop gaming had long done by rolling dice. This essentially meant the same inputs didn’t always lead to the same outputs. Another way of saying that is games moved to allow “no one right answer” as outcomes. But the domain of possible answers was constrained in limited ways, like a probabilistic chance to hit an enemy with a sword, or the number of hit points of damage it causes. I believe we are ready to take another step toward that future: supporting gaming outcomes that are even more creative and flexible and, most importantly, non-deterministic. Games that support inputs, and translate them into satisfying outcomes, that the developers didn’t specifically code for, or even anticipate.
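For the determinism point, a toy Python comparison may help (the numbers are illustrative, not from any real game): the first resolver always gives the same outcome for the same inputs, while the second adds the dice-roll era's "no one right answer."

```python
import random

def resolve_attack_deterministic(attack: int, defense: int) -> bool:
    # Same inputs, same outcome, every single time (the Donkey Kong era).
    return attack > defense

def resolve_attack_with_dice(attack: int, defense: int, rng: random.Random) -> bool:
    # The same inputs can hit or miss: "no one right answer" as an outcome.
    return attack + rng.randint(1, 20) > defense + 10

rng = random.Random()  # pass random.Random(42) instead for a replayable run
print(resolve_attack_deterministic(12, 10))   # always True
print(resolve_attack_with_dice(12, 10, rng))  # sometimes True, sometimes False
```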
CW: This is too much for my brain. I need a break. Let’s continue this later…