Finding videogame’s true voice

Some links in this article have expired and have been removed.

The main gist of this post is that we are not using the full narrative capability of video games. I believe we fail to take into account certain aspects that lie at the core of making artistic creations powerful and thus miss out on crucial strengths of the video game medium. To get to the core of these strengths, I will first have a look at other media (specifically film and literature), and then explore what lessons can be applied to video games. What I end up with is a way of thinking that uses basic elements of the film and literature experience, yet is quite different from these.

It is very easy to look at other forms of media, see what they do well, and then try to copy this. I think this is a big problem for video games. Whenever a game focusing on a narrative-oriented experience is made, it is instantly compared to other media and judged according to their strengths. For instance, it is very common praise to call video games cinematic, or to concentrate critique on their plot structure. Obviously, I do not think this is the right approach. Instead I think we need to take a step back and consider what it really is in these other media that makes them work. We must then explore in what ways these concepts can (and if they can!) be applied to video games.

My suggestion for this “magical essence”, which I will outline in this article, is empty spaces: the bits that require the audience’s participation and imagination. Basically, the part of art that requires us to be human.

The power of imagination

First of all, let us take a look at literature. For this, “The Fall of the House of Usher” by Edgar Allan Poe will be used as an example:

“I know not how it was—but, with the first glimpse of the building, a sense of insufferable gloom pervaded my spirit. I say insufferable; for the feeling was unrelieved by any of that half-pleasurable, because poetic, sentiment, with which the mind usually receives even the sternest natural images of the desolate or terrible. I looked upon the scene before me—upon the mere house, and the simple landscape features of the domain—upon the bleak walls—upon the vacant eye-like windows—upon a few rank sedges—and upon a few white trunks of decayed trees—with an utter depression of soul which I can compare to no earthly sensation more properly than to the after-dream of the reveller upon opium—the bitter lapse into everyday life—the hideous dropping off of the veil. There was an iciness, a sinking, a sickening of the heart—an unredeemed dreariness of thought which no goading of the imagination could torture into aught of the sublime.”

This is an excerpt of a quite lengthy passage where the narrator describes the House of Usher as he approaches it. Even though it says a lot, it gives us very little information about what the house actually looks like. The focus is instead on the feelings and actions of the protagonist. The text tells us the response that the imagery evokes in the narrator, and based on that it urges us to make up our own mental image of the scene. This is typical for literature. Descriptions are usually sparse, and emotions and events are instead used to paint a scene for the reader. A lot of responsibility is placed on the audience and certain knowledge is assumed, and all this (which I think is especially important to highlight) without the author losing any artistic control.

Next, let us consider movies. Normally one would think of movies as being very exact in their portrayal of a story, almost like a window to an alternate reality. However, upon a bit of analysis it is clear that this is not the case. Film requires us to make non-trivial connections between sequences and invites us to read the minds of the actors. The Kuleshov Effect makes a clear case for this. Just watch the following video yourself and consider how your interpretation of the face changes depending on the context in which it is shown:

As we see a character on screen, we are meant to start imagining what that person might be feeling. Whenever a cut is made, it forces us to make up a causal relationship between the juxtaposed events. This can easily become quite complex, as this short clip from the famous Odessa stairs sequence shows:

Somehow we are able to make sense of this cacophony of imagery, constantly making connections between clips, weaving our own coherent narrative inside our minds. Just as books require readers to fill in the sensory details of a scene, a film forces the viewer to imagine the emotions and causal relationships portrayed. Both literature and film depend heavily upon the audience’s imaginative interpretation and will lose their impact without it. I would even say that the greater this gap of imagination is, the more room for interpretation, the more powerful and artful the work becomes. By this I do not mean that the more obscure art is, the better it becomes. Rather, the ability to leave plenty of gaps for the audience to fill, without making the work incomprehensible and meaningless, is what makes great artists and great works of art.

Filling a gap

Even though the audience participation required in books and movies might not be obvious at first, it does not feel that strange once you realize it. It is quite easy to see that we make up worlds in our heads when reading, or that we construct a fluent narrative from edited imagery when watching movies. But viewed from the perspective of somebody who encounters this for the first time, I would say that it is far from evident. It is really quite weird that we can count on the audience to build up whole scenes in their heads, based on almost purely emotionally descriptive content. Dialog in literature is a great example of this: the spoken words alone convey the look, actions and sometimes even emotions of the characters involved. A ton of background knowledge is required to make sense of this, and it would be extremely hard to teach computers the same tricks.

I bring this up mainly because I want to show that, even though all of this is now part of our everyday life, none of it is self-evident. For instance, film editing took a while before it was properly figured out, and its complex usages even longer. This should hint to us that whatever there is left to figure out about the video game medium, we should not expect it to be self-evident or even to seem like it would work when first encountered.

Another important reason for bringing this up is to show that all of this gap-filling has a retroactive aspect. For instance, when connecting clips in a film, the whole meaning (i.e. the action that the clips portray) comes together afterwards. Yet to us it seems like a continuous experience, and in a way we actually inject false memories of an imagined event. This is basically how animation works: we first see an object in one position, then in another, and not until both events have been experienced does our brain interpret the whole as motion. However, we never experience it like that; we simply see it as the motion of an object from one point to another and do not notice the mental effort required.

This is even more evident in literature, where descriptions of objects can come long after they were first introduced. Even though this may seem like a jarring discrepancy, it poses no problem to us, and we can meld these new facts with the earlier events portrayed. For example, when we remember a tale, the early happenings exist in our mental images with the detailed characters shaped during the full read-through. They are no longer the unknown entities they were when we read the passage for the first time.

What this tells us is that we should not be afraid of giving the audience incomplete information or experiences. Not only does this “removal of facts” not pose a problem, it actually seems essential in creating a powerful experience. It is as if something “magical” happens when we are forced to complete the work ourselves.

Side note:

People with split brains provide a very extreme example of our human urge to fill these sorts of gaps, often unconsciously. For example, outlined here are some experiments where the subject effortlessly made up details from incomplete information without conscious knowledge of doing so. I think this clearly shows how the brain is hard-wired for this kind of behavior and that it is essential to what makes us human. The visual illusion found here also shows how eager we are to create causal relationships, and how the context changes the way we make them.

In search of the void

It is now time to take a deeper look into games and to search for an equivalent of the “gap filling” concepts found in literature and films. Instead of meeting this head on, I think it is important to discuss what it is that is especially distinct and descriptive (and thus not requiring the audience’s interpretation) in games. I would say these things are:

  • Details of the world. Not only are games extremely clear on what a scene looks like, they often allow it to be explored and make it possible to examine the various parts of the world very closely. This is especially true for 3D games, where players can view objects from almost any angle they please.

  • The fluidity and coherence of actions. As players are in direct control of the protagonist, there is never any doubt about what events are taking place. Because of the interactive nature of video games, a constant feedback loop of actions and consequences is required, forcing the events taking place to be exact. Video games are all about right here and right now.

Side note: I am aware that the above might not be strictly true for all game types and is more fitting for real-time 3D games. Although this should not disqualify any further conclusions, it might be preferable to think of the following discussion as focusing on 3D video games in particular.

The above points mean that if we want to leave room for imagination in games, it cannot be the scene building from literature nor the connecting of events in film. With the level of detail of the world provided, little is left to the imagination. And the fluid flow of events that games demand leaves very little for players to fill in with their minds. So what other gaps are there to be filled? To find this out, we need to take a look at a core feature of video games: interactivity.

So what exactly does interactivity encompass? I like Chris Crawford’s definition (from this book):


“A cyclic process between two or more active agents in which each agent alternately listens, thinks and speaks.”

What I like about this wording is that it makes it clear that interaction is not all about a user providing input. It is also about considering and then reacting to this input, and the same applies to both sides (meaning both the human and the computer). When it comes to finding opportunities for adding gaps of imagination, these are of course all on the human’s side. Also note that the gaps might take place at any of the steps: listening, thinking or speaking. With this in mind, I will make an attempt at some gap finding.
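To make the cycle a bit more concrete, here is a minimal sketch in Python of two agents taking turns to listen, think and speak. It is only my own illustration of the definition, not anything from Crawford's book: the `Agent` class, the `interact` function and the example messages are all made up for this purpose.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    """One participant in the cycle (hypothetical structure, for illustration only)."""
    name: str
    think: Callable[[str], str]  # turns the latest thing heard into a reply
    heard: List[str] = field(default_factory=list)

    def listen(self, message: str) -> None:
        # "Listening": take in what the other agent just said.
        self.heard.append(message)

    def speak(self) -> str:
        # "Thinking" and then "speaking": react to the most recent input.
        return self.think(self.heard[-1])


def interact(first: Agent, second: Agent, opening: str, turns: int) -> List[str]:
    """Let the two agents alternate for a fixed number of turns; return the transcript."""
    transcript = []
    message = opening
    current, waiting = first, second
    for _ in range(turns):
        current.listen(message)
        message = current.speak()
        transcript.append(f"{current.name}: {message}")
        # Swap roles: the agent that just spoke now waits for a reply.
        current, waiting = waiting, current
    return transcript


# Both sides run the same three steps, human and computer alike.
game = Agent("game", lambda heard: f"a door creaks after you {heard}")
player = Agent("player", lambda heard: "step closer")

log = interact(game, player, "click the handle", turns=3)
for line in log:
    print(line)
```

Note that the player's `think` step is just a stand-in here. In a real game that step happens in the human's head, and that is exactly where any gaps of imagination would have to live.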

So where in this interactive cycle does there exist room for the imagination? The most obvious place is of course the “listening” (meaning any input). Even though we get a clear view of what the world looks like, there are still things left for us to craft in our minds. This is something that is already present in some games and comes in the form of “environmental storytelling”. Through exploration players can pull information from the world, gather details on past events and imagine emotional states of the world. BioShock is a good example of this, where much of the attitudes and history of the sunken city can be found out purely through the interactive process of exploring the environment.

However, environments are lifeless entities, and while they can portray the aftermath of actions, they do not give us any feeling of agency. This greatly lessens the impact and diversity of the imaginative gap-filling players can do. To take this to the next level, it is quite obvious that we need to include simulations of conscious beings. This allows us to construct mental “theories of mind”, something that greatly increases the possibilities of expression. The problem is that we simply cannot do this with current technology, except at a very rudimentary level. While our techniques for facial expression are constantly getting better (L.A. Noire is a good example), they are only meant for prerecorded usage. When it comes to real-time procedural generation of expressive characters, we are at an extremely primitive stage. Because of this, I believe this will be very interesting to explore in the future, but it is not something that can be used right now.

So what else can be done? With expressive characters in real-time not an option, we must turn our focus onto the actions themselves instead. As stated above, the events in video games do not leave any room for interpretation. But there is still room for the imagination here: what actions to take and why they are taken.

Constrained role-playing

Imagining the what and why of actions probably sounds a bit strange and needs some explanation. When players take control of an avatar in a video game, they are free to do what they please as long as it is in accordance with the rules of the game world. This freedom might seem like the kind of gap that can be used to mimic the “magic” from literature and film. However, that is not the case with the way actions are normally implemented: very specific and unambiguous. I reason so because there are two major problems with this approach.

The first is a technical one, namely that it is pretty much impossible to give the player access to the full space of possible actions for any given situation. This means that there will always be events that the player can think of but will be unable to carry out. This limits the ability to role-play and might also leave what is, according to the player, the most intuitive and plausible action unavailable, breaking up flow and presence. The second problem is that the more events are added to aid role-playing, the harder it is to keep artistic control, turning the experience into an open-world simulation instead. As these two problems work against one another, I think we have gotten pretty much as far as we can using this kind of design.

My suggestion for solving this problem is to have a limited number of actions available, but to lure players into imagining that the actual action performed was exactly the one they wanted to do. A very simple example of this can be found in the games Samorost and Windosill, where the player can never know in advance what a mouse click will result in, yet when the action occurs it feels very intentional. This imagined motivation does not have to occur on such a low level though, and can include larger segments of the game. An example of this is The Path, where players are thrown into a strange environment and forced to make up their own reasons for being there. Often this is something that is built up over a long time, yet greatly shapes how you view your entire session. I am not saying that these games are doing it the right way, only that they incorporate rudimentary versions of the ideas I am talking about, and hence can give one a basic hint of where to start from.

I bet that many will think of this concept as cheating. How can tricking the player be a proper design choice? If the whole interactive experience is an illusion, how can it carry any meaning? I argue that the same is true for other media. The events that you think happen in a film are in fact illusory too. Not only in the sense that they merely consist of acting, set pieces and post-production effects, but also in that many of the actions perceived were never filmed at all. They were instead conjured in the mind by interpreting visual and auditory stimuli. The same is true for literature, where most of the mental images are never found in the text. Despite this, we do not describe the experiences these media give us as meaningless tricks.

Why “motivational imagination” sounds so strange has to do with the nature of interaction. When we watch a movie or read a book, it is a passive experience where data flows only as input. But in the cycle of interaction, we are also part of creating output data. So when we create gaps of imagination for this kind of art work, we are unable to see it as a one-way stream of information, but have to include ourselves in it as well. The upside of it all, besides solving the problem of role-playing, is that it fits neatly into the same kind of concept that gaps in literature and film build upon. First of all, it has a retroactive aspect, as players will need to digest a certain amount of data before settling on a certain motivation. It also forces us to make up a theory of mind, not for a fictional character, but for ourselves, working backwards to figure out how we could have come to a certain conclusion.

With this hypothesis I am not urging people to create games that are extremely linear and only require a single input. I still believe that we can have a wide palette of interaction choices, but that we might not want to be too specific about the exact actions that ought to occur. This is actually very closely related to the concept of player-avatar symbiosis that has been discussed in an earlier post on this blog. I also do not believe that this takes anything away from the experience, but only adds to it, just like the same line of thinking does for other media.

End notes

This is far from a full theory at this point, and “environmental storytelling” and “imagined motivation” are most likely not the only imaginative gaps that can be used in games. Because of this I would be very interested in getting feedback and hearing your responses to this work.

I would also like to point out that all of this is awfully untested. It would be really interesting to see some Kuleshov-like experiments on the concept and see what kind of results they produce. It might be the case that this hypothesis does not work at all, or it might be that it leads to wonderful and totally unexpected insights.

I also want to add that not all kinds of experiences can be created like this. The same goes for literature and movies too, where leaving too much up to the audience simply does not work. Non-fiction books are one example that comes to mind. Still, that is not a reason to not try this out. Until we try out all the options that the video game medium provides, we will have no idea what can be accomplished with it.

Additional Notes: In Scott McCloud’s book “Understanding Comics”, two similar “imaginary gaps” are explored in the medium of comics. One is the literal gap between panels, which forces the audience to complete the missing information implied to lie in between. This is very much like what is found in books and movies, as it forces the reader to use external knowledge and also comes with a retrospective aspect. The second gap is one of cartoon symbolism, where simply drawn characters can often be more expressive than detailed ones. Again this requires quite a bit of interpretation from the audience.
I think this shows that the features discussed in film and books apply to other media as well, making me more confident that they ought to play a big role in video games too.

Acknowledgments: This essay has been greatly inspired by this post by Michael of Tale of Tales. If not for him I would probably never have started thinking in this direction and none of the above would have been written.