Section II

Implementing Reward Schedules in Video Games

Practical Issues of Implementing a Reward System Within a Video Game

When analyzing a game design from an operant conditioning standpoint, it is necessary to determine:

What schedule of reinforcement is being applied?
What are the rewards?
What behaviors are being reinforced?

In some games this process is straightforward. For instance, it is clear that video poker follows a variable ratio schedule. In other games, unraveling the reward structures is much more difficult, because different players may find different aspects of a game reinforcing. For example, one Tetris player may find reinforcement in fitting blocks perfectly, while another may find only the clearing of rows reinforcing. The two players are on different variable ratio schedules.

The matter of implementing reward systems is further complicated by the way secondary reinforcers are learned. The player who enjoys clearing rows will likely learn to enjoy fitting blocks and will now be operating on two ratio schedules. Not only may new reinforcers be learned, but the overriding motivations for playing the game may change. For instance, new players may be motivated by the chance to explore the game and test out the features it has to offer. Once they have exhausted their initial curiosity, they may switch to trying to accumulate points. Still later, they may play to beat other players' scores.

A further complication is that not only the reward system may change as the game progresses, but also the type of reinforcement schedule used. For instance, a reward may initially be provided on a fixed ratio schedule, but as the game advances the schedule may switch to a variable ratio. An analysis of a game design's reward systems needs to look at the game both from the point of view of a new player and from the perspective of an experienced player.

Reinforcement Schedules Operating in Tetris.

Tetris is a simple game but involves multiple reinforcers. All of these operate on a variable ratio schedule. Reinforcers include:

Positive Reinforcers

Gaining points and clearing rows
High Score or improving on a previous score
Fitting blocks
Winning or advancing to the next level

Negative Reinforcers

Avoidance of losing
Build up of rows
Failure to beat an earlier high score

What Rewards Can a Video Game Provide?

Video games are only able to provide conditioned reinforcers, that is, rewards that we have learned to treat as reinforcers. As discussed earlier, conditioned reinforcers are learned from continued pairing with other existing reinforcers. One consequence of the way that conditioned reinforcers are learned is that not all players will have the same set of reinforcers. The set of reinforcers each player possesses will depend on their life experiences. For instance, one player may have learned to enjoy solving puzzles, while another may find solving puzzles aversive. To have a broad appeal, the designer needs to provide reinforcers that are likely to appeal to large segments of the population. Reinforcers such as gaining points, achieving high scores, and solving simple problems are all likely to appeal to many people, since in day-to-day life we are likely to have already learned to find these types of consequences rewarding.

Another consequence of the way in which conditioned reinforcers are acquired is that it is possible for the game designer to extend the set of reinforcers within the game itself. By following classical conditioning methodologies, the game designer can teach the player new reinforcers. For instance, many car racing simulators allow players to make many adjustments to their vehicle. Improved configurations will allow players to improve their lap times. Many players will spend huge amounts of effort improving their car configurations and developing optimum configurations for each track. Spending large amounts of time configuring a car is not likely to be considered an activity that many new players would find rewarding. However, winning races and achieving fast lap times is rewarding. For many players, this repeated pairing of developing better car configurations and achieving better lap times will ultimately lead to their finding the production of an optimum car configuration reinforcing in its own right.

Traditional reinforcers in video games include the following:
- Clearing a level and advancing to the next
- Solving puzzles
- Achieving goals
- Winning
- Losing
- Gaining points
- Gaining powerups
- Acquisition of items
- Acquisition of skills, improving skills or stats
- Exploration
- Achieving a top score

This list is definitely not exhaustive.

Multiplayer games provide the designer with an additional set of social reinforcers. These reinforcers include:
- Status
- Recognition
- Friendship or companionship
- Team or group loyalty

Since social reinforcers are so important in real life, social reinforcers within a game can be extremely compelling. For many players, status and recognition become the overriding reinforcement and the raison d'etre for their continued involvement in the game. Social status among their peers within the game can for many players justify the days, weeks, months, and even years of effort required to achieve a high level or top ranking.

The Role of Conditioned Reinforcers in the Preference for "Clone" Designs

A frequent complaint in the games industry is the lack of original designs. Publishers and developers seem most comfortable using existing design formulas when designing a game. One hit title is certain to be followed by waves of games using the same basic game design elements. For instance, no sooner had Command and Conquer achieved a critical success for the RTS genre than the floodgates opened and hordes of "clone" RTS titles began to hit the shelves. The same happened with the FPS genre after the success of the Doom franchise, and gamers were soon buried in hordes of Doom clones.
(Note: I was involved in a project that was cancelled due to the Doom phenomenon. We had designed an RPG using a third-person isometric perspective, and the licensor decided late in the project that they had to have a title with a first-person perspective since that was what everyone else was doing.)

Conditioned reinforcers may play a part in this clone phenomenon. A less innovative title may be initially more attractive to the player since they have already been conditioned to find the "rewards" of the game reinforcing. With a new game design, there is a danger that the rewards of playing will initially not be reinforcing, and the developer will need to make an effort within the game to establish their reinforcement value. The danger is that the player will abandon the game before the reinforcers can be established. Consider for example a traditional RTS. The end goal and reward is to win battles and eventually beat the computer opponent. To achieve this goal, the player must spend much time locating and obtaining resources and building units. There is a danger that a player will initially find these activities unrewarding or even tedious. They may be unwilling to go through the initial efforts required to win a battle and may abandon the game. For a player who has already played other RTS titles, the activities of locating resources and building units will have already been established as conditioned reinforcers.

Which is the Best Reinforcement Schedule?

Figure 8 compares the response rates produced by the Variable ratio, the Fixed ratio, the Variable interval, and the Fixed Interval schedules of reinforcement. As can be seen, the number of responses per time period increases as the schedule is changed from a Fixed interval to a Variable interval, and from Fixed ratio to Variable ratio.

If the objective is to generate the highest level of responses over the longest period, a ratio schedule is the way to go. Additionally, the variable ratio schedule demonstrates the highest resistance to extinction of any reinforcement schedule, which means that it is possible to dramatically increase the time between reinforcements without a drop-off in response rate. Thus by implementing a variable ratio schedule of reinforcement, the game designer can condition players to keep responding over very long intervals without receiving any reinforcement. Not surprisingly, games that are viewed as extremely compulsive or addictive are more than likely implementing a variable ratio schedule of reinforcement. It does not, however, follow that ratio schedules are the best reinforcement schedule, and there may be compelling reasons to use another reinforcement schedule.

While ratio schedules produce the highest response rate, this fact does not necessarily mean that ratio schedules are the most fun for the player. Employees in factories are often paid based on ratio schedules, receiving pay based on the number of units produced. Factory owners like these schedules because they produce a high level of output. Employees, however, find these schedules undesirable because they make them work too hard, leaving them nervous and exhausted at the end of the day. Union pressure has often led to the replacement of ratio schedules with an hourly wage system, a duration schedule. Overuse of a variable ratio schedule in a game may leave your players feeling burned out, exhausted, and unhappy with the game experience, even though they feel compelled to keep playing. This reaction is not what a game designer wants from their players. The designer may wish to utilize other reinforcement schedules, such as interval schedules, that will still keep the player motivated but leave the player less burned out at the end of the session.

Other reinforcement schedules are particularly suited for certain situations. While Variable ratio schedules are best at maintaining a behavior, Fixed ratio and Duration schedules are the best schedules for establishing new behaviors. Experimenters frequently use Fixed ratio schedules when initially establishing a behavior, and only later switch to a Variable ratio schedule. Likewise, in many video games, the designer first needs to teach or train the player on how to play the game, and for this purpose Fixed ratio and Duration schedules are best.
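
The hand-over from a fixed ratio during training to a variable ratio afterwards can be sketched in a few lines. This is only an illustration under assumed numbers: the function name `responses_required`, the ratio of 5, and the switch after 10 rewards are all hypothetical and not taken from any particular game or experiment.

```python
import random


def responses_required(reinforcers_earned, ratio=5, training_rewards=10, rng=random):
    """Responses needed for the next reward (hypothetical switch-over rule)."""
    if reinforcers_earned < training_rewards:
        # Fixed ratio while the behavior is still being established.
        return ratio
    # Variable ratio afterwards: uniform on 1 .. 2*ratio - 1, averaging `ratio`.
    return rng.randint(1, 2 * ratio - 1)


# During training the requirement is predictable; later it only averages 5.
training = [responses_required(n) for n in range(10)]
later = [responses_required(n) for n in range(10, 20)]
```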

Note: Experiments with pigeons pecking a lighted key for food indicate that they also have an aversive reaction to ratio schedules. If they are given another key to peck which has the sole consequence of turning off the light on the ratio key for a period, they will reliably do so to escape from the aversiveness of the schedule, even though there is no other source of food. While you might think that they could gain respite by simply ignoring the ratio key or by pecking at a moderate rate, they appear unable to do so. While the key light is lit, they are compelled to respond and will conform to the behavior pattern predicted by a ratio schedule.

Most games will have multiple schedules of reinforcement and different types of reward. For instance, the game may allow the player to gain experience points, to improve skills, and to solve puzzles, each of which requires the player to perform a different behavior. If the player is able to choose their activities during a session, employing variable ratio or variable interval schedules (which typically produce no pause between reinforcement intervals) may have the consequence of locking the player into one activity for the duration of the session. This locking in will result in the player's failing to experience the many features that the game provides. In this situation, the designer may wish to employ fixed ratio or fixed interval schedules, since they produce post-reinforcement pauses, increasing the chances that a player will try out multiple aspects of the game.

Examples of the Various Reinforcement Schedules Implemented in Video Games.

Fixed Ratio (FR) - A reinforcer is given after a specified number of correct responses.

Examples:
- Collecting tokens. Many games require the player to collect a fixed number of tokens to advance to the next level, to obtain a new life point, or to receive some other reinforcer.
- Attaining a new level in an RPG. Some RPGs clearly indicate how much experience is required to achieve the next level. A high degree of certainty as to the level of work that will be required to achieve the next level puts the player on a fixed ratio schedule.
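
A fixed ratio schedule can be sketched in a few lines. The class name `FixedRatioSchedule` and the figure of 10 tokens per extra life are assumptions made for illustration, not details of any specific game.

```python
class FixedRatioSchedule:
    """Deliver a reinforcer after every `ratio` correct responses (hypothetical helper)."""

    def __init__(self, ratio):
        if ratio < 1:
            raise ValueError("ratio must be at least 1")
        self.ratio = ratio
        self.count = 0  # responses since the last reinforcer

    def respond(self):
        """Record one response; return True if it earns the reinforcer."""
        self.count += 1
        if self.count == self.ratio:
            self.count = 0
            return True
        return False


# Illustrative use: an extra life for every 10 tokens collected.
extra_life = FixedRatioSchedule(ratio=10)
lives_awarded = sum(extra_life.respond() for _ in range(35))
```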

Variable Ratio (VR) - A reinforcer is given after a variable number of correct responses.

Examples:
- Collecting tokens. Some games require the player to collect tokens to achieve a life point or reward but vary the number of tokens required.
- Achieving a new level in an RPG. Some RPGs give no clear indication of how much experience is required to achieve the next level. This puts the player on a variable ratio schedule.
- Obtaining frags in an FPS. Many of your shots will fail, and it is not certain what the outcome of any particular shot will be. However, the more targets you shoot at, the more likely you are to get a frag.
- Crafting in an RPG. It may take multiple attempts to succeed or to gain a new level, but the more you try, the more likely your behavior is to be reinforced.
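
As a rough sketch of a variable ratio schedule, the number of responses required can be redrawn after every reinforcer so that it only averages a target value. The class name, the uniform distribution, and the crafting numbers below are assumptions chosen for illustration; a real game could vary the requirement in many other ways.

```python
import random


class VariableRatioSchedule:
    """Reinforce after a varying number of responses that averages `mean_ratio`."""

    def __init__(self, mean_ratio, rng=None):
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = rng or random.Random()
        self.count = 0  # responses since the last reinforcer
        self.required = self._draw()

    def _draw(self):
        # Uniform on 1 .. 2*mean - 1, so the expected requirement equals mean_ratio.
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self):
        """Record one response; return True if it earns the reinforcer."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


# Illustrative use: a crafting skill that succeeds on average once per 5 attempts.
crafting = VariableRatioSchedule(mean_ratio=5, rng=random.Random(42))
outcomes = [crafting.respond() for _ in range(1000)]
successes = sum(outcomes)
```

From the player's side, the only visible difference from a fixed ratio is that no single attempt can be predicted to pay off.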

Premack Principle

Making a high frequency event contingent on the performance of a low frequency event can be used to reinforce the low frequency event. This approach is known as the Premack Principle. For instance, say that in a video game players frequently engage in activity A but rarely engage in activity B. Making activity A contingent upon performing activity B will reinforce activity B. It is not entirely clear why the Premack Principle works, but it does form a useful method for expanding the range of reinforcers that can be used to alter behavior.
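
In code, the Premack Principle amounts to a gate: access to the preferred activity A is granted only after the less preferred activity B has been performed. The sketch below is a minimal illustration; the class name `PremackGate` and the rule that each session of A consumes the earned credit are assumptions, not a description of any shipped game.

```python
class PremackGate:
    """Unlock a preferred activity A only after the less preferred activity B is done."""

    def __init__(self, required_b=1):
        if required_b < 1:
            raise ValueError("required_b must be at least 1")
        self.required_b = required_b  # completions of B needed per session of A
        self.b_done = 0

    def perform_b(self):
        """Record one completion of the low-frequency activity B."""
        self.b_done += 1

    def try_a(self):
        """Attempt the high-frequency activity A; succeeds only if enough B was done."""
        if self.b_done >= self.required_b:
            self.b_done -= self.required_b  # each session of A uses up the credit
            return True
        return False


# Illustrative use: two completions of B are needed before A becomes available.
gate = PremackGate(required_b=2)
blocked = gate.try_a()
gate.perform_b()
gate.perform_b()
allowed = gate.try_a()
```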

Fixed Interval (FI) - The first response after a fixed time interval is reinforced.

Examples:
- Waiting for monsters to re-spawn in a game where re-spawning occurs at fixed intervals. Note: In multiplayer games other players may also be waiting for the monster to re-spawn, in which case there is a fixed interval, limited hold schedule (see below).
- Obtaining objects, treasures, or power-ups that only re-appear at fixed intervals.
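
A fixed interval schedule depends on elapsed time rather than on the number of responses. In the sketch below, time is passed in explicitly as a number of seconds so the example does not depend on a real clock; the class name and the 30-second power-up are assumptions for illustration.

```python
class FixedIntervalSchedule:
    """Reinforce the first response made after `interval` seconds have elapsed."""

    def __init__(self, interval, start=0.0):
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.interval = interval
        self.available_at = start + interval  # when the reinforcer next becomes available

    def respond(self, now):
        """Return True if a response at time `now` is reinforced."""
        if now >= self.available_at:
            # The next interval is timed from the reinforced response.
            self.available_at = now + self.interval
            return True
        return False


# Illustrative use: a power-up that re-appears 30 seconds after it is collected.
power_up = FixedIntervalSchedule(interval=30)
results = [power_up.respond(t) for t in (10, 29, 30, 45, 61)]
# results == [False, False, True, False, True]
```

Responses made before the interval has elapsed earn nothing, which is why players on this kind of schedule tend to pause after each reward.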

Variable Interval (VI) - The first response after a variable time interval is reinforced.

Examples:
- Waiting for monsters to re-spawn in a game where re-spawning occurs at variable intervals. Note: In multiplayer games other players may also be waiting for the monster to re-spawn, in which case there is a variable interval, limited hold schedule (see below).
- Checking for mail messages from other players.
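
A variable interval schedule can be sketched the same way, with the waiting time redrawn after each reinforcer. The class name, the uniform distribution, and the mail-checking numbers are assumptions for illustration only; time is again passed in as plain seconds rather than read from a clock.

```python
import random


class VariableIntervalSchedule:
    """Reinforce the first response after a wait that averages `mean_interval` seconds."""

    def __init__(self, mean_interval, rng=None, start=0.0):
        if mean_interval <= 0:
            raise ValueError("mean_interval must be positive")
        self.mean_interval = mean_interval
        self.rng = rng or random.Random()
        self.available_at = start + self._draw()

    def _draw(self):
        # Uniform on 0 .. 2*mean, so the expected wait equals mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, now):
        """Return True if a response at time `now` is reinforced."""
        if now >= self.available_at:
            self.available_at = now + self._draw()
            return True
        return False


# Illustrative use: new mail arrives about once a minute; the player checks every 5 seconds.
inbox = VariableIntervalSchedule(mean_interval=60, rng=random.Random(7))
mail_found = sum(inbox.respond(t) for t in range(0, 600, 5))
```

Because the player cannot tell when the next reward becomes available, steady checking is the only strategy that never misses one for long.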

Fixed Interval Limited Hold (FI-LH) - The first response after a fixed interval of time is reinforced, providing the response occurs within a set period at the end of the interval.

Examples:
- Waiting for a monster to re-spawn on a crowded server. If monsters re-spawn at regular intervals the player must wait for a fixed period of time to kill the monster. However, if they are not there at the time the monster re-spawns, the monster is likely to be killed by another player.
- Obtaining objects, treasures, or power-ups that only appear for a limited time at fixed intervals.
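
The limited hold adds a closing window to the interval: a response that arrives too late earns nothing, and the player must wait for the next opportunity. The sketch below assumes the reward appears on a fixed clock regardless of whether it was claimed; the class name and the 60-second and 10-second figures are hypothetical.

```python
class FixedIntervalLimitedHold:
    """Reward appears every `interval` seconds and stays available for `hold` seconds."""

    def __init__(self, interval, hold, start=0.0):
        if not 0 < hold <= interval:
            raise ValueError("hold must be positive and no longer than interval")
        self.interval = interval
        self.hold = hold
        self.available_at = start + interval  # opening time of the current window

    def respond(self, now):
        """Return True if a response at time `now` falls inside an open window."""
        # Skip any windows that closed before this response arrived.
        while now >= self.available_at + self.hold:
            self.available_at += self.interval
        if now >= self.available_at:
            # Claimed: the next window opens one interval after this one opened.
            self.available_at += self.interval
            return True
        return False


# Illustrative use: a monster re-spawns every 60 seconds and another player
# is assumed to kill it if it is left alone for 10 seconds.
monster = FixedIntervalLimitedHold(interval=60, hold=10)
attempts = [monster.respond(t) for t in (50, 65, 68, 135, 185)]
```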

Variable Interval Limited Hold (VI-LH) - The first response after a variable interval of time is reinforced, providing the response occurs within a set period at the end of the interval.

Examples:
- Waiting for a monster to re-spawn on a crowded server. If monsters re-spawn at variable intervals the player must wait for a variable period of time to kill the monster. However, if they are not there at the time the monster re-spawns, the monster is likely to be killed by another player.
- Obtaining objects, treasures, or power-ups that only appear for a limited time at random times.

Fixed Duration (FD) - To be reinforced, the behavior must occur continuously throughout a fixed time interval.

Examples:
- Game tutorials. The player may be required to perform a behavior for a fixed period of time to complete a tutorial and advance. Fixed duration schedules are particularly suited for teaching players the mechanics of a game.
- Games with a fixed time limit for a level. In order to advance, the player must continuously perform an activity throughout the period; e.g., shooting alien ships or clearing all items from the level.
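
A duration schedule differs from an interval schedule in that the behavior has to be sustained for the whole period. A minimal sketch, assuming the game calls an update once per tick and that any interruption restarts the clock (the class name and the 3-second tutorial step are hypothetical):

```python
class FixedDurationSchedule:
    """Reinforce once a behavior has been sustained for `duration` seconds."""

    def __init__(self, duration):
        if duration <= 0:
            raise ValueError("duration must be positive")
        self.duration = duration
        self.sustained = 0.0  # seconds of uninterrupted behavior so far

    def update(self, dt, behaving):
        """Advance time by `dt` seconds; return True when the reinforcer is earned."""
        if not behaving:
            self.sustained = 0.0  # any interruption restarts the clock
            return False
        self.sustained += dt
        if self.sustained >= self.duration:
            self.sustained = 0.0
            return True
        return False


# Illustrative use: a tutorial step that asks the player to hold a block for 3 seconds.
step = FixedDurationSchedule(duration=3)
ticks = [True, True, False, True, True, True]  # the player lets go once, then recovers
earned = [step.update(1, held) for held in ticks]
```

A variable duration schedule would follow the same pattern with the required duration redrawn after each reinforcer.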

Variable Duration (VD) - To be reinforced, the behavior must occur continuously throughout a variable time interval.

Examples:
- Hunting simulations. Where the player is required to stalk prey, they are on a variable duration schedule. There is a variable period required to succeed, and the player must continue to stalk throughout the whole period.
- Flight simulators. The player must fly the plane throughout the period to successfully land the plane or complete a mission.
- Racing games. The player must drive the vehicle for the entire race to win.

Superstitious Behaviors

The behavioral psychologist B.F. Skinner performed one experiment on pigeons where at the end of a fifteen-second interval a reward was given regardless of what the pigeon was doing at the time. Six of the eight birds developed consistent responses: one bird made counterclockwise turns between reinforcers; another made pecking movements to the floor. Skinner concluded that reinforcement could act in an automatic manner, strengthening any behavior that occurred in close proximity in time, even though it had nothing to do with producing the reinforcer. In other words, the behavior had been adventitiously or accidentally reinforced. Skinner termed these "superstitious behaviors."

Examples of superstitious behaviors are frequent in video games. Game mechanics are often not explained to the player, and it is left to them to discover a pattern through trial and error. This process of discovery often leads to erroneous or superstitious beliefs as to how the game works. For instance, in the MMORPG A Tale in the Desert, a crop's yield might differ based on environmental factors, but the mechanics were not explained. Players were left to theorize on where to plant a crop to achieve the best yield, taking into account factors such as altitude, distance from water, season, and time of day. Some of the theories advanced were quite complex, and also quite wrong. These superstitions were eventually disproved through trial and error, but some superstitions do not easily lend themselves to proof and can endure. One such enduring superstition in Island of Kesmai, one of the earlier MMORPGs, was that patting a dog in the town was lucky.

Case Study - Reward Systems in Typical MMORPGs

The typical system of advancement in a role-playing game, from the earliest pen-and-paper games, such as Dungeons and Dragons, up to today's MMORPGs such as Everquest and Dark Age of Camelot, is that players' characters advance through levels and gain skills. To obtain levels, characters must gain experience points that are typically gained through slaying monsters or obtaining treasure. Skills are usually obtained through practice, each successful use of a skill increasing the chance that the player will get better at that skill. In addition to slaying monsters and improving skills, players can win objects and treasures, learn new spells, explore, make friends, and achieve status.

The various rewards often interrelate. For instance:

A player may have to obtain a particular level in order to learn a new skill or spell.
Better spells, armor and weapons let the player slay monsters and win experience more easily.
Better skills allow the player to slay tougher monsters, make better weapons, earn money, etc.
At higher levels a player can survive while exploring more dangerous areas.
At higher levels a player can slay tougher monsters, providing more experience points and better treasure.
Obtaining rare or sought-after items and achieving higher levels improves the player's status in the game.
To maintain friendships, a player must advance in level at the same rate as their friends; otherwise, they will be left behind and not able to survive fighting the monsters that their friends are taking on.

For many players it is the achievement of levels and the resulting increase in status that become the overriding reinforcement and the reason they keep playing. If achieving new levels were easy, their value as a reinforcer would be reduced. Players would quickly be able to achieve the highest level, and the status obtained by achieving a high level would be reduced. Designers have hence implemented the level system so that achieving each successive level requires more effort, and achieving the highest levels requires huge amounts of effort.

In Everquest, while the player might be able to advance from first to fourth level in a few hours of play, it may take days of play to go from fifth to tenth level. Getting to the highest levels may take months or even years of effort.

Exponential increases in the time it takes to reach the next level have these advantages:

They assure that it takes the player more time to reach the highest levels and thus encourage them to maintain their subscription to the game.
They assure that high levels will be scarce and thus a strong reinforcer to those players seeking status.
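
An exponential level curve of this kind can be sketched as follows. The base cost, growth factor, and constant experience rate are all assumed numbers chosen for illustration; they are not the values used by Everquest or any other game.

```python
def xp_for_next_level(level, base_xp=100, growth=1.5):
    """XP needed to advance from `level` to `level + 1` (hypothetical curve)."""
    if level < 1:
        raise ValueError("level must be at least 1")
    return round(base_xp * growth ** (level - 1))


def hours_to_reach(target_level, xp_per_hour=200, **curve):
    """Total play time, in hours, to go from level 1 to `target_level`.

    Assumes a constant rate of experience gain, which is a simplification.
    """
    total_xp = sum(xp_for_next_level(lv, **curve) for lv in range(1, target_level))
    return total_xp / xp_per_hour


# With these assumed numbers each level costs 50% more than the one before it.
early_hours = hours_to_reach(4)
late_hours = hours_to_reach(20)
```

Even a modest growth factor makes the later levels dominate the total play time, which is the property the two points above rely on.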

However, a rapid increase in the effort required to receive the reinforcement of achieving a new level causes ratio strain. Ratio strain involves the risk that a player will at some point find the effort involved in achieving a new level disproportionate to the reward itself and may simply give up the effort and abandon the game. It is not surprising that designers have selected a ratio schedule of reinforcement for level advancement since these schedules have the least susceptibility to ratio strain. Designers also usually make it uncertain exactly how much effort will be required to achieve the next level, placing level advancement on a variable ratio schedule, a schedule that shows the most resistance to extinction, that produces no post-reinforcement pause, and that is the most effective at producing high levels of response over extended periods.

Note: Dark Age of Camelot (DAOC) provides an experience bar that gives a clear indication of when the player will achieve the next level. This experience bar places level advancement on a fixed ratio schedule. While this schedule still produces a high level of response between reinforcements and has a high resistance to extinction, it is more susceptible to ratio strain, meaning that players are more likely to stop playing at each successive level than they would be with a variable ratio schedule. Another feature of a fixed ratio schedule is that it produces a post-reinforcement pause. When players achieve the next level in DAOC, they will likely pause for a time before trying to gain the next level.

In addition to increasing the experience required to achieve each level, other devices are often used to make it more difficult to achieve the next level. For instance:

A player may only get experience for killing monsters that are near their level or higher so players must seek out tougher monsters to obtain experience.
While in earlier levels a player may be able to take on a monster that is near their level on their own, at higher levels it may take a group of players to beat such an opponent; therefore, at higher levels the player will have to find a group of other players to adventure with.
While playing as a group is typically more fun, it often takes time to find a compatible group of players to band with. Not only do they need to be near the same level as you, they also need to have complementary class types. For instance, it may be desirable to have fighters, healers, and wizards in the party. The logistics of finding groups can thus be complex and time-consuming.
When fighting as a group you share experience, meaning that you get less experience for slaying a monster. The more party members in the group, the less experience you get.
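
Two of these devices, the level restriction and the sharing of experience within a group, can be sketched as a single function. The function name, the three-level window, and the even split are assumptions for illustration; actual games use their own rules.

```python
def xp_share(monster_xp, monster_level, player_level, party_size, level_window=3):
    """XP one party member receives for a kill (hypothetical rules)."""
    if party_size < 1:
        raise ValueError("party_size must be at least 1")
    # No experience for monsters too far below the player's level.
    if monster_level < player_level - level_window:
        return 0
    # The reward is split evenly, so larger parties earn less per member.
    return monster_xp // party_size


solo = xp_share(600, monster_level=20, player_level=20, party_size=1)
grouped = xp_share(600, monster_level=20, player_level=20, party_size=4)
too_easy = xp_share(600, monster_level=10, player_level=20, party_size=1)
```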

Ethical Considerations of Using Operant Conditioning in Video Games

Behavior Shaping

Conditioning can not only increase the frequency of existing behaviors; it can also program the subject to perform entirely new behaviors. Conditioning can further be used to increase the range of results that the subject finds reinforcing and can even cause the subject to continue responding even though they find the reinforced behavior aversive. With this point in mind, it is a concern whether it is ethical to use these techniques in a video game. In defense of these techniques a few points should be made:

Conditioning is part of the natural way in which we learn. Sports, board games, and all other recreational activities involve some element of operant or classical conditioning. Video games are no different and will involve some element of conditioning. It is far better to understand these forces and to use them to create a better gaming experience.
Except in some more extreme cases (see discussion below on Game Addiction) players are exercising their own free will in playing a video game. If they find a game aversive, they can escape from it by turning the game off.

Game Addiction

Some video games are often cited as being too addictive. Everquest, for instance, has gained the nickname "Evercrack," due to its perceived addictiveness. Some play Everquest or other games for days on end without a break. Many players of MMORPGs play for three to four hours a day over periods of weeks, months, or even years. The results for a few of the more obsessed individuals have been devastating. A number of players have lost their jobs or destroyed their marriages from playing too much of a particular game. There have even been a handful of deaths thought to be associated with these games.

Matching Law and "Power Playing"

As stated earlier, matching law holds that organisms choose one reinforcement schedule over another in direct proportion to the frequency, magnitude, or delay of the reinforcers for each schedule. Many lament the player who focuses only on those activities that produce the shortest route to obtaining their objective. We refer to these players as "power gamers," and their playing style is sometimes viewed as in some way cheating the system. However, considering matching law, it is only natural that players will proportionally seek out those activities that lead to the rewards they find greatest, quickest, or most immediate. What separates power gamers from other players may be a better understanding by the former of the optimum path to the optimum rewards, or alternatively that non-power players simply find other aspects of the game rewarding, like socializing or exploring.
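
In its simplest form, matching law predicts that the share of behavior given to each activity equals that activity's share of the total reinforcement. The sketch below considers only the rate of reinforcement and ignores magnitude and delay; the function name, activity names, and numbers are invented for illustration.

```python
def matching_allocation(reinforcement_rates):
    """Predicted share of behavior per activity under the matching law.

    `reinforcement_rates` maps an activity name to the rate of reinforcement
    it yields (for example, rewards per hour).
    """
    total = sum(reinforcement_rates.values())
    if total <= 0:
        raise ValueError("at least one activity must be reinforced")
    return {name: rate / total for name, rate in reinforcement_rates.items()}


# Hypothetical rates for three activities in an RPG.
shares = matching_allocation({"grinding": 30, "questing": 15, "exploring": 5})
```

With these assumed rates the prediction is that 60% of play goes to grinding, 30% to questing, and 10% to exploring. A player who values exploring more highly is, in effect, working from a different table of rates.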

Whichever views on the nature of addiction are correct, it is clear that some individuals spend too much time playing some types of games. Though the root cause of their addiction may not be the video game itself, there may be some aspects in the nature of the game that lead to excessive play. Some operators of these games have admirably provided tools to help the player control the time they spend in the game, such as timers and warning systems.

As mentioned earlier, overuse of Variable ratio schedules and long intervals between reinforcement conditions players to play for extended periods of time without pause. Designers of MMORPGs (whether they realize it or not) often rely heavily on these schedules to keep players interested in the game so that they continue playing the game and maintain their subscription. While the motivation in using these schedules may be to maintain long-term interest in the game, it does have the result that some players play obsessively. This outcome may not be desirable for either the player or the company operating the game. From the game company's standpoint, while they benefit from having a player maintain their subscription month after month, the more hours the player plays during each month, the greater the required cost in bandwidth and server power to operate the game. There are thus both economic and ethical considerations involved in the overuse of variable ratio schedules. Other reinforcement schedules do offer some alternatives. Fixed schedules introduce post-reinforcement pauses, while interval and duration schedules elicit a slower rate of response.

Can a Game Be Too Compelling?

In the sequel to Red Dwarf, Grant Naylor envisaged the ultimate computer game: Better than Life. Better than Life transports the player to a perfect virtual reality world of their own imagination where they can enjoy fabulous wealth and unmitigated success. It's the ideal game with only one drawback: it's so good that no one has ever walked away from it alive.

