Section II
Implementing Reward Schedules in Video Games
Practical Issues of Implementing a Reward System Within a Video Game
When analyzing a game design from an operant conditioning standpoint,
it is necessary to determine:
- What schedule of reinforcement is being applied?
- What are the rewards?
- What behaviors are being reinforced?
In some games this process is straightforward. For instance, it is
clear that video poker follows a variable ratio schedule. In other games,
unraveling the reward structures is much more difficult, because
different players may find different aspects of a game reinforcing. For
example, one Tetris player may find reinforcement in fitting blocks perfectly,
while another may find only the clearing of rows reinforcing.
The two players are on different variable ratio schedules.
The matter of implementing reward systems is further complicated by the
way secondary reinforcers are learned. The player who enjoys clearing
rows will likely learn to enjoy fitting blocks and will now be operating
on two ratio schedules. Not only may new reinforcers be learned, but the
overriding motivations for playing the game may change. For instance, new
players may be motivated by the chance to explore the game and test out
the features it has to offer. Once they have exhausted their initial curiosity,
they may switch to trying to accumulate points. Still later, they may play
to beat other players' scores.
A further complication is that not only the rewards may change; the
type of reinforcement schedule may also change
as the game progresses. For instance, a reward may initially be provided
on a fixed ratio schedule, but as the game advances the schedule may switch
to a variable ratio. An analysis of a game design's reward systems needs
to look at the game both from the point of view of a new player and from
the perspective of an experienced player.
|
Reinforcement Schedules Operating in Tetris.
Tetris is a simple game but involves multiple
reinforcers. All of these operate on a variable ratio schedule.
Reinforcers include:
Positive Reinforcers
- Gaining points and clearing rows
- High Score or improving on a previous
score
- Fitting blocks
- Winning or advancing to the next level
Negative Reinforcers
- Avoidance of losing
- Build up of rows
- Failure to beat an earlier high score
|
|
What Rewards Can a Video Game Provide?
Video games are only able to provide conditioned reinforcers, that is,
rewards that we have learned to treat as reinforcers. As discussed earlier,
conditioned reinforcers are learned from continued pairing with other
existing reinforcers. One consequence of the way that conditioned reinforcers
are learned is that not all players will have the same set of reinforcers.
The set of reinforcers each player possesses will depend on their life
experiences. For instance, one player may have learned to enjoy solving
puzzles, while another may find solving puzzles aversive. To have a broad
appeal the designer needs to provide reinforcers that are likely to appeal
to large segments of the population. Reinforcers such as gaining
points, achieving high scores and solving simple problems are all likely
to appeal to many people since in day-to-day life we are likely to have
already learned to find these types of consequences rewarding.
Another consequence of the way in which conditioned reinforcers are acquired
is that it is possible for the game designer to extend the set of reinforcers within the game itself.
By following classical conditioning methodologies, the game designer can teach
the player new reinforcers. For instance, many car racing simulators allow
players to make many adjustments to their vehicle. Improved configurations
will allow players to improve their laptimes. Many players will spend
huge amounts of effort improving their car configurations and developing
optimum configurations for each track. Spending large amounts of time
configuring a car is not an activity that many
new players would be likely to find rewarding. However, winning races and achieving
fast laptimes is rewarding. For many players, this repeated pairing of
developing better car configurations and achieving better laptimes will
ultimately lead to their finding the production of an optimum car configuration
reinforcing in its own right.
Traditional reinforcers in video games include the following:
- Clearing a level and advancing to the next
- Solving puzzles
- Achieving goals
- Winning
- Losing
- Gaining points
- Gaining powerups
- Acquisition of items
- Acquisition of skills, improving skills or stats
- Exploration
- Achieving a top score
This list is definitely not exhaustive.
Multiplayer games provide the designer with an additional set of social
reinforcers. These reinforcers include:
- Status
- Recognition
- Friendship or companionship
- Team or group loyalty
Since social reinforcers are so important in real life, social reinforcers
within a game can be extremely compelling. For many players, status and
recognition become the overriding reinforcement and the raison d'etre
for their continued involvement in the game. Social status among their
peers within the game can for many players justify the days, weeks, months,
and even years of effort required to achieve a high level or top ranking.
|
The Role of Conditioned Reinforcers in the
Preference for "Clone" Designs
A frequent complaint in the games industry
is the lack of original designs. Publishers and developers seem
most comfortable using existing design formulas when designing a
game. One hit title is certain to be followed by waves of games
using the same basic game design elements. For instance, no sooner
had Command and Conquer achieved a critical success for the RTS
genre than the floodgates opened and hordes of "clone"
RTS titles began to hit the shelves. The same happened with the
FPS genre after the success of the Doom franchise, and gamers were
soon buried in hordes of Doom clones.
(Note: I was involved in a project that was cancelled due to the
Doom phenomenon. We had designed an RPG using a third-person isometric
perspective, and the licensor decided late in the project that they
had to have a title with a first-person perspective since that was
what everyone else was doing.)
Conditioned reinforcers may play a part in
this clone phenomenon. A less innovative title may be initially more
attractive to the player since they have already been conditioned
to find the "rewards" of the game reinforcing.
With a new game design, there is a danger
that the rewards of playing will initially not be reinforcing, and
the developer will need to make an effort within the game to establish
their reinforcement value. The danger is that the player will abandon
the game before the reinforcers can be established. Consider for
example a traditional RTS. The end goal and reward is to win battles
and eventually beat the computer opponent. To achieve this goal,
the player must spend much time locating and obtaining resources
and building units. There is a danger that a player will initially
find these activities unrewarding or even tedious. They may be unwilling
to go through the initial efforts required to win a battle and may
abandon the game. For a player who has already played other RTS titles,
the activities of locating resources and building units will have
already been established as conditioned reinforcers.
|
|
Which is the Best Reinforcement Schedule?
Figure 8 compares the response rates produced by the Variable ratio, the Fixed ratio, the Variable interval, and the Fixed Interval schedules of reinforcement. As can be seen, the number of responses per time period increases as the schedule is changed from a Fixed interval to a Variable interval, and from Fixed ratio to Variable ratio.
If the objective is to generate the highest level of responses over the longest period, a ratio schedule is the way to go. Additionally, the Variable ratio schedule demonstrates the highest resistance to extinction of any reinforcement schedule, which means that it is possible to dramatically increase the time between reinforcements without a drop-off in response rate. Thus by implementing a Variable ratio schedule of reinforcement, the game designer can condition players to keep responding over very long intervals without receiving any reinforcement. Not surprisingly, games that are viewed as extremely compulsive or addictive are more than likely implementing a Variable ratio schedule of reinforcement. It does not, however, follow that ratio schedules are the best reinforcement schedule, and there may be compelling reasons to use another reinforcement schedule. |
While ratio schedules produce the highest response rate, this fact does not
necessarily mean that ratio schedules are the most fun for the player.
Employees in factories are often paid based on ratio schedules, receiving
pay based on the number of units produced. Factory owners like these schedules
because they produce a high level of output. Employees, however, find
these schedules undesirable because they make them work too hard, leaving
them nervous and exhausted at the end of the day. Union pressure has often
led to the replacement of ratio schedules with an hourly wage system,
a duration schedule. Overuse of a variable ratio schedule in a game may
leave your players feeling burned out, exhausted, and unhappy with the
game experience, even though they feel compelled to keep playing. This
reaction is not what a game designer wants from their players. The designer
may wish to utilize other reinforcement schedules such as interval schedules
that will still keep the player motivated but leave the player less
burned out at the end of the session.
Other reinforcement schedules are particularly suited for certain situations.
While Variable ratio schedules are best at maintaining a behavior, Fixed
ratio and Duration schedules are the best schedules for establishing new
behaviors. Experimenters frequently use Fixed ratio schedules when initially
establishing a behavior, and only later switch to a Variable ratio schedule.
Likewise, in many video games, the designer first needs to teach or train
the player on how to play the game, and for this purpose Fixed ratio and
Duration schedules are best.
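The hand-off from a training schedule to a maintenance schedule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the `training_rewards` parameter, and the choice to draw the variable requirement uniformly around the same mean are all hypothetical and are not taken from any particular game.

```python
import random


class TrainingThenVariableRatio:
    """Fixed ratio while a behavior is being established, variable ratio afterwards."""

    def __init__(self, ratio, training_rewards, rng=None):
        self.ratio = ratio                        # responses per reward (the mean, once variable)
        self.training_rewards = training_rewards  # rewards delivered on the fixed schedule
        self.rng = rng or random.Random()
        self.rewards_given = 0
        self.responses = 0
        self.required = ratio                     # responses needed for the next reward

    def _next_requirement(self):
        if self.rewards_given < self.training_rewards:
            return self.ratio                     # fixed ratio: always the same count
        # Assumed variable ratio: uniform on 1..(2 * ratio - 1), so the mean stays at ratio.
        return self.rng.randint(1, 2 * self.ratio - 1)

    def respond(self):
        """Record one correct response; return True if it is reinforced."""
        self.responses += 1
        if self.responses >= self.required:
            self.responses = 0
            self.rewards_given += 1
            self.required = self._next_requirement()
            return True
        return False
```

With `ratio=5` and `training_rewards=3`, the first three rewards arrive after exactly five responses each; from then on the player cannot predict which response will pay off.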
|
Note: Experiments with pigeons pecking a
lighted key for food indicate that they also have an aversive reaction
to ratio schedules. If they are given another key to peck which has
the sole consequence of turning off the light on the ratio key for
a period, they will reliably do so to escape from the aversiveness
of the schedule, even though there is no other source of food. While
you might think that they could gain respite by simply ignoring the
ratio key or by pecking at a moderate rate, they appear unable to
do so. While the key light is lit, they are compelled to respond and
will conform to the behavior pattern predicted by a ratio schedule.
|
|
Most games will have multiple schedules of reinforcement and different types
of reward. For instance, the game may allow the player to gain experience points,
to improve skills, and to solve puzzles, each of which requires the player to perform
a different behavior. If the player is able to choose their activities during a
session, employing variable ratio or variable interval schedules (which typically
produce no pause between reinforcement intervals) may have the consequence of
locking the player into one activity for the duration of the session. This locking in will
result in the player's failing to experience the many features that the game
provides. In this situation, the designer may wish to employ fixed ratio or fixed
interval schedules since they produce post-reinforcement pauses,
increasing the chances that a player will try out multiple aspects of the game.
Examples of the Various Reinforcement Schedules Implemented in Video
Games.
- Fixed Ratio (FR) - A reinforcer is given after a specified number of correct
responses.
Examples:
- Collecting tokens. Many games require the player to collect a
fixed number of tokens to advance to the next level, to obtain a new
life point, or to receive some other reinforcer.
- Attaining a new level in an RPG. Some RPGs clearly indicate how
much experience is required to achieve the next level. A high degree
of certainty as to the level of work that will be required to achieve
the next level puts the player on a fixed ratio schedule.
- Variable Ratio (VR) - A reinforcer is given after a varying, unpredictable
number of correct responses.
Examples:
- Collecting tokens. Some games require the player to collect tokens
to achieve a life point or reward but vary the number of tokens
required.
- Achieving a new level in an RPG. Some RPGs give no clear indication
of how much experience is required to achieve the next level. This
puts the player on a variable ratio schedule.
- Obtaining frags in an FPS. Many of your shots will fail, and it
is not certain what the outcome of any particular shot will be.
However, the more targets you shoot at, the more likely you are
to get a frag.
- Crafting in an RPG. It may take multiple attempts to succeed
or to gain a new level, but the more you try, the more likely your
behavior is to be reinforced.
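The two ratio schedules can be sketched as follows. This is a minimal illustration, not a description of how any shipped game works: the class names are hypothetical, and the variable ratio is modeled as a constant chance of reward on every response (sometimes called a random ratio), which is one common way to get an unpredictable count with a known average.

```python
import random


class FixedRatio:
    """Reinforce every `ratio`-th correct response."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.count = 0

    def respond(self):
        self.count += 1
        if self.count == self.ratio:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce each response with probability 1 / mean_ratio (an assumed model)."""

    def __init__(self, mean_ratio, rng=None):
        self.p = 1.0 / mean_ratio
        self.rng = rng or random.Random()

    def respond(self):
        # Any single response may fail, but more responses mean more rewards on average.
        return self.rng.random() < self.p
```

A token counter maps onto `FixedRatio`; a shot in an FPS or a crafting attempt maps onto `VariableRatio`, where no single attempt is certain but persistence pays off.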
|
Premack Principle
Making a high frequency event contingent
on the performance of a low frequency event can be used to reinforce
the low frequency event. This approach is known as the Premack Principle.
For instance, say that players in a video game frequently engage
in activity A but rarely engage in activity B. Making activity A contingent
upon performing activity B will reinforce activity B. It is not
entirely clear why the Premack Principle works, but it does form
a useful method for expanding the range of reinforcers that can
be used to alter behavior.
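One way a designer might wire up such a contingency is sketched below. The class and method names are hypothetical, and the "credit" bookkeeping is just one assumed way of making the high frequency activity depend on the low frequency one.

```python
class PremackGate:
    """Unlock a high frequency activity (A) only after a low frequency activity (B)."""

    def __init__(self, cost=1):
        self.cost = cost      # completions of B required per use of A
        self.credits = 0

    def perform_low_frequency(self):
        """The player completes activity B and earns one credit."""
        self.credits += 1

    def try_high_frequency(self):
        """Return True if the player may perform activity A now."""
        if self.credits < self.cost:
            return False
        self.credits -= self.cost
        return True
```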
|
|
- Fixed Interval (FI) - The first response after a fixed time interval is
reinforced.
Examples:
- Waiting for monsters to re-spawn in a game where re-spawning occurs
at fixed intervals. Note: In multiplayer games other players may also
be waiting for the monster to re-spawn, in which case there
is a fixed interval, limited hold schedule (see below).
- Obtaining objects, treasures, or power-ups that only re-appear at
fixed intervals.
- Variable Interval (VI) - The first response after a variable time interval
is reinforced.
Examples:
- Waiting for monsters to re-spawn in a game where re-spawning occurs
at variable intervals. Note: In multiplayer games other players may
also be waiting for the monster to re-spawn, in which case there
is a variable interval, limited hold schedule (see below).
- Checking for mail messages from other players.
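Interval schedules depend on elapsed time rather than on a count of responses. The sketch below is one hypothetical way to model them; the class name, the explicit `now` argument (game time in seconds), and the uniform spread of 50% to 150% of the mean for the variable case are all assumptions made for illustration.

```python
import random


class IntervalSchedule:
    """Reinforce the first response made after the current interval has elapsed."""

    def __init__(self, mean_interval, variable=False, rng=None):
        self.mean_interval = mean_interval
        self.variable = variable
        self.rng = rng or random.Random()
        self.available_at = self._draw_interval()   # the clock starts at time 0

    def _draw_interval(self):
        if self.variable:
            # Assumed spread: anywhere from half to one and a half times the mean.
            return self.rng.uniform(0.5 * self.mean_interval, 1.5 * self.mean_interval)
        return self.mean_interval

    def respond(self, now):
        """Return True if a response at time `now` is reinforced."""
        if now < self.available_at:
            return False          # responding early earns nothing
        self.available_at = now + self._draw_interval()
        return True
```

Extra responses before the interval ends have no effect, which is why interval schedules tend to produce lower response rates than ratio schedules.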
- Fixed Interval Limited Hold (FI-LH) - The first response after a fixed
interval of time is reinforced, providing the response occurs within
a set period at the end of the interval.
Examples:
- Waiting for a monster to re-spawn on a crowded server. If monsters
re-spawn at regular intervals the player must wait for a fixed period
of time to kill the monster. However, if they are not there at the
time the monster re-spawns, the monster is likely to be killed by
another player.
- Obtaining objects, treasures, or power-ups that only appear for
a limited time at fixed intervals.
- Variable Interval Limited Hold (VI-LH) - The first response after a variable
interval of time is reinforced, providing the response occurs within
a set period at the end of the interval.
Examples:
- Waiting for a monster to re-spawn on a crowded server. If monsters
re-spawn at variable intervals the player must wait for a variable
period of time to kill the monster. However, if they are not there
at the time the monster re-spawns, the monster is likely to be killed
by another player.
- Obtaining objects, treasures, or power-ups that only appear for
a limited time at random times.
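A limited hold adds a closing window to an interval schedule. The following sketch is an assumption-laden illustration: `LimitedHoldSpawn`, its parameters, and the rule that the next interval starts when a missed window closes are hypothetical choices, not taken from any real server. Setting `jitter` to zero gives the fixed interval case; a positive `jitter` (smaller than `interval`) gives the variable case.

```python
import random


class LimitedHoldSpawn:
    """A reward appears after each interval and stays available for `hold` seconds."""

    def __init__(self, interval, hold, jitter=0.0, rng=None):
        self.interval = interval
        self.hold = hold
        self.jitter = jitter      # 0 = fixed interval; > 0 = variable interval
        self.rng = rng or random.Random()
        self.spawn_at = self._next_delay()

    def _next_delay(self):
        return self.interval + self.rng.uniform(-self.jitter, self.jitter)

    def respond(self, now):
        """Return True if a response at time `now` collects the reward."""
        # Skip over any windows that opened and closed while the player was away.
        while now >= self.spawn_at + self.hold:
            self.spawn_at = self.spawn_at + self.hold + self._next_delay()
        if now >= self.spawn_at:
            self.spawn_at = now + self._next_delay()
            return True
        return False
```

On a crowded server the hold period is effectively the time it takes another player to kill the monster, so a short `hold` forces the player to be present at exactly the right moment.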
- Fixed Duration (FD) - To be reinforced, the behavior must occur continuously
throughout a fixed time interval.
Examples:
- Game tutorials. The player may be required to perform a behavior
for a fixed period of time to complete a tutorial and advance. Fixed
duration schedules are particularly suited for teaching players the
mechanics of a game.
- Games with a fixed time limit for a level. In order to advance,
the player must continuously perform an activity throughout the period;
e.g., shooting alien ships or clearing all items from the level.
- Variable Duration (VD) - To be reinforced, the behavior must occur continuously
throughout a variable time interval.
Examples:
- Hunting simulations. Where the player is required to stalk prey,
they are on a variable duration schedule. There is a variable period
required to succeed, and the player must continue to stalk throughout
the whole period.
- Flight simulators. The player must fly the plane throughout the
period to successfully land the plane or complete a mission.
- Racing games. The player must drive the vehicle for the entire race
to win.
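Duration schedules can be sketched as a timer that resets whenever the behavior stops. As with the other sketches, the names and the per-tick `update` interface are hypothetical, and the uniform spread used for the variable case is an assumption.

```python
import random


class DurationSchedule:
    """Reinforce a behavior only if it continues without a break for the required time."""

    def __init__(self, mean_duration, variable=False, rng=None):
        self.mean_duration = mean_duration
        self.variable = variable
        self.rng = rng or random.Random()
        self.elapsed = 0.0
        self.required = self._draw_duration()

    def _draw_duration(self):
        if self.variable:
            # Assumed spread: half to one and a half times the mean duration.
            return self.rng.uniform(0.5 * self.mean_duration, 1.5 * self.mean_duration)
        return self.mean_duration

    def update(self, dt, behaving):
        """Advance the clock by `dt` seconds; `behaving` says whether the behavior continued."""
        if not behaving:
            self.elapsed = 0.0    # any interruption resets the timer
            return False
        self.elapsed += dt
        if self.elapsed >= self.required:
            self.elapsed = 0.0
            self.required = self._draw_duration()
            return True
        return False
```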
|
Superstitious Behaviors
The behavioral psychologist B.F. Skinner
performed one experiment on pigeons where at the end of a fifteen-second
interval a reward was given regardless of what the pigeon was doing
at the time. Six of the eight birds developed consistent responses:
One bird made counterclockwise turns between reinforcers; another
made pecking movements to the floor. Skinner concluded that reinforcement
could act in an automatic manner, strengthening any behavior that
occurred in close proximity in time, even though it had nothing
to do with producing the reinforcer. In other words, the behavior
had been adventitiously or accidentally reinforced. Skinner termed
these 'superstitious behaviors.' Examples of superstitious behaviors
are frequent in video games. Game mechanics are often not explained
to the player, and it is left to them to discover a pattern through
trial-and-error. This process of discovery often leads to erroneous, or superstitious,
beliefs as to how the game works. For instance, in the MMORPG A
Tale in the Desert, a crop's yield might differ based on environmental
factors, but the mechanics were not explained. Players were left
to theorize on where to plant a crop to achieve the best yield,
taking into account factors such as altitude, distance from water,
season, and time of day. Some of the theories advanced were quite
complex, and also quite wrong. These superstitions were eventually
disproved through trial-and-error, but some superstitions do not easily
lend themselves to testing and can endure. One such enduring
superstition in Island of Kesmai, one of the earlier MMORPGs,
was that patting a dog in the town was lucky.
|
|
Case Study - Reward Systems in Typical MMORPGs
The typical system of advancement in a role-playing game, from the earliest
pen-and-paper games, such as Dungeons and Dragons, up to today's MMORPGs such
as Everquest and Dark Age of Camelot, is that players' characters advance through
levels and gain skills. To obtain levels, characters must gain experience points
that are typically gained through slaying monsters or obtaining treasure. Skills
are usually obtained through practice, each successful use of a skill increasing
the chance that the player will get better at that skill. In addition to slaying
monsters and improving skills, players can win objects and treasures, learn
new spells, explore, make friends, and achieve status.
The various rewards often interrelate. For instance:
- A player may have to obtain a particular level in order to learn a new
skill or spell.
- Better spells, armor and weapons let the player slay monsters and win experience
more easily.
- Better skills allow the player to slay tougher monsters, make better weapons,
earn money, etc.
- At higher levels a player can survive while exploring more dangerous areas.
- At higher levels a player can slay tougher monsters, providing more
experience points and better treasure.
- Obtaining rare or sought-after items and achieving higher levels improves
the player's status in the game.
- To maintain friendships, a player must advance in level at the same rate
as their friends; otherwise, they will be left behind and not able to survive
fighting the monsters that their friends are taking on.
For many players it is
the achievement of levels and the resulting increase in status that become
the overriding reinforcement and the reason they keep playing. If achieving
new levels were easy, their value as a reinforcer would be reduced. Players
would quickly be able to achieve the highest level, and the status obtained
by achieving a high level would be reduced. Designers have hence implemented
the level system so that achieving each successive level requires more effort,
and achieving the highest levels requires huge amounts of effort.
In Everquest,
while the player might be able to advance from first to fourth level in a few
hours of play, it may take days of play to go from fifth to tenth level.
Getting to the highest levels may take months or even years of effort.
Exponential increases in the time it takes to reach the next level have
these advantages:
- They ensure that it takes the player more time to reach the highest
levels and thus encourage them to maintain their subscription to the
game.
- They ensure that high levels will be scarce and thus a strong reinforcer
to those players seeking status.
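An exponential level curve of this kind can be sketched as below. The numbers are purely illustrative assumptions: `base_xp=100` and a doubling `growth` factor are not the values used by Everquest or any other game.

```python
def xp_to_next_level(level, base_xp=100, growth=2):
    """Experience needed to advance from `level` to `level + 1` (assumed formula)."""
    return int(base_xp * growth ** (level - 1))


def total_xp_for_level(target_level, base_xp=100, growth=2):
    """Total experience needed to reach `target_level`, starting from level 1."""
    return sum(xp_to_next_level(lvl, base_xp, growth) for lvl in range(1, target_level))
```

Under these assumed values, reaching level 4 takes 700 points in total while reaching level 10 takes 51,100, so each new level costs as much as all the previous ones combined.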
However, a rapid increase in the effort required to receive the reinforcement
of achieving a new level causes ratio strain: the
risk that a player will at some point find the effort involved in achieving
a new level disproportionate to the reward itself and will simply give
up the effort and abandon the game. It is not surprising that designers
have selected a ratio schedule of reinforcement for level advancement
since these schedules have the least susceptibility to ratio strain. Designers
also usually make it uncertain exactly how much effort will be required
to achieve the next level, placing level advancement on a variable ratio
schedule, a schedule that shows the most resistance to extinction, that
produces no post-reinforcement pause, and that is the most effective
at producing high levels of response over extended periods.
|
Note: Dark Age of Camelot (DAOC) provides
an experience bar that gives a clear indication of when the player
will achieve the next level. This experience bar places level advancement on a fixed
ratio schedule. While this ratio still produces a high level of response
between reinforcements and has a high resistance to extinction, it
is more susceptible to ratio strain, meaning that players are more
likely to stop playing at each successive level than they would be
with a variable ratio schedule. Another feature of a fixed ratio schedule
is that it produces a post-reinforcement pause. When players achieve
the next level in DAOC, they will likely pause for a time before trying
to gain the next level. |
|
In addition to increasing the experience required to achieve each level, other
devices are often used to make it more difficult to achieve the next level.
For instance:
- A player may only get experience for killing monsters that are near their level
or higher, so players must seek out tougher monsters to obtain experience.
- While in earlier levels a player may be able to take on a monster that is near
their level on their own, at higher levels it may take a group of players to
beat such an opponent; therefore, at higher levels the player will have to find
a group of other players to adventure with.
- While playing as a group is typically more fun, it often takes time to find
a compatible group of players to band with. Not only do they need to be near
the same level as you, they also need to have complementary class types. For
instance, it may be desirable to have fighters, healers and wizards in the
party. The logistics of finding groups can thus be complex and time-consuming.
- When fighting as a group you share experience, meaning that you get less
experience for slaying a monster. The more party members in the group, the
less experience you get.
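Two of the devices above, the level requirement and the sharing of experience within a group, can be sketched in one function. The function name, the `level_window` of three levels, and the even split with integer division are hypothetical choices made for illustration only.

```python
def xp_award(monster_xp, monster_level, player_level, party_size=1, level_window=3):
    """Experience one player receives for a kill under the assumed rules."""
    if party_size < 1:
        raise ValueError("party_size must be at least 1")
    # Monsters too far below the player's level are worth nothing.
    if monster_level < player_level - level_window:
        return 0
    # The kill's experience is split evenly among the party members.
    return monster_xp // party_size
```

A solo kill worth 120 points yields 30 points per player in a party of four, so grouping trades experience per kill for the ability to beat tougher opponents.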
Ethical Considerations of Using Operant Conditioning in Video Games
Behavior Shaping
Conditioning can not only increase the frequency of existing behaviors;
it can also program the subject to perform entirely new behaviors. Conditioning
can further be used to increase the range of results that the subject
finds reinforcing and can even cause the subject to continue responding
even though they find the reinforced behavior aversive. With this point in
mind, it is fair to ask whether it is ethical to use these techniques in
a video game. In defense of these techniques a few points should be made:
- Conditioning is part of the natural way in which we learn. Sports,
board games, and all other recreational activities involve some element
of operant or classical conditioning. Video games are no different and will involve some element of conditioning. It is far better to understand these forces and to use them to
create a better gaming experience.
- Except in some more extreme cases (see discussion below on Game Addiction)
players are exercising their own free will in playing a video game.
If they find a game aversive, they can escape from it by turning the
game off.
Game Addiction
Some video games are often cited as being too addictive. Everquest, for instance,
has gained the nickname "Evercrack," due to its perceived addictiveness.
Some play Everquest or other games for days on end without a break. Many players
of MMORPGs play for three to four hours a day over periods of weeks, months,
or even years. The results for a few of the more obsessed individuals have
been devastating. A number of players have lost their jobs or destroyed
their marriages from playing too much of a particular game. There have even
been a handful of deaths thought to be associated with these games. |
Matching Law and "Power Playing"
As stated earlier, matching law holds that
organisms divide their responding among reinforcement schedules in
proportion to the frequency and magnitude of the reinforcers each
schedule provides, and in inverse proportion to their delay. Many lament the player who focuses
only on those activities that produce the shortest route to obtaining
their objective. We refer to these players as "power gamers,"
and their playing style is sometimes viewed as in some way cheating
the system. However, considering matching law, it is only natural
that players will proportionally seek out those activities that
lead to the rewards they find greatest, quickest, or most
immediate. What separates power gamers from other players may be
a better understanding by the former of the optimum path to the
optimum rewards, or alternatively that non-power players simply
find other aspects of the game rewarding, like socializing or exploring.
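In its simplest form, matching law predicts that the share of responding given to each activity equals that activity's share of the total reinforcement. The sketch below computes that prediction from reinforcement rates alone; the function name and the activity labels are hypothetical, and magnitude and delay are left out for simplicity.

```python
def predicted_time_share(reinforcement_rates):
    """Map each activity to its predicted share of the player's responding.

    `reinforcement_rates` maps an activity name to rewards earned per hour.
    """
    total = sum(reinforcement_rates.values())
    if total <= 0:
        raise ValueError("at least one activity must provide reinforcement")
    return {activity: rate / total for activity, rate in reinforcement_rates.items()}
```

For example, if one activity pays out 30 rewards per hour and another pays out 10, the simple form of the law predicts a 75/25 split of the player's effort between them.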
|
|
Whichever views on the nature of addiction are correct, it is clear that some individuals
spend too much time playing some types of games. Though the root cause
of their addiction may not be the video game itself, there may be some
aspects in the nature of the game that lead to excessive play. Some operators
of these games have admirably provided tools to help the player control
the time they spend in the game, such as timers and warning systems.
As mentioned earlier, overuse of Variable ratio schedules and long intervals
between reinforcement conditions players to play for extended periods
of time without pause. Designers of MMORPGs (whether they realize it or
not) often rely heavily on these schedules to keep players interested
in the game so that they continue playing the game and maintain their
subscription. While the motivation in using these schedules may be to
maintain long-term interest in the game, it does have the result that
some players play obsessively. This outcome may not be desirable for
either the player or the company operating the game. From the game company's
standpoint, while they benefit from having a player maintain their subscription
month after month, the more hours the player plays during each month,
the greater the required cost in bandwidth and server power to operate
the game. There are thus both economic and ethical considerations involved
in the overuse of variable ratio schedules. Other reinforcement schedules
do offer some alternatives. Fixed schedules introduce post-reinforcement
pauses, while interval and duration schedules elicit a slower rate of
response.
|
Can a Game Be Too Compelling?
In his sequel to Red Dwarf, Grant Naylor
envisaged the ultimate computer game: Better than Life. Better than
Life transports the player to a perfect virtual reality world of
their own imagination where they can enjoy fabulous wealth and unmitigated
success. It's the ideal game with only one drawback — it's so good
that no-one has ever walked away from it alive.
|
|