by Sean Butcher
I don't know who first drew a comparison between video games and a "Skinner box." I heard the term "Virtual Skinner Box" several years ago and have since seen the occasional reference to it on various game design discussion forums. The term has been used heavily in recent years in discussions of the links between video games and violence, and of video game addiction.
As anyone who has had any exposure to the study of psychology in school will probably know, a Skinner box is a piece of laboratory equipment used to conduct operant conditioning experiments with animals, usually a rat or a pigeon. In the Skinner box there is usually a lever or a key that the animal can manipulate to obtain a reward such as food or water. Psychologists use Skinner boxes to study the effect of various schedules of reward or punishment on the animal's behavior. For instance, the box could be configured to deliver a reward every time the animal presses the lever, every hundredth time, or on some irregular schedule. Psychologists then measure how effective a particular reward schedule is in shaping the animal's behavior. The Skinner box was invented by the American behavioral psychologist B.F. Skinner, who also coined the term operant conditioning to describe his field of study.
So why use the term "virtual Skinner box" to describe a video game? While the rich environment of a typical video game is far removed from that of a Skinner box, and it seems insulting to compare a game player's behavior to that of a rat, I do understand the sentiment behind the term. Many games require the performance of a repetitive task to achieve some goal, and I distinctly recall feeling not unlike a rat endlessly pressing a lever in hope of a food pellet when, for instance, I was using the crafting skill in Dark Age of Camelot, or pressing the button repeatedly in video poker. Given this perceived link between operant conditioning and video game design, what does behavioral psychologists' research into operant conditioning have to teach us about structuring rewards within a video game?
In this article I propose that over a century of study by behavioral psychologists into conditioning does hold important lessons for the game designer, that some popular titles today implement operant conditioning techniques, and that the use of such techniques in a game's design can make that game more enjoyable and increase its longevity. I will also discuss the ethical considerations of using such techniques, especially in light of recent concerns about the addictiveness of video games.
The basic principle of operant conditioning is simply that the frequency of a behavior will increase if it is rewarded, and decrease if it is punished. For instance, a hungry rat in a Skinner box will at first act in a manner that is natural to a hungry rat: running around the cage, squeaking, trying to escape, and so on. If, while it is performing these activities, one response (in this case pressing a lever) leads to the reward of food, the rat will gradually learn the association between lever and food. The behavior will be repeated and thus learned, and the response that produces the reward becomes especially important to the rat. The same process applies to an action that allows the rat to escape from or avoid unpleasant stimuli.
Another principle of operant conditioning is that once a behavior is learned, the frequency of the reward can be reduced. For the behavior to be learned, it may be necessary at first to reinforce every occurrence of the behavior. Once it is learned, reinforcement can be provided on an intermittent basis, and over time it is possible to reduce the frequency of rewards and still maintain the behavior. For instance, the number of times the lever has to be pressed to achieve a reward can gradually be increased from every time, to every ten times, to every hundred times, and so on, or the lever may need to be pressed repeatedly for a set period of time to achieve a reward. Behavioral psychologists have spent much time experimenting on what effect various schedules of reward have on behavior. These reward schedules are of particular importance to the video game designer and are discussed in detail below.
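To make this thinning process concrete, here is a minimal Python sketch. The specific ratios and stage lengths are invented for illustration; they are not taken from any particular game or experiment.

```python
def thinning_schedule(ratios=(1, 10, 100), rewards_per_stage=3):
    """Yield the number of responses required for each successive reward.

    The schedule starts dense (every response rewarded) and is thinned
    stage by stage once the behavior is established.
    """
    for ratio in ratios:
        for _ in range(rewards_per_stage):
            yield ratio

def simulate_presses(schedule):
    """Return the cumulative press count at which each reward is earned."""
    reward_points = []
    presses = 0
    for required in schedule:
        presses += required  # the player must press `required` more times
        reward_points.append(presses)
    return reward_points
```

With the default values, the first few rewards arrive on every press, the next few every ten presses, and so on; the same structure underlies a crafting grind that pays out less and less often as the player's habit takes hold.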
A further principle of operant conditioning is that it is possible to condition an individual to perform behaviors outside of their usual repertoire. If a behavior is particularly complex, for instance, if it is an action that requires multiple steps or takes much skill to perform, it may be impossible to directly reinforce that behavior. Instead, it is possible to reinforce behaviors that approximate the desired behavior and through step-by-step reinforcement of successive approximations, gradually produce the desired response. This principle is known as "behavior shaping." For example, a video game may implement various levels of difficulty, each successive level requiring the player to perform a more complex set of actions to succeed.
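Behavior shaping can be sketched as a ladder of successive approximation targets. The linear ramp below is a hypothetical simplification, since real difficulty curves are rarely linear; it only illustrates the step-by-step tightening of the criterion for reinforcement.

```python
def shaping_targets(start, goal, steps):
    """Evenly spaced intermediate targets between a starting ability level
    and the final goal; each step reinforces a closer approximation."""
    step = (goal - start) / steps
    return [start + step * (i + 1) for i in range(steps)]

def is_reinforced(performance, target):
    """Reinforce any response that meets or exceeds the current approximation."""
    return performance >= target
```

Early in the ladder a modest performance earns the reward; by the final step, only the full target behavior does.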
Scheduling Rewards
The basic principle of operant conditioning, that you can increase the frequency of a behavior by rewarding it, is fairly simple, and is one that many game designers will have already discovered through trial and error or common sense. The study of operant conditioning becomes more interesting when we look at how reward systems (or "reinforcers," to use the psychological term) can be structured to produce the greatest effect on a behavior. There is extensive research on how reinforcers can be most effectively scheduled. There are three types of schedules for reinforcers: continuous, extinction, and intermittent. With a continuous schedule, the behavior is reinforced each time it is performed. Extinction schedules are the opposite of continuous schedules in that no instance of the given behavior is reinforced. Between these two extremes lie intermittent schedules, where only some instances of a behavior are reinforced. Intermittent schedules include ratio schedules, where reinforcement depends on the number of responses made; interval schedules, where reinforcement depends on a response made after a period of time has elapsed; and duration schedules, where reinforcement depends on the behavior being performed continuously for a period of time.
All of these schedules can also be fixed or variable. In a fixed schedule, the reinforcement will occur after a set period of time, or after a fixed number of responses. In a variable schedule, the time or number of responses will vary around a particular number; for instance, a reinforcement might be given every ten to twenty times the behavior is performed. If we treat continuous and extinction schedules as just two extremes of a fixed ratio schedule, we are left with eight basic reinforcement schedules: continuous, extinction, and the fixed and variable forms of the ratio, interval, and duration schedules.
Each of these intermittent schedules has its own characteristic behavior patterns, making it suitable for different applications. Typically, Variable schedules of reinforcement are more effective at producing responses than Fixed schedules, and Ratio schedules are more effective at producing responses than Interval schedules. Of all the schedules, the Variable Ratio schedule generates the highest level of responses over the longest period.
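The difference between a fixed and a variable ratio schedule can be sketched in a few lines of Python. Note that the variable ratio version below rewards each response with probability 1/n, which produces a variable number of responses per reward averaging n; this per-response-probability approach is a common implementation shortcut, not the only way to realize a variable ratio schedule.

```python
import random

def fixed_ratio(n):
    """Return a respond() function that rewards every n-th response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond

def variable_ratio(n, rng=None):
    """Return a respond() function that rewards each response with
    probability 1/n, so the number of responses between rewards varies
    around an average of n."""
    rng = rng or random.Random()
    def respond():
        return rng.random() < 1.0 / n
    return respond
```

The fixed version makes the reward perfectly predictable, which is why responding tends to pause right after a payout; the variable version keeps every response a candidate for reward, which is the structure slot machines exploit.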
Does Conditioning Work on Humans?
Writers of fiction were quick to grasp the potential of conditioning on human society. In Anthony Burgess' novel A Clockwork Orange, aversion therapy is used to 'cure' the protagonist Alex of his brutally sociopathic behavior. Aldous Huxley's Brave New World imagines a dystopian society where science has learned to shape and control human emotions through conditioning. Despite these fictional examples, the question remains whether these techniques, which have largely been developed through experiments on animals, work on humans. The answer is a qualified yes. Clinical psychologists and psychiatrists do frequently use operant conditioning techniques on humans; for instance, the technique is often used in the treatment of autism. A number of experiments have shown that humans show the same patterns of responding that other animals show when exposed to the basic schedules of reinforcement. The qualification arises from the human ability to reason and verbalize rules, which may prevent people from showing the same behavior as animals. For example, humans may alter their behavior based on what they think the experimenter wants to see, rather than in response to the actual reinforcement schedule. Another difference with humans is the ability to explain to them what type of schedule of reinforcement they are on. A number of studies indicate that people perform more efficiently on various schedules if they have specific rules to follow regarding the schedule in effect. This openness contrasts with many video games, where the mechanics of the reward system remain opaque to the player. If the intent of the game's reward system is to increase the frequency of a given behavior, clearly explaining to the player how the reward system works will make that system more effective.
Rewards
General Principles
A reward is anything that increases the frequency of a behavior. This reward can be the presentation of a positive event following a response, or the removal of an aversive event. Likewise, punishment is something that decreases the frequency of a response and can take the form of the presentation of an aversive event or the removal of a positive event. As mentioned earlier, psychologists tend to refer to anything that increases the frequency of a response as a reinforcer. Hence, rewards are referred to as positive reinforcers, and the removal of aversive events is referred to as negative reinforcement.
Primary Vs. Secondary Reinforcers
The major factor in determining whether a behavior will be conditioned is the nature of the consequences that result from that behavior. If the consequence of a behavior is not one that the subject recognizes as a reinforcer, the behavior will not be reinforced. One set of consequences that are clearly reinforcers are those that satisfy some biological need. Food is an obvious example: to a hungry person, food will always be reinforcing. Reinforcers that meet a biological need or drive are known as Primary or Unconditioned reinforcers, and include food, water, and the avoidance of pain. There are, however, many other consequences that people find reinforcing even though they do not satisfy a biological need. For instance, people are not born with any innate drive to earn money, yet through life experience we learn to treat money as a reinforcer. These other reinforcers are referred to as Secondary or Conditioned reinforcers. Secondary reinforcers are learned through continued pairing with existing reinforcers; for instance, we learn to treat money as a reward because it allows us to obtain other reinforcements, e.g. food. The process by which the range of reinforcers is expanded is known as Classical Conditioning (see sidebar). Some conditioned reinforcers are especially effective because they can be paired with many types of reinforcers. These are called generalized reinforcers. Money, tokens, approval, and affection are generalized reinforcers, since they can be associated with a variety of other events that are themselves reinforcing. For instance, money can be exchanged for many other events that are reinforcing, such as snacks, toys, and video games.
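In game terms, an in-game currency is a generalized conditioned reinforcer: it has no value of its own, but reinforces behavior because it can be exchanged for a variety of other rewards. The sketch below is a hypothetical example; the item names and prices are invented for illustration.

```python
class TokenEconomy:
    """Tokens as a generalized reinforcer: earned for desired behaviors,
    spendable on any of several backing rewards."""

    def __init__(self, prices):
        self.tokens = 0
        self.prices = prices  # e.g. {"health_potion": 5, "new_sword": 20}

    def reinforce(self, amount):
        """Pay out tokens for a desired behavior (a quest, a kill, a craft)."""
        self.tokens += amount

    def exchange(self, item):
        """Spend tokens on a backing reward, if affordable."""
        cost = self.prices[item]
        if self.tokens >= cost:
            self.tokens -= cost
            return item
        return None
```

The pairing works exactly as described above: the tokens become reinforcing only because they reliably convert into things the player already values.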
In life we are often presented with multiple reinforcement schedules, and our actions at any one time result from a choice between alternatives. Psychologists have tried to understand how organisms choose between multiple reinforcement schedules and have found a remarkable consistency in how they choose. Organisms choose one reinforcement schedule over another in direct proportion to the rate, magnitude, and immediacy of the reinforcement each schedule provides. For instance, if a pigeon receives one pellet of food for tapping a blue key five times, but two pellets for tapping a red key five times, the pigeon will tap the red key twice as often as the blue. Likewise, if one schedule provides reinforcement twice as often as another, organisms will select that schedule over the other in a 2:1 ratio. The same goes for delays in reinforcement: if one schedule provides reinforcement after a two-second delay, while another has a four-second delay, the animal will prefer the first over the second by the same 2:1 ratio. This relationship is known as the "matching law," which holds that the relative rate of response on an alternative is approximately equal to the relative rate, magnitude, and immediacy of reinforcement provided for responding on that alternative.
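The matching law reduces to a simple proportion. In the sketch below, each alternative's rate, magnitude, and immediacy of reinforcement are assumed to be folded into a single value per alternative, which is a simplification of the full law.

```python
def matching_shares(reinforcement_values):
    """Predicted share of responses on each alternative: proportional to
    that alternative's share of the total reinforcement value."""
    total = sum(reinforcement_values)
    return [v / total for v in reinforcement_values]
```

For the pigeon above, two pellets on the red key against one on the blue predicts response shares of 2/3 and 1/3, i.e. twice as many pecks on red as on blue.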
Studies of animal feeding behavior in the natural environment have produced results consistent with the matching law. Optimal foraging theory holds that feeding behavior is sensitive to the relation between the amount of energy expended in finding, securing, and consuming food and the amount of energy or nutrition the food provides. The net return of energy is determined by the size, quality, and scarcity of the food, and by the work involved in subduing prey. When given a choice between different foods, animals will select among them in direct proportion to the net energy return of the various choices. The foraging behavior of animals as diverse as bees, owls, and rodents has been accounted for with remarkable accuracy by optimal foraging theory.
Copyright Sean Butcher, 2004