©2002 RDA

 

Four Ways of Training

 

Richard D. Alexander and Cynthia Kagarise Sherman

 

This essay was stimulated by our reading and discussing a fine article by Pat Miller in The Whole Dog Journal, Volume 5, Number 3, titled "Just Rewards." In it she has a box headed "The Four Principles of Operant Conditioning." CKS saw the article and brought it to RDA's attention.

 RDA comment: Cindy's Ph.D. thesis was a study of the ecology and behavior of an unusual toad species that lives in the Sierra Nevada Mountains. More recently she became interested in dog training, especially agility competitions. I think of her as one of the people, like Pat Miller, who are joining with others to "educate trainers and move the dog training industry toward a more scientifically-based, positive profession" (this phrase from Miller's article). Cindy has contributed greatly to my understanding of the relationship between the "on-the-ground" work of dolphin and dog trainers, with clickers and targets and sacks of food on their belts, and the "from-the-back" training of the horseback rider. We have co-authored a related essay on this web site titled Targeting, Clicker-Bridging, and Positive and Negative Aspects of Horse Training.

Incidentally, in the same issue of The Whole Dog Journal, it is mentioned that Pat Miller recently published her first book, The Power of Positive Dog Training, which can be ordered by calling (423) 326-0444, by going online at www.peaceablepaws.com, or from a Borders bookstore. We haven't seen the book yet, but if Miller summarizes things as clearly as she did in this article, anything she writes about training is worth reading.

What we have written below is a take-off from Miller's article. Our specific purpose was to see if we could translate the four principles she discusses into simpler language, and then explain why we think such a translation is important. We have both been animal behaviorists for a good while, and each of us has found the terms used for different forms of conditioning confusing. We wanted to see if we could convince ourselves that the confusion is unnecessary.

We begin by defining some terms. They're familiar to everyone, but we want to make sure you know what we mean when we use them.

BEHAVIOR can be desirable or undesirable (= good or bad), or neutral. Behavior neutral to human interests, however, will generally be treated by trainers as undesirable -- i.e., to be replaced by desirable behavior.

STIMULI can be rewarding or punishing or neutral. Neutral stimuli can be ignored, except that trainers need to understand which of the stimuli they impose on an animal are rewarding, punishing, or neutral.

Rewards cause behaviors to be repeated, increased in frequency, or enhanced; punishments cause behaviors to be suppressed, erased, or avoided. Animals change their behavior in the direction of seeking less punishing or aversive stimuli and more rewarding stimuli.

Miller uses the term "punishment" as it is used in RDA's book, Teaching Yourself to Train Your Horse (TYTYH) -- so as to include all stimuli that are neither rewarding nor neutral. Punishment is such a simple and well understood term that it seems too bad training people don't use it this way more extensively. Many people seem to accept definitions implying that punishment refers only to legal and moral responses to transgressions by humans. But everyone hears and reads that even physical things like wind and rain and sleet, and also nonhuman beings like competitors and predators, can have "punishing" effects. So we use punishment to refer to all stimuli with negative, aversive, or unpleasant effects.

Here's our way of saying how to apply stimuli to change an animal's behavior in the usually desired directions, whether the behaviors or the stimuli are introduced by the trainer (as cues or deterrents) or incidental to the trainer's behavior (or "spontaneous," meaning not owing to anything the trainer did: see pp. 42-44, TYTYH):

A. Trainer responses to desirable behavior:

1. add rewards

2. remove punishments

(or do both)

B. Trainer responses to undesirable behavior:

1. remove rewards

2. add punishments

(or do both)

These four courses of action, if continued in appropriate training situations, will reinforce desired behaviors and reduce undesirable behaviors. In other words, with repetition, they will result in the animal gradually becoming conditioned to carry out the behaviors the trainer desires.

Every professional behaviorist, no matter what the approach, wishes to create the simplest possible jargon that is also complete and unequivocal. Some people are more concerned than others to make their professional jargon consistent, and some are also determined to follow the usages of those who developed the principles and invented the terms, even at the expense of failing to make things maximally simple and clear to trainers who might not enjoy digesting complicated terminology. Thus, purists -- or just conscientious people like Pat Miller, who want to "get it straight" -- may refer to the four practices as positive reinforcement (A1), negative reinforcement (A2), negative punishment (B1), and positive punishment (B2). These terms may be accurate and have priority in usage over any alternatives to them. But they are also confusing, and that may be one reason they are not in wide usage, at least among horse trainers. In everyday usage, positive and negative are usually applied to stimuli, and designate rewarding and punishing stimuli, respectively. As a result, the unwary reader may, with good cause, wrongly believe that positive reinforcement means that a behavior is reinforced positively, or rewarded, and therefore caused to increase in frequency, while negative reinforcement means that a behavior is reinforced negatively, and therefore caused to diminish in frequency. According to the above jargon, however, positive and negative refer to whether a stimulus is added or subtracted in a training situation, not whether the stimulus is rewarding or punishing. Aware of this confusion, some trainers have added the novel term "withdrawal reinforcement" for "negative reinforcement." Perhaps curiously, no one seems yet to have made the logical extension to "addition reinforcement" for "positive reinforcement" and "addition punishment" for "positive punishment."

Even if addition and withdrawal were the adjectives applied to reinforcement, one confusion would remain: what is being reinforced is sometimes a desirable behavior that replaces some particular less desirable one (whether or not the trainer elicits the more desirable behavior deliberately), and sometimes the disappearance of a behavior. In the latter case the vanished behavior is necessarily replaced by a different (and usually unspecified) less undesirable behavior, simply because animals are never "doing nothing."

We wonder if the whole thing can be discussed more simply and less confusingly, by referring merely to whether rewards or punishments are added or subtracted to move from less desirable behaviors to more desirable behaviors. This is what we have tried above with A1-2 and B1-2.
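For readers who find a small program easier to follow than a list, here is one way the A1-2 and B1-2 scheme might be written down. It is only a sketch of our reading of the scheme: the names Stimulus, Action, and classify are invented for this illustration and come from neither Miller's article nor TYTYH. It shows that the label (A or B) depends on whether the stimulus and the action "match," while "positive" and "negative" in the conventional terms track only whether something is added or removed.

```python
from enum import Enum


class Stimulus(Enum):
    REWARD = "reward"
    PUNISHMENT = "punishment"


class Action(Enum):
    ADD = "add"
    REMOVE = "remove"


def classify(stimulus, action):
    """Return (essay label, conventional term) for a trainer response.

    Illustrative only: the function and its names are hypothetical.
    """
    # Adding a reward or removing a punishment is a response to desirable
    # behavior (A); the other two combinations respond to undesirable
    # behavior (B).
    desirable = (stimulus is Stimulus.REWARD) == (action is Action.ADD)
    letter = "A" if desirable else "B"
    # In the essay's lists, item 1 always involves rewards and item 2
    # always involves punishments.
    number = "1" if stimulus is Stimulus.REWARD else "2"
    # Conventional jargon: "positive"/"negative" mean added/removed,
    # NOT rewarding/punishing.
    sign = "positive" if action is Action.ADD else "negative"
    kind = "reinforcement" if desirable else "punishment"
    return letter + number, sign + " " + kind


if __name__ == "__main__":
    for stimulus in Stimulus:
        for action in Action:
            label, term = classify(stimulus, action)
            print(action.value, stimulus.value, "->", label, "=", term)
```

Notice that the trainer's plain-language choice (add or remove, reward or punishment) is all the function needs; the conventional term falls out as a by-product.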

It will surely clarify things to provide examples of each of the above four situations from horse training. We'll do this mainly by referencing a few instances of each from RDA's book (TYTYH), and, we hope, sufficiently characterizing at least one for each situation. Once the reader understands the four principles, it will be seen that TYTYH is filled with examples, sometimes given in long sequences involving all of them -- for example, lead rope training, beginning on p. 106, and starting a two-year-old in riding, beginning on p. 190.

A1. Adding rewards to encourage (reinforce) desirable behaviors

TYTYH examples: p. 43, left column, middle paragraph: rewarding (here, scratching) a horse for approaching the trainer spontaneously; p. 44, left column, middle paragraph: using soft cues even on the first rides; pp. 84-85: rewarding a horse for accepting the trainer's approach toward it; item #7, p. 92: using the nose band of the halter to rub the horse's nose gently after it has accepted the halter, or as its last experience as it allows itself to be unhaltered quietly. For another example, see B1 below.

This is probably the easiest of the four procedures to understand and use. But the reward must be timed precisely to be maximally effective: it should be given at the very moment the desirable behavior occurs. Many opportunities to use this method to good advantage can be missed, unless the trainer works to detect opportunities. Continual attention to this detail until it becomes a training habit can greatly reduce the amount of time and effort necessary to train a horse.

A2. Removing punishments (often imposed deliberately) to elicit or encourage desirable behaviors that appear as responses to them.

TYTYH examples: p. 44, left column: slacking tension on a lead rope when the horse makes a move to follow, removing rein and leg pressure when the horse moves away from the pressure; pp. 45-46: teaching direct and indirect (neck) reining; p. 47, right column, last paragraph: "leaving a horse alone" as reward -- meaning removal of any imposed punishment, no matter how mild.

RDA example: In reference to "leaving the horse alone," nothing makes me feel better than being able to ride along on a young (or old) horse without using the faintest pressure with reins, legs, or anything else because the horse is doing exactly what I want. I try to create and maximize this situation, and increase its duration, because more than anything else it seems to cause the horse to enjoy what we are doing together. The fewer signals I have to give, and the lighter and more accurate they are, the happier we will both be. I have never forgotten hearing a horse trainer say something like this: "Niggle your horse [meaning, just keep on giving it those unnecessary little pesky, irritating signals] and you'll end up with a niggling horse."

Creating pressure and then removing it (A2) is the most common kind of training procedure used by horse people, because most horse training occurs while the trainer is mounted on the horse. Stimuli that start out as punishing (or aversive or unpleasant) are terminated (this constituting a reward) precisely when the horse responds. We discuss this proposition in some detail in the essay on this web site on Targeting, Clicker-Bridging, and Positive and Negative Aspects of Horse Training, especially how it changes as the training of a horse proceeds.

B1. Removing rewards to discourage undesirable behaviors

TYTYH examples: p. 97, lower left column on to right column: discouraging threatening behavior; p. 151, fig. 146, shows a stallion that has learned to move into a position that places the trainer in the safest place near him because the trainer not only scratches and grooms him whenever this position is assumed (A1 above), but also ceases grooming immediately when he moves out of this position (B1).

RDA example: This kind of conditioning is probably the least well understood, or least used, by horse trainers. A good example involves a young stallion that became extremely excited when I approached his pasture gate to feed him grain. He would rush at the gate, sometimes rearing and whirling, reach across the gate and make motions as if he were going to nip me. I responded to such undesirable behavior by stopping all motion, making sure I was just out of reach of his mouth. The very moment his head was withdrawn, so as not to extend over the gate (he usually turned it sideways without backing), I would instantly start again to open the gate, still carrying the food. If he thrust his head over the gate again, I backed away (if necessary -- usually only a step) and stood motionless again. He learned in two or three repetitions to turn his head sideways and wait. Similarly, once inside the gate and next to him, if he came too close I would stop moving toward his feeder and face him saying "Back!" until he backed away from me. When he backed I headed for the feeder again. At the feeder I required him to back before I deposited the feed, using the same cessation of motion toward actually feeding him as a removal of reward. By "requiring" him to back I mean that I relied on his having been taught to back on vocal command alone, or when I (more forcefully) walked toward him saying "Back!" Sometimes I raised my arm high, with the hand facing him, when I said "Back." I made no motions toward feeding him until he backed. At first I required only the smallest evidence of a tendency to step backward, such as moving only one foot backward. From such a beginning a horse can be taught to back any number of steps before receiving feed. Patience is usually necessary, and you have to keep it up. But if you do this well, your horse will become more polite with you in all circumstances.

B2. Adding punishments to discourage undesirable behaviors

TYTYH examples: pp. 40-41: discouraging nipping; pp. 48, 128: power leading; pp. 54-55: general; pp. 49-51, 56, 70-77, 82-82, 238-243, 244-246, and elsewhere describe how and why to minimize this form of training.

People who train animals entirely from the ground, as do dolphin and dog trainers, seem to find it easy to train primarily by adding rewards (A1) or removing rewards (B1). Such trainers rarely need to add punishments. Horse trainers, who typically accomplish most training from the horse's back, most often impose an appropriate punishment, keeping it as mild as possible, then reward by removing it when the horse responds (A2). Chapter 2 of TYTYH, which discusses philosophy of training, explains how the A2 procedure can be used in the most positive way, and so does the RDA-CKS essay at this web site on Targeting, Clicker-Bridging, and Positive and Negative Aspects of Horse Training.

It seems likely that a training system that depends heavily on providing an aversive or punishing stimulus and then removing it (A2) tends to foster unnecessary severity, partly because the stimulus almost has to be most severe when it is first employed, and is increasingly softened by the good trainer as the animal learns to respond. It may also lead trainers to unnecessary addition of punishments to discourage undesirable behaviors (B2). Because B2 training seems to us the least effective, and the most heavily laden with counterproductive side effects, everyone who has to use A2 training extensively (as do horse trainers) might profit from making a special effort to avoid having their training procedures spill over unnecessarily into the punishment of B2 training.
