Black Box Thinking Page 18
Their visit to Rahway was part of a crime-reduction program called “Scared Straight.” The idea was that by giving these youngsters a glimpse of prison life—what it is really like inside a maximum security installation—they would be shocked, or at least nudged, into a change of behavior. The program, which had been conceived by the inmates, had been running for two years.
The kids didn’t buy the premise, of course. Nobody was going to frighten them out of stealing and mugging. They were too tough to be intimidated by anyone, least of all the jailbirds at Rahway. “They don’t scare me,” one of the youngsters said with a shrug of the shoulders. “I think it’s going to be great going in and seeing all them burnouts,” Lori said, laughing.
As they walked through the metal detector at the entrance of the prison, however, the youngsters experienced a first tremor of apprehension. “Line up against the wall!” a sergeant shouted. “You may think this is a sightseeing trip. It isn’t. When you went through the door, the man who brought you lost jurisdiction over you. You’re in our hands. You’ll do as we say. The first thing is to stop smoking! And don’t chew gum! And take off those hats!”
This was not what they were expecting. They were ordered to walk in single file into the main prison area as an iron door slammed behind them. They were now in the bowels of a maximum security prison. Up on the balcony convicted prisoners looked down on them. “There’s a sweet mother****** right there, with the yellow shirt on!” a muscular black convict yelled. “When you are here, you’ll be my bitch,” another said menacingly. The kids looked at the guards for a reaction, but there was no response. Their fear heightened.
They were then walked through a cell block called “the hole,” populated by prisoners in solitary confinement. The sexual jibes at this stage are too shocking to report. The kids became ever more uncertain. The swagger had vanished. You could see the confusion and fear on their faces. But they were not even thirty minutes into their initiation.
For the next two hours, they were locked in a small room with twenty lifers: prisoners who have been given minimum sentences of twenty-five years. Together, their terms added up to nearly a thousand years. This is where the intervention really began. One at a time, the lifers stood up and offered an insight into what the youngsters could expect if they ever came to Rahway.
“Two of you guys I don’t like,” a convict with a life sentence for murder screamed at the kids. “I don’t like you and I don’t like you. You got one time to smile at me and I am going to turn your teeth upside down. You understand? I have just got out of the hole today and I am going to turn your teeth upside down.”
The kids had arrived at Rahway with the vague idea that prison was an easy ride. They thought they could just breeze through. They thought they were tough. As they listened, they were systematically disabused of their naïveté. Another inmate asked:
When we got sexual desires, who do you think we get? Take a wild guess . . . We get young, dumb mother*******, just like you. I am in here ten years and I am going to die in this stinking joint. And if they want to give me these three bitches right here I would leap over them like a kangaroo just to get to one young, pretty . . .
One day you are lying on your blanket, and your mind is drifting over those thirty foot walls and you are thinking about who’s with your girl when three guys will slide into your cell, wrap you up in that blanket, and I don’t care how tough you think you are or how strong you might be, but they are going to kick you onto the side of that bed, and they are going to [rape you].
None of the kids were talking now. One or two were crying. The lifers were not acting out of spite. They were, in effect, issuing warnings, admonishing the kids to change before it was too late. This was an attempt to deter the next generation of criminals. The lifers didn’t want the youngsters to make the same mistakes they had.
“We don’t get paid for doing this,” the kids were told. “We don’t get no extra reward, no extra benefits, no nothing. We do it because we want to do it. Because we might help you.” Another convict said: “I have been here seven years. I regret every day I have been here . . . You have the best opportunity in the world [to avoid prison] . . . You would have to be a fucking fool not to take it.”
The kids were inside Rahway for three hours, but it seemed like three days. They had seen the reality of prison and were adamant they would never go back. Crime no longer seemed cool, but a game that led to hopelessness and desperation. On the way home they were silent. At one point the driver had to stop the car so that one of the boys could vomit.
“I was just so scared, I don’t want to go to one of them things,” Lori, the girl with the big earrings, said. “It scared the shit out of me, I didn’t like it at all.”
“I think it will change my life,” another said, wide-eyed. “I mean I have got to cut some of this [crime] out. All of it, if possible . . . I am going to try very hard.” Others talked about going to college: anything to avoid jail.
The prison visit was recorded by Arnold Shapiro, a documentary maker. His film of the visit was later broadcast by KTLA, Channel 5 in Los Angeles and fronted by Peter Falk of Columbo fame. Viewers were riveted by the grim reality of prison life and by the seemingly incredible results of the Scared Straight program. Falk revealed that of the seventeen youngsters, sixteen were still going straight three months later. He also reported that the wider program had had a dramatic impact on reoffending rates. Falk said:
Over 8,000 juvenile delinquents have sat in fear on these hard wooden benches and for the first time they really heard the brutal reality of crime and prison. The results of this unique program are astounding. Participating communities report that 80 to 90 percent of the kids that they send to Rahway go straight after leaving this stage. That is an amazing success story. And it is unequalled by traditional rehabilitation methods.
Politicians lined up to praise the program. Newspaper columns were penned. Social commentators praised the approach of Scared Straight. Feckless kids were pushed into line and brought face-to-face with the consequences of their actions. It was the kind of short, sharp shock treatment that pundits had been crying out for. It was razor-edged deterrence.3
During the week of March 5, 1979, Shapiro’s documentary was shown in two hundred major cities.4 The following month it won the Oscar for best documentary feature at the Academy Awards. The Scared Straight program was rolled out across the United States, Canada, the UK, Australia, and Norway. Its effectiveness was attested to by judges, correction officers, and other experts.
The data seemed remarkable. As George Nicola, a juvenile judge who worked in New Brunswick, a few miles from Rahway, put it: “When you view the program and review the statistics that have been collected, there is no doubt in my mind . . . that the juvenile awareness project at Rahway State prison is perhaps today the most effective, inexpensive deterrent in the entire correctional process in America.”5
But there turned out to be one rather large problem with Scared Straight. It didn’t work. Rigorous testing would later prove that the kids who were taken on prison visits were more likely to commit offenses in the future, not less—as we shall see. A more appropriate name for Scared Straight might have been Scared Crooked. It was an unequivocal failure. It damaged kids in a number of ways.
But first we will ask: How is this possible? How can something be a failure when the statistics seem to show that it is a success? How can it be failing when virtually every expert is lining up to endorse it? To answer that question we will examine one of the most important scientific innovations of the last two hundred years, and one that takes us to the heart of the closed-loop phenomenon—and how to overcome it.
The randomized control trial.
II
Closed loops are often perpetuated by people covering up mistakes. They are also kept in place when people spin their mistakes, rather than confronting them head on. But there is a third way that closed loops are sustained over time: through skewed interpretation.
That was the problem that bedeviled bloodletting, practiced by medieval doctors. The doctors had what seemed like clear feedback on what worked and what didn’t. Either the patient died in the aftermath of the procedure or did not. The evidence was there for all to see.
But how to interpret this evidence? As we’ve seen, doctors, already convinced of the wisdom of figures like Galen, trusted in the power of bloodletting. When a patient died, it was because they were so ill that not even bloodletting could save them. But when they lived, that confirmed the brilliance of the procedure.
Think of how many success stories must have been circulating around the medieval world: people who had been terribly ill, close to death perhaps, but bloodletting had been performed, and they had recovered. How persuasive their testimony would have sounded. “I was on the brink of mortality, a doctor drained me of some blood, and now I am cured!”
Consider how they would have commended the procedure in market squares. Those who died on the other hand? Well, they would not be around to say anything, would they? Their testimony had vanished.
Now look at the following diagram.6
In this (hypothetical) example, a group of chronically ill people are subjected to bloodletting. Some of them recover. This is the “evidence” that justifies the treatment. People get better and they are understandably happy about it.
Bloodletting without a control group.
However, what the doctors don’t see, and the patients don’t see, is what would have happened if the treatment had not been given. In experiments this is commonly known as the “counterfactual.” It is all the things that could have happened but which in everyday experience we never observe because we did something else.
We don’t observe what would have happened if we had not gotten married. Or see what would have happened if we had taken a different job. We can speculate on what would have happened, and we can make decent guesses. But we don’t really know. This may seem like a trivial point, but the implications are profound.
Now look at another diagram, below. Here the patients have been randomly divided into two groups. Some of them get access to bloodletting while the others (called the control group) do not. This is known as a randomized control trial (RCT); in medicine it is called a clinical trial. We see from the diagram that many of the patients who receive bloodletting recover. It looks successful. The feedback is impressive.
But now look at the group who did not get the treatment. Many more have recovered than in the treated group. The reason is simple: the body has its own powers of recuperation. People recover naturally even without treatment. In fact, by comparing the two groups, it is possible to see that, far from saving people as medieval doctors sincerely believed, bloodletting, on average, kills them. This fact would have been invisible without the control group.* And this is why, as we noted in chapter 1, bloodletting survived as a recognized treatment until the nineteenth century.
Bloodletting with a control group.
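The logic of the two diagrams can be sketched as a small simulation. This is only an illustration: the recovery rates, the function name run_trial, and its parameters are invented for the example and are not drawn from any historical data. Each simulated patient is assigned to an arm by a coin flip, and bloodletting is assumed to lower the chance of recovery.

```python
import random

def run_trial(n_patients=10_000, base_recovery=0.60, treatment_effect=-0.10, seed=0):
    """Simulate a two-arm randomized trial using made-up recovery rates."""
    rng = random.Random(seed)
    recovered = {"treated": 0, "control": 0}
    counts = {"treated": 0, "control": 0}
    for _ in range(n_patients):
        # Random assignment: a coin flip decides the arm, so the two groups
        # differ only by chance and by the treatment itself.
        arm = "treated" if rng.random() < 0.5 else "control"
        # Assumption: bloodletting shifts the recovery probability by treatment_effect.
        p = base_recovery + (treatment_effect if arm == "treated" else 0.0)
        counts[arm] += 1
        if rng.random() < p:
            recovered[arm] += 1
    return {arm: recovered[arm] / counts[arm] for arm in counts}

rates = run_trial()
print(f"treated recovery rate: {rates['treated']:.2f}")
print(f"control recovery rate: {rates['control']:.2f}")
print(f"estimated effect:      {rates['treated'] - rates['control']:+.2f}")
```

Looked at in isolation, the treated group's recovery rate seems like a success story, since roughly half the patients get better. Only the comparison with the control group shows that, under these assumed numbers, the treatment is doing harm.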
So far in this book we have examined cases of unambiguous error. When a plane crashes you know the procedures were defective. When DNA evidence shows that an innocent man has been convicted, you know the trial or investigation was flawed. When a minimum viable product is rejected by early adopters, you can be sure the final product will bomb. When a nozzle is clogging up, you know it will cost you money. These examples gave us a chance to examine failure in the raw.
Much real-world failure is not like this. Often, failure is clouded in ambiguity. What looks like success may really be failure and vice versa. And this, in turn, represents a serious obstacle to progress. After all, how can you learn from failure if you are not sure you have actually failed? Or, to put it in the language of the last chapter, how can you drive evolution without a clear selection mechanism?
To take a concrete example, suppose you redesign your company website and that sales subsequently increase. That might lead you to believe that the redesign of the website caused the boost in sales. After all, one preceded the other. But how can you be sure? Perhaps sales went up not because of the new website, but because a rival went bust, or interest rates went down, or because it was a rainy month and more people shopped online. Indeed, it is entirely possible that sales would have gone up even more if you had not changed the website.
Looking at the sales statistics is not going to help you find an answer any more than looking at the number of people recovering from bloodletting will help you find out if the treatment is effective. The reason is simple: you can’t observe the counterfactual. You don’t know whether the change in sales was caused by something else; something, perhaps, you hadn’t even considered.
RCTs solve this problem. In effect they provide a high-definition test. They turn shades of gray into something closer to black and white. By isolating the relationship between an intervention (bloodletting, a new website, etc.) and an outcome (recovery from illness, sales) without it being obscured by other influences, they clarify the feedback. Without such a test you could draw the wrong conclusions, not just once but potentially indefinitely.
RCTs have revolutionized pharmacology. Ben Goldacre, a doctor and writer who is an evangelist for evidence-based medicine, has said: “This one idea has probably saved more lives, on a more spectacular scale, than any other idea you will come across this year.”7 Mark Henderson, a former science editor of The Times, said: “The Randomised Control Trial is one of the greatest inventions of modern science.”8
It is probably worth emphasizing that RCTs are not a panacea. There are situations where they are difficult to use and where they might be considered unethical. And trials have often been rigged in subtle ways by pharmaceutical companies eager to come up with an answer that they have already prejudged.9 But these are not arguments against randomized trials, merely against how they have been corrupted by people with dubious motives.
Another objection is that randomized trials neglect the holistic nature of a system. In medicine, for example, while a drug may cure a particular symptom, it may also have negative long-term effects on the rest of the body, or leave the underlying cause untreated. For example, prescribing a pill to combat a stomach complaint might cause damage to the immune system that could, in the long run, leave the patient worse off.
What this objection is saying, in effect, is that the measurement period for a clinical trial shouldn’t be the immediate aftermath of administering a drug, but the entire life of the patient, and that the outcome shouldn’t merely focus on a particular symptom, but the whole person. This shows that it is vital to keep an eye on the long-term consequences when conducting RCTs, something that has sometimes been overlooked in medicine.
But it is also worth noting that such considerations carry little weight when it comes to life-threatening conditions. If you find yourself in the middle of an epidemic of, for example, smallpox or Ebola, you will want the vaccine even if there is a risk of complications in a few decades’ time.10
With these caveats in mind, then, RCTs offer a powerful method of establishing rigorous tests in a complex world. Handled with care, they cut through the ambiguity that can play havoc with our interpretation of feedback. And they are often simple to conduct.
Take the example of the redesigned website mentioned earlier. The problem was in establishing whether the change in the design had increased sales, or was caused by something else. But suppose you randomly direct users to either the new or the old design. You could then measure whether they buy more goods from the former or the latter. This would filter out all the other influences such as interest rates, competition, weather and so on, and reveal the hidden counterfactual.
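The website test can be sketched in the same spirit. The figures below are hypothetical: the conversion rates, the size of the outside boost, and the function name simulate are all assumptions chosen to make the point visible. The redesign is assumed to be slightly worse than the old site, while some outside event (a rival going bust, say) lifts sales for everyone in the month the new design launches.

```python
import random

def simulate(seed=1, n=100_000):
    """Compare a before/after reading with a randomized split (made-up rates)."""
    rng = random.Random(seed)
    old_rate, new_rate = 0.05, 0.03  # assumption: the redesign converts worse
    boost = 0.03                     # assumption: outside boost in the launch month

    # Before/after: old design last month, new design this month (with the boost).
    before = sum(rng.random() < old_rate for _ in range(n)) / n
    after = sum(rng.random() < new_rate + boost for _ in range(n)) / n

    # Randomized split: same month, each visitor is sent to a design at random,
    # so the outside boost applies equally to both groups.
    sales = {"old": 0, "new": 0}
    visits = {"old": 0, "new": 0}
    for _ in range(n):
        design = rng.choice(["old", "new"])
        rate = (old_rate if design == "old" else new_rate) + boost
        visits[design] += 1
        if rng.random() < rate:
            sales[design] += 1
    split = {d: sales[d] / visits[d] for d in sales}
    return before, after, split

before, after, split = simulate()
print(f"before/after: {before:.3f} -> {after:.3f}")
print(f"randomized:   old {split['old']:.3f} vs new {split['new']:.3f}")
```

In this sketch the before/after comparison shows sales rising after the redesign, which would tempt anyone to credit the new site. The randomized split, run within a single month, shows the old design outselling the new one, because the outside boost cancels out between the two groups.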
There have been around half a million RCTs in medicine since the 1950s. They have saved hundreds of thousands of lives. But the remarkable thing is that in many areas of human life RCTs have hardly been used at all. In the criminal justice system they are almost nonexistent. In 2006, for example, there were almost 25,000 trials in medicine, but in crime and justice across the world there were only 85 between 1982 and 2004.11
David Halpern, one of the most respected policy analysts in the UK, has said: “Many areas of government have not been tested in any form whatsoever. They are based on hunch, gut feel and narrative. The same is true of many areas outside government. We are effectively flying blind, without much of a clue as to what really works, and what doesn’t. It is actually quite scary.”12
Closed loops are not merely an intellectual curiosity; they realistically describe the world we live in. They are small and large, subtle and intricate; they lurk in small companies, big companies, charities, corporations and governments. The majority of our assumptions have never been subject to robust failure tests. Unless we do something about it they never will be.
To glimpse the often mind-bending gulf between what we think we know and what we really know, let us revisit the Scared Straight program. It looked astonishingly effective. The observational statistics seemed compelling.* But we now know that the program was increasing crime rather than reducing it.
In many ways, Scared Straight stands as a metaphor not merely for government policy (perhaps the closest thing in the twenty-first century to bloodletting), but for the wider world. This program could have continued on its merry way for decades, perhaps centuries, without a proper test.
Scared Straight is a metaphor, but above all, it is a warning.
III
In 1999, Scared Straight! 20 Years Later was broadcast in the United States. The documentary was fronted this time by Danny Glover rather than Peter Falk, and revisited those seventeen scrawny teenagers who had appeared in the original film. The results were as seemingly miraculous as the original program had led audiences to believe.