Machines can now beat humans at complex tasks that seem tailored to the strengths of the human mind, including poker, the game of Go, and visual recognition. Yet for many high-stakes decisions that are natural candidates for automated reasoning, like doctors diagnosing patients and judges setting bail, experts often favor experience and intuition over data and statistics. This reluctance to adopt formal statistical methods makes sense: Machine learning systems are difficult to design, apply, and understand. But eschewing advances in artificial intelligence can be costly.
Recognizing the real-world constraints that managers and engineers face, we developed a simple three-step procedure for creating rubrics that improve yes-or-no decisions. These rubrics can help judges decide whom to detain, tax auditors whom to scrutinize, and hiring managers whom to interview. Our approach offers practitioners the performance of state-of-the-art machine learning while stripping away needless complexity.
To see these rules in action, consider pretrial release decisions. When defendants first appear in court, judges must assess their likelihood of skipping subsequent court dates. Those deemed low-risk are released back into the community, while high-risk defendants are detained in jail; these decisions are thus consequential both for defendants and for the general public. To aid judges in making these decisions, we used our procedure to create the simple risk chart below. Each defendant’s flight risk is computed by summing the scores corresponding to their age and to the number of court dates they have missed. A risk threshold is then applied to convert the score to a binary release-or-detain recommendation. For example, with a risk threshold of 10, a 35-year-old defendant who has missed one court date would score eight (two for age plus six for missing one prior court date), which is below the threshold, and would therefore be recommended for release.
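The arithmetic of such a chart can be sketched in a few lines of Python. This is only an illustration. The two entries used in the worked example (two points for a 35-year-old, six points for one missed court date) and the threshold of 10 come from the text. Every other value in the tables, the band boundaries, the function names, and the choice to detain when the score is at or above the threshold are assumptions made for this sketch and should not be read as the actual chart.

```python
# Illustrative point tables. Only two entries come from the example in the
# text (a 35-year-old scores 2; one missed court date scores 6). Every other
# value is a made-up placeholder, not the published chart.
AGE_BANDS = [(0, 4), (30, 2), (50, 0)]   # (minimum age, points)
MISSED_POINTS = {0: 0, 1: 6, 2: 8}       # key 2 stands for "two or more"


def age_points(age: int) -> int:
    """Return the points for the highest band whose minimum age is <= age."""
    points = AGE_BANDS[0][1]
    for minimum_age, band_points in AGE_BANDS:
        if age >= minimum_age:
            points = band_points
    return points


def risk_score(age: int, missed_dates: int) -> int:
    """Sum the age points and the missed-court-date points."""
    return age_points(age) + MISSED_POINTS[min(missed_dates, 2)]


def recommend(score: int, threshold: int = 10) -> str:
    """Assumption: scores at or above the threshold lead to 'detain'."""
    return "detain" if score >= threshold else "release"


score = risk_score(age=35, missed_dates=1)
print(score, recommend(score))  # prints "8 release"
```

Because the rule is just two lookups and one addition, it can be applied by hand as easily as by a program, which is the point of keeping it this small.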
Despite its simplicity, this rule significantly outperforms expert human decision makers. We analyzed over 100,000 judicial pretrial release decisions in one of the largest cities in the country. Following our rule would allow judges in this jurisdiction to detain half as many defendants without appreciably increasing the number who fail to appear at court. How is that possible? Unaided judicial decisions are only weakly related to a defendant’s objective level of flight risk. Further, judges apply idiosyncratic standards, with some releasing 90% of defendants and others releasing only 50%. As a result, many high-risk defendants are released and many low-risk defendants are detained. Following our rubric would ensure defendants are treated equally, with only the highest-risk defendants detained, simultaneously improving the efficiency and equity of decisions.
Decision rules of this sort are fast, in that decisions can be made quickly, without a computer; frugal, in that they require only limited information to reach a decision; and clear, in that they expose the grounds on which decisions are made. Rules satisfying these criteria have many benefits, both in the judicial context and beyond. For instance, easily memorized rules are likely to be adopted and used consistently. In medicine, frugal rules may reduce tests required, which can save time, money, and, in the case of triage situations, lives. And the clarity of simple rules engenders trust by revealing how decisions are made and indicating where they can be improved. Clarity can even become a legal requirement when society demands fairness and transparency.
Simple rules certainly have their advantages, but one might reasonably wonder whether favoring simplicity means sacrificing performance. In many cases the answer, surprisingly, is no. We compared our simple rules to complex machine learning algorithms. In the case of judicial decisions, the risk chart above performed nearly identically to the best statistical risk assessment techniques. Replicating our analysis in 22 varied domains, we found that this phenomenon holds: Simple, transparent decision rules often perform on par with complex, opaque machine learning methods.
To create these simple rules, we used a three-step strategy, detailed here, that we call select-regress-round. Here’s how it works.
Select a few leading indicators of the outcome in question — for example, using a defendant’s age and number of court dates missed to assess flight risk. We find that having two to five indicators works well. The two factors we used for pretrial decisions are well-known indicators of flight risk; without such domain knowledge, one can create the list of factors using standard statistical methods (e.g., stepwise feature selection).
Using historical data, regress the outcome (skipping court) on the selected predictors (age and number of court dates missed). This step can be carried out in one line of code with modern statistical software.
Round the resulting weights. The regression step produces a model that assigns a precise numerical weight to each factor. Such weights are more precise than most decision-making applications require, so we round them to produce integer scores.
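The three steps can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions, not our production code. The data are synthetic, the two yes-or-no indicators (`young`, `missed_before`) are hypothetical stand-ins for the selected factors, logistic regression is fit with a hand-written gradient descent loop so that no libraries are needed, and rescaling the weights so the largest maps to `max_score` before rounding is one reasonable way to obtain small integers rather than a prescribed step.

```python
import math
import random


def fit_logistic(X, y, lr=1.0, epochs=300):
    """Fit logistic regression by batch gradient descent.

    Returns (intercept, weights). Assumes small, bounded features.
    """
    n, k = len(X), len(X[0])
    intercept, weights = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            z = intercept + sum(w * x for w, x in zip(weights, xi))
            error = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += error
            for j in range(k):
                grad_w[j] += error * xi[j]
        intercept -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights


def round_weights(weights, max_score=3):
    """Rescale so the largest |weight| maps to max_score, then round."""
    top = max(abs(w) for w in weights)
    return [round(max_score * w / top) for w in weights]


# Step 1 (select): two hypothetical indicators chosen up front.
feature_names = ["young", "missed_before"]

# Synthetic history: outcome 1 means the person skipped court.
random.seed(0)
X, y = [], []
for _ in range(1000):
    young = 1 if random.random() < 0.4 else 0
    missed_before = 1 if random.random() < 0.3 else 0
    p = 1.0 / (1.0 + math.exp(-(-2.0 + 1.0 * young + 2.0 * missed_before)))
    X.append([young, missed_before])
    y.append(1 if random.random() < p else 0)

# Step 2 (regress): fit the outcome on the selected indicators.
intercept, weights = fit_logistic(X, y)

# Step 3 (round): turn the fitted weights into integer scores.
scores = round_weights(weights)
print(dict(zip(feature_names, scores)))
```

The intercept is dropped after rounding because it shifts every score by the same amount; its role is taken over by the risk threshold applied to the summed scores. In practice the regression step would normally be done with a statistics package rather than a hand-written loop.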
Our select-regress-round strategy yields decision rules that are simple. Equally important, the method for constructing the rules is itself simple. The three-step recipe can be followed by an analyst with limited training in statistics, using freely available software.
Statistical decision rules work best when objectives are clearly defined and when data is available on past outcomes and their leading indicators. When these criteria are satisfied, statistically informed decisions often outperform the experience and intuition of experts. Simple rules, and our simple strategy for creating them, bring the power of machine learning to the masses.
Jongbin Jung is a PhD candidate at Stanford University in the Department of Management Science & Engineering.
Connor Concannon is Deputy Director of Analytics at the New York County District Attorney’s Office and a PhD student at John Jay College of Criminal Justice.
Ravi Shroff is a Research Scientist in the Center for Urban Science and Progress at New York University.
Sharad Goel is an Assistant Professor at Stanford University in the Department of Management Science & Engineering.
Daniel G. Goldstein is a Principal Researcher at Microsoft Research.