Bloomberg profile of Richard Berk

Richard Berk is one of the founding fathers of automated risk assessment, and systems based on his work are being deployed in Pennsylvania and other locations. This Bloomberg profile of him has many interesting (and terrifying) nuggets. As always, you should read the whole thing (if Bloomberg’s horrible page rendering doesn’t trigger a headache), but here are some highlights.

What’s interesting about the system he designed is that it’s optimized for the cost of incarceration rather than for accuracy. In the particular case described in the article, this actually makes the system less harsh, because a finding of a problem triggers expensive therapy. On the other hand, there’s a political component: it’s far riskier to release someone who might commit a crime than it is to keep incarcerated someone who might be reformed. As Berk puts it:

The policy position that is taken is that it’s much more dangerous to release Darth Vader than it is to incarcerate Luke Skywalker

The problem of course is that incarcerating Luke Skywalker could turn him into a new Darth Vader, and I don’t know if this is factored into the analysis.

He also says, later in the article:

Berk argues that eliminating sensitive factors weakens the predictive power of the algorithms. “If you want me to do a totally race-neutral forecast, you’ve got to tell me what variables you’re going to allow me to use, and nobody can, because everything is confounded with race and gender,” he said.

This seems a little binary to me. It’s not an either-or where you either have to keep all sensitive attributes or throw them all out. There are ways to quantify and even subtract out the influence of certain problematic attributes without having to throw out all the information: in fact, we have a paper on this!
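To make the "subtract out the influence" idea concrete, here is a minimal sketch (a deliberately simple illustration, not the method from our paper) that removes the linear component of a feature that is predictable from a protected attribute, on made-up data:

```python
import numpy as np

def residualize(features, protected):
    """Remove the linear component of each feature that is
    predictable from the protected attribute.

    features:  (n, d) array of candidate predictor variables
    protected: (n,) 0/1 array encoding the sensitive attribute
    """
    # Design matrix: intercept + protected attribute.
    X = np.column_stack([np.ones_like(protected, dtype=float), protected])
    # Least-squares fit of each feature on the protected attribute,
    # keeping only the residuals.
    coef, *_ = np.linalg.lstsq(X, features, rcond=None)
    return features - X @ coef

# Toy example: a feature strongly correlated with group membership.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)
feature = 2.0 * group + rng.normal(size=1000)
cleaned = residualize(feature.reshape(-1, 1), group)

print(np.corrcoef(group, feature)[0, 1])        # large correlation
print(np.corrcoef(group, cleaned[:, 0])[0, 1])  # near zero
```

The residual feature still carries the information that wasn’t explained by the protected attribute, which is exactly the middle ground between "keep everything" and "throw everything out" (real methods have to handle nonlinear dependence too, which this sketch does not).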

As the article notes, Berk is heading to Norway:

Berk wants to predict at the moment of birth whether people will commit a crime by their 18th birthday, based on factors such as environment and the history of a new child’s parents. This would be almost impossible in the U.S., given that much of a person’s biographical information is spread out across many agencies and subject to many restrictions. He’s not sure if it’s possible in Norway, either, and he acknowledges he also hasn’t completely thought through how best to use such information.

The idea that data can be collected to make such predictions is certainly alluring and tempting. But everything we’re beginning to understand about predictions based on algorithms suggests that making such predictions in the absence of any understanding of the model behavior and why it’s making its decisions is a recipe for disaster.

I’ll note that the recidivism predictions typically work 6 months to 2 years out, and are not particularly accurate! Trying to predict 18 years out is rather scary.

Wisconsin Supreme Court decision on COMPAS

We finally have the first legal ruling on algorithmic decision making. This case comes from Wisconsin, where Eric Loomis challenged the use of COMPAS for sentencing him.

While the Supreme Court denied the appeal, it made a number of interesting observations and recommendations:

  • “risk scores may not be considered as the determinative factor in deciding whether the offender can be supervised safely and effectively in the community.”
  • “the following warning must be given to sentencing judges: ‘(1) the proprietary nature of COMPAS has been invoked to prevent disclosure of information relating to how factors are weighed or how risk scores are to be determined; (2) risk assessment compares defendants to a national sample, but no cross-validation study for a Wisconsin population has yet been completed; (3) some studies of COMPAS risk assessment scores have raised questions about whether they disproportionately classify minority offenders as having a higher risk of recidivism; and (4) risk assessment tools must be constantly monitored and re-normed for accuracy due to changing populations and subpopulations.’”

Like Danielle Citron (the author of the Forbes article), I’m a little skeptical that this will be enough. Warning labels on cigarette boxes didn’t really stop people from smoking. But as part of a larger effort to increase awareness of the risks, and to make people stop and think a little before blindly forging ahead with algorithms, this is a decent first step.

At the AINow Symposium in New York (which I’ll say more about later), one proposed extreme along the policy spectrum regarding algorithmic decision-making was a moratorium on the use of algorithms entirely. I don’t know if that makes complete sense. But a heavy, heavy dose of caution is definitely warranted, and rulings like this might lead to a patchwork of caveats and speed bumps that help us flesh out exactly where algorithmic decision-making makes more or less sense.


Testing algorithmic decision-making in court.

Well that was quick!

On the heels of the ProPublica article about bias in algorithmic decision-making in the criminal justice system, a lawsuit now before the Wisconsin Supreme Court could mark the first legal determination about the use of algorithmic methods in sentencing.

The first few paragraphs of the article summarize the issue at hand:

When Eric L. Loomis was sentenced for eluding the police in La Crosse, Wis., the judge told him he presented a “high risk” to the community and handed down a six-year prison term.

The judge said he had arrived at his sentencing decision in part because of Mr. Loomis’s rating on the Compas assessment, a secret algorithm used in the Wisconsin justice system to calculate the likelihood that someone will commit another crime.

Mr. Loomis has challenged the judge’s reliance on the Compas score, and the Wisconsin Supreme Court, which heard arguments on his appeal in April, could rule in the coming days or weeks. Mr. Loomis’s appeal centers on the criteria used by the Compas algorithm, which is proprietary and as a result is protected, and on the differences in its application for men and women.

Racist risk assessments, algorithmic fairness, and the issue of harm

By now, you are likely to have heard of the fascinating report (and white paper) released by ProPublica describing how risk assessment algorithms in the criminal justice system appear to affect different races differently, and are not particularly accurate in their predictions. Worse still, they are less accurate at predicting outcomes for black defendants than for white ones. Notice that this is a separate problem from ensuring equal outcomes (as in disparate impact): it’s the problem of ensuring equal failure modes as well.
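The distinction between equal accuracy and equal failure modes is easy to see with a toy example (these are made-up confusion counts, not the actual ProPublica figures): a classifier can have identical overall accuracy on two groups while making very different kinds of mistakes on each.

```python
# Hypothetical confusion counts for two groups: same overall
# accuracy, mirrored error profiles.
groups = {
    "A": dict(tp=300, fp=300, tn=300, fn=100),
    "B": dict(tp=300, fp=100, tn=300, fn=300),
}

results = {}
for name, c in groups.items():
    accuracy = (c["tp"] + c["tn"]) / sum(c.values())
    # False positive rate: labeled high-risk but did not re-offend.
    fpr = c["fp"] / (c["fp"] + c["tn"])
    # False negative rate: labeled low-risk but did re-offend.
    fnr = c["fn"] / (c["fn"] + c["tp"])
    results[name] = (accuracy, fpr, fnr)
    print(f"{name}: accuracy={accuracy:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")
```

Here both groups get 60% accuracy, but group A bears twice the false-positive rate (wrongly flagged as high-risk) while group B bears twice the false-negative rate. ProPublica’s central finding was an asymmetry of exactly this flavor.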


There is much to pick apart in this article, and you should read the whole thing yourself. But from the perspective of research in algorithmic fairness, and how this research is discussed in the media, there’s another very important consequence of this work.

It provides concrete examples of people who have possibly been harmed by algorithmic decision-making. 

We talk to reporters frequently about the larger set of questions surrounding algorithmic accountability, and eventually they always ask some version of:

Can you point to anyone who’s actually been harmed by algorithms?

and we’ve never been able to point to specific instances so far. But now, after this article, we can.


White House Report on Algorithmic Fairness

The White House has put out a report on big data and algorithmic fairness (announcement, full report).  From the announcement:

Using case studies on credit lending, employment, higher education, and criminal justice, the report we are releasing today illustrates how big data techniques can be used to detect bias and prevent discrimination. It also demonstrates the risks involved, particularly how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination.

The table of contents for the report gives a good overview of the issues addressed:

Big Data and Access to Credit
The Problem: Many Americans lack access to affordable credit due to thin or non-existent credit files.
The Big Data Opportunity: Use of big data in lending can increase access to credit for the financially underserved.
The Big Data Challenge: Expanding access to affordable credit while preserving consumer rights that protect against discrimination in credit eligibility decisions

Big Data and Employment
The Problem: Traditional hiring practices may unnecessarily filter out applicants whose skills match the job opening.
The Big Data Opportunity: Big data can be used to uncover or possibly reduce employment discrimination.
The Big Data Challenge: Promoting fairness, ethics, and mechanisms for mitigating discrimination in employment opportunity.

Big Data and Higher Education
The Problem: Students often face challenges accessing higher education, finding information to help choose the right college, and staying enrolled.
The Big Data Opportunity: Using big data can increase educational opportunities for the students who most need them.
The Big Data Challenge: Administrators must be careful to address the possibility of discrimination in higher education admissions decisions.

Big Data and Criminal Justice
The Problem: In a rapidly evolving world, law enforcement officials are looking for smart ways to use new technologies to increase community safety and trust.
The Big Data Opportunity: Data and algorithms can potentially help law enforcement become more transparent, effective, and efficient.
The Big Data Challenge: The law enforcement community can use new technologies to enhance trust and public safety in the community, especially through measures that promote transparency and accountability and mitigate risks of disparities in treatment and outcomes based on individual characteristics.

Predictive policing in action

Predictive policing is the idea that by using historical data on crime, one might be able to predict where crime might happen next, and intervene accordingly. Data And Society has put together a good primer on this from the 2015 Conference on Data and Civil Rights that they organized last year (which I attended: see this discussion summary).

If you’re not in the know about predictive policing, you might be shocked to hear that police departments all around the country are already using predictive policing software to manage their daily beats. PredPol, one of the companies that provides software for this, says (see the video below) that its software is used in 60 or so jurisdictions.

Alexis Madrigal from Fusion put together a short video explaining the actual process of using predictive policing. It’s a well-done video that in a short time explores many of the nuances and challenges of this complex issue. Some thoughts I had after watching the video:

  • Twice in the episode (once by the CEO of PredPol and once by a police officer) we hear the claim “We take demographics out of the decision-making.” But how? I have yet to see any clear explanation of how bias is eliminated from the model used to build predictions, and as we know, this is not an easy task. In fact, the Human Rights Data Analysis Group has done some new research illustrating how PredPol can AMPLIFY biases, rather than remove them.


  • At some point, the video shows what looks like the expression of a gradient and says that PredPol constructs an “equation” that predicts where crime will happen. I might be splitting hairs, but I’m almost certain that PredPol constructs an algorithm, and as we already know, an algorithm has nowhere near the sense of certainty, determinism, and precision that an equation might have. So this is a little lazy: why not just show a picture of scrolling code instead, if you want a visual?
  • The problems we’ve been hearing about with policing over the past few years have in part been due to over-aggressive responses to perceived behavior. If an algorithm is telling you that there’s a higher risk of crime in an area, could that exacerbate this problem?
  • Another point that HRDAG emphasizes in their work is the difference between crime and the reporting of crime. If you put more police in an area, you’ll see more crime being reported in that area. It doesn’t mean that more crime is actually being committed there.
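A toy simulation (my own illustration, far simpler than HRDAG’s actual analysis) makes the reporting feedback loop concrete: if patrols are sent where past reports are highest, and reports can only come from patrolled areas, a tiny initial disparity in reporting snowballs even when the true crime rates are identical.

```python
# Two areas with IDENTICAL underlying crime rates, but a tiny
# initial skew in historical reported crime.
true_rate = [10.0, 10.0]   # true crimes per day, same in both areas
reports = [11.0, 10.0]     # historical reports: area 0 slightly ahead

history = []
for day in range(100):
    # PredPol-style allocation: patrol the area predicted to have
    # the most crime (here, simply the one with more past reports).
    target = 0 if reports[0] >= reports[1] else 1
    # Only the patrolled area generates new reports, even though
    # crime is occurring equally in both areas.
    reports[target] += true_rate[target]
    history.append(reports[0] / sum(reports))

print(f"area 0's share of reported crime after 100 days: {history[-1]:.2f}")
```

After 100 days, nearly all reported crime comes from area 0, and the model would happily "confirm" that area 0 is the dangerous one: the data it trains on is a record of where police looked, not of where crime happened.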

Living in a bad neighborhood takes on a whole new meaning

The neighborhood you live in might control which schools you have access to and what a house costs. But it might also control what kind of sentencing guidelines are in effect if you have the misfortune of being sent to jail.

Pennsylvania will soon begin using factors other than a convict’s criminal history — such as the person’s age, gender, or county of residence — in its sentencing guidelines…

This is part of a long-running discussion on data-driven approaches to recidivism and criminal justice in general. The (honorable) rationale behind these efforts is to eliminate explicit bias/prejudice in sentencing, which (as the article points out) was governed by more lax guidelines than for investigating crimes in the first place.

And as always, I don’t think there’s a problem in using data-driven methods to inform and guide the underlying mechanisms that might cause recidivism. But it gets a lot trickier when these tools are used to guide sentencing. In the case of Pennsylvania, the tool will be used to “assist” judges in sentencing decisions. And as Goodhart’s law suggests, once you attempt to quantify something as nebulous as “likely to re-offend”, the quantification takes on a life of its own.

But even this isn’t the biggest problem. There’s now a long history of models for predicting recidivism, and one of the most studied ones is the LSI-R (and its generalization, the LSRNR). But these models are proprietary: you need to purchase the software to see what they do.

And if you think that’s not a problem, I’d like to point you to this article by Rebecca Wexler in Slate. In particular:

Today, closed, proprietary software can put you in prison or even on death row. And in most U.S. jurisdictions you still wouldn’t have the right to inspect it.

And as she goes on to point out,

Inspecting the software isn’t just good for defendants, though—disclosing code to defense experts helped the New Jersey Supreme Court confirm the scientific reliability of a breathalyzer.

At the heart of these arguments is a point that has nothing to do with algorithms.

Wanting transparency does not imply lack of trust. It reflects a concern about a potential violation of trust.