Predictive policing in action

Predictive policing is the idea that by using historical data on crime, one might be able to predict where crime will happen next, and intervene accordingly. Data & Society has put together a good primer on this from the 2015 Conference on Data and Civil Rights that they organized (which I attended; see this discussion summary).

If you’re not familiar with predictive policing, you might be shocked to hear that police jurisdictions all around the country are already using predictive policing software to manage their daily beats. PredPol, one of the companies that provides this software, says (see the video below) that it is used in 60 or so jurisdictions.

Alexis Madrigal from Fusion put together a short video explaining the actual process of using predictive policing. It’s a well-done video that in a short time explores many of the nuances and challenges of this complex issue. Some thoughts I had after watching the video:

  • Twice in the episode (once by the CEO of PredPol and once by a police officer) we hear the claim “We take demographics out of the decision-making”. But how? I have yet to see any clear explanation of how bias is eliminated from the model used to build predictions, and as we know, this is not an easy task. In fact, the Human Rights Data Analysis Group has done new research illustrating how PredPol can AMPLIFY biases, rather than remove them.
  • At some point, the video shows what looks like an expression of a gradient and says that PredPol constructs an “equation” that predicts where crime will happen. I might be splitting hairs, but I’m almost certain that PredPol constructs an algorithm, and as we already know, an algorithm has nowhere near the sense of certainty, determinism and precision that an equation might have. So this is a little lazy: why not just show a picture of scrolling code instead if you want a visual?
  • The problems we’ve been hearing about with policing over the past few years have in part been due to over-aggressive responses to perceived behavior. If an algorithm is telling you that there’s a higher risk of crime in an area, could that exacerbate this problem?
  • Another point that HRDAG emphasizes in their work is the difference between crime and the reporting of crime. If you put more police in an area, you’ll see more crime being reported in that area. It doesn’t mean that more crime is actually being committed there.
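HRDAG’s amplification point can be sketched as a toy feedback loop. The numbers and the hotspot-allocation rule below are entirely made up for illustration; this is not PredPol’s actual model. Both areas have identical underlying crime, but one starts with slightly more historical reports, so patrols keep getting dispatched there, observing (and reporting) more crime, which justifies more patrols:

```python
# Toy feedback-loop simulation (hypothetical numbers, not PredPol's model).
CIVILIAN_REPORTS = 10  # crimes reported by residents per day, same in both areas
DISCOVERY = 30         # extra crimes observed per day when police patrol an area

# Underlying crime is identical in A and B; area A starts with slightly
# more historical reports, purely by chance.
reports = {"A": 12, "B": 10}

for day in range(30):
    # Patrols are dispatched to the predicted hotspot:
    # the area with the most reported crime so far.
    hotspot = max(reports, key=reports.get)
    for area in reports:
        reports[area] += CIVILIAN_REPORTS
        if area == hotspot:
            reports[area] += DISCOVERY  # patrols observe extra crime there

print(reports)
# Area A's reported crime runs away from B's, even though true crime is equal.
```

The initial two-report difference is noise, but the allocation rule turns it into a large, self-reinforcing gap in *reported* crime.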

Friday links dump

What I’ve been reading (or meaning to read) this week:

An FDA for algorithms?

Andrew Tutt put out an interesting position paper where he argues that we need the equivalent of an FDA for algorithms. The paper

explains the diversity of algorithms that already exist and that are soon to come. In the future most algorithms will be “trained,” not “designed.” That means that the operation of many algorithms will be opaque and difficult to predict in border cases, and responsibility for their harms will be diffuse and difficult to assign. Moreover, although “designed” algorithms already play important roles in many life-or-death situations (from emergency landings to automated braking systems), increasingly “trained” algorithms will be deployed in these mission-critical applications.

It’s an interesting argument. Two things that come to mind when I think about this:

  • The FDA ultimately still deals with drugs that operate on the body. I feel that algorithms that apply across multiple domains will require much more varied domain expertise, and it might be hard to do this within a single agency.
  • A regulatory agency is slow. The FDA has been slow to react to the demands of personalized medicine, especially for rare diseases where the normal expectations of drug protocols might not be possible to achieve. How would a regulatory agency be nimble enough to adjust to the even more rapidly changing landscape of algorithm design?

Fairness: The view from abroad

Research in algorithmic fairness is inextricably linked to the legal system. Certain approaches that might seem algorithmically sound are illegal, and other approaches rely on specific legal definitions of bias.

This means that it’s hard to do research that crosses national boundaries. Our work on disparate impact is limited to the US. In fact, the very idea of disparate impact appears to be US-centric.

Across the ocean, in France, things are different, and more complicated. I was at the Paris ML meetup organized by the indefatigable Igor Carron, and heard a fascinating presentation by Pierre Saurel.

I should say ‘read’ instead of ‘heard’. His slides were in English, but the presentation itself was in French. It was about the ethics of algorithms, as seen by the French judicial system, and was centered around a case where Google was sued for defamation as a result of the autocomplete suggestions generated during a partial search.

Google initially lost the case, but the ruling was eventually overturned by the French Cour de Cassation, the final court of appeals. In its judgment, the court argued that algorithms are by definition neutral and cannot exhibit any sense of intention, and therefore Google can’t be held responsible for the results of automatic algorithm-driven suggestions.

This is a fine example of defining the problem away: if an algorithm is neutral by definition, then it cannot demonstrate bias. Notice how the idea of disparate impact gets around this by thinking about outcomes rather than intent.
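To make the outcomes-not-intent framing concrete: the US EEOC’s “four-fifths rule” flags disparate impact whenever one group’s selection rate falls below 80% of another’s, with no reference to what the decision-maker (human or algorithmic) intended. The applicant counts here are hypothetical:

```python
# The four-fifths rule looks only at outcomes: compare selection rates
# between groups, ignoring the decision-maker's intent.
# All numbers below are hypothetical.

def selection_rate(selected, applicants):
    return selected / applicants

rate_a = selection_rate(50, 100)  # group A: 50% selected
rate_b = selection_rate(20, 100)  # group B: 20% selected

# Ratio of the disadvantaged group's rate to the advantaged group's rate.
ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
print(ratio)  # 0.4, well below the 0.8 threshold
verdict = "disparate impact" if ratio < 0.8 else "ok"
print(verdict)
```

A neutral-by-definition algorithm would pass the French court’s intent test and still fail this outcome test.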

But a consequence of this ruling is that bringing cases of algorithmic bias in French courts will now be much more difficult.

The jury is still out on this issue across the world. In Australia, Google was held liable for search results that pointed to defamatory content: in this case, an algorithm was producing the results, but the company was still viewed as liable.


Should algorithms come under the purview of FOIA?

Nick Diakopoulos studies computational and data journalism, and has long been concerned about algorithmic transparency to aid journalism. In the link above, he points to a case in Michigan where the city of Warren was being sued to reveal the formula they used to calculate water and sewer fees.

Thinking about FOIA (Update: the Freedom of Information Act) for algorithms (or software) brings up all kinds of interesting issues, legal and technical:

  • Suppose we do require that the software be released. Can’t it just be obfuscated so that we can’t really tell what it’s doing, except as a black box?
  • Suppose we instead require that the algorithm be released. What if it’s a learning algorithm that was trained on some data? If we release the final trained model, that might tell us what the algorithm is doing, but not why.
  • Does it even make sense to release the training data (as Sorelle suggests)? What happens if the algorithm is constantly learning (like an online learning algorithm)? Then would we need to timestamp the data so we can roll back to whichever version is under litigation? (This last suggestion was made by Nick in our Twitter conversation.)
  • But suppose the algorithm instead makes use of reinforcement learning, and adapts in response to its environment. How on earth can we capture the entire environment used to influence the algorithm?
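To see why releasing the learning code alone may not be enough, here is a deliberately tiny, entirely hypothetical learner (a 1-D “midpoint between class means” classifier; the datasets are invented). The same code, trained on different data, gives opposite answers for the same input, so a FOIA response containing only the source wouldn’t tell you what the deployed system actually decides:

```python
# Sketch: the same learning *code* yields different decision rules
# depending on its training data, so source code alone doesn't reveal
# the deployed model's behavior. Everything here is hypothetical.

def train(examples):
    """examples: list of (feature, label) pairs; returns a threshold."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    # Decision boundary: midpoint between the two class means.
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    return 1 if x >= threshold else 0

data_2015 = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
data_2016 = [(4.0, 0), (5.0, 0), (8.0, 1), (9.0, 1)]

t15 = train(data_2015)  # threshold 5.0
t16 = train(data_2016)  # threshold 6.5

# Same code, same input, opposite decisions:
print(predict(t15, 6.0), predict(t16, 6.0))  # 1 0
```

Releasing the trained thresholds (5.0 and 6.5) tells you *what* each model does, but not *why*; for that you’d need the training data too, which is exactly the bind Nick’s timestamping suggestion tries to address.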

If we replaced ‘algorithm’ with ‘human’, none of this would make sense. If we’re deciding whether a human decision-maker erred in some way, we don’t need to know their life story and life experiences. So we shouldn’t need to know this for an algorithm.

But a human can document their decision-making process in a way that’s interpretable by a court. Maybe that’s what we need to require from an algorithmic decision-making process.

NPR: Can Computers be Racist?


As will come as no surprise to readers of this blog, algorithms can make biased decisions. NPR tackles this question in their latest All Tech Considered (which I was interviewed for!).

They start by talking to Jacky Alcine, the software engineer who discovered that Google Photos had tagged his friend as an animal:

As Jacky points out: “One could say, ‘Oh, it’s a computer,’ I’m like OK … a computer built by whom? A computer designed by whom? A computer trained by whom?” It’s a short segment, but we go on to talk a bit about how that bias could come about.

What I want to emphasize here is that, while hiring more Black software engineers would likely help and make it more likely that these issues would be caught quickly, it is not enough. As Jacky implies, the training data itself is biased. In this case, likely by including more photos of white people and animals than of Black people. In other cases, because the labels have been created by people whose past racist decisions are being purposefully used to guide future decisions.

Consider the automated hiring algorithms now touted by many startups (Jobaline, Hirevue, Gild, …). If an all-white company attempts to use their current employees as training data, i.e., attempts to find future employees who are like their current employees, then they’re likely to continue being an all-white company. That’s because the data about their current employees encodes systemic racial bias such as differences between white and Black SAT test-takers even when controlling for ability. Algorithmic decisions will find and replicate this bias.
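Here is a minimal simulation of that scenario (all numbers are invented, and the “score gap at equal ability” is a stand-in for the SAT finding mentioned above). Underlying ability is identical across groups, but a biased measurement depresses one group’s scores; a “group-blind” rule learned from past score-based hires then replicates the imbalance without ever looking at group membership:

```python
import random

random.seed(1)

# Hypothetical illustration: ability is identically distributed in groups
# A and B, but a biased measurement (e.g. a test-score gap at equal
# ability) shifts group B's scores down.
SCORE_GAP = 15  # measured-score penalty for group B, at equal ability

def applicant(group):
    ability = random.gauss(100, 10)
    score = ability - (SCORE_GAP if group == "B" else 0)
    return {"group": group, "ability": ability, "score": score}

pool = [applicant("A") for _ in range(500)] + \
       [applicant("B") for _ in range(500)]

# "Training data": the company's current employees were hired on score.
past_hires = sorted(pool, key=lambda a: a["score"], reverse=True)[:100]
cutoff = min(a["score"] for a in past_hires)

# A "group-blind" rule learned from past hires: take anyone above the
# score cutoff. It never sees the group label, yet inherits the bias.
new_pool = [applicant(g) for g in ["A"] * 500 + ["B"] * 500]
hired = [a for a in new_pool if a["score"] >= cutoff]
by_group = {g: sum(a["group"] == g for a in hired) for g in "AB"}
print(by_group)  # group A dominates despite equal underlying ability
```

Removing the group label from the data does nothing here, because the bias lives in the score itself; that’s why “we take demographics out” claims deserve scrutiny.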

We need to be proactive to keep such biases from influencing algorithmic decisions.