NPR: Can Computers be Racist?

Yes.

As will come as no surprise to readers of this blog, algorithms can make biased decisions.  NPR tackles this question in their latest All Tech Considered (which I was interviewed for!).

They start by talking to Jacky Alcine, the software engineer who discovered that Google Photos had tagged his friend as an animal.

As Jacky points out: “One could say, ‘Oh, it’s a computer,’ I’m like OK … a computer built by whom? A computer designed by whom? A computer trained by whom?” It’s a short segment, but we go on to talk a bit about how that bias could come about.

What I want to emphasize here is that, while hiring more Black software engineers would likely help by making it more likely that these issues are caught quickly, it is not enough. As Jacky implies, the training data itself is biased. In this case, the bias likely came from a training set that included more photos of white people and of animals than of Black people. In other cases, the bias arises because the labels were created by people whose past racist decisions are purposefully being used to guide future decisions.

Consider the automated hiring algorithms now touted by many startups (Jobaline, Hirevue, Gild, …). If an all-white company attempts to use their current employees as training data, i.e., attempts to find future employees who are like their current employees, then they’re likely to continue being an all-white company. That’s because the data about their current employees encodes systemic racial bias such as differences between white and Black SAT test-takers even when controlling for ability. Algorithmic decisions will find and replicate this bias.
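
To make that replication concrete, here is a toy simulation in Python. It is a sketch under made-up assumptions, not a model of any real hiring product: the two groups "A" and "B", the 0.5-point score gap at equal ability, and the rule "accept applicants whose score looks like a typical current employee's" are all hypothetical choices for illustration.

```python
import random
import statistics

random.seed(0)

# Assumption for illustration: at equal ability, group B scores 0.5 lower
# on the test (the score is a biased proxy for ability).
SCORE_GAP = 0.5

def make_person(group):
    """Create a hypothetical candidate; ability has the same distribution in both groups."""
    ability = random.gauss(0.0, 1.0)
    penalty = SCORE_GAP if group == "B" else 0.0
    score = ability + random.gauss(0.0, 0.5) - penalty
    return {"group": group, "ability": ability, "score": score}

# "Training data": current employees, all from group A, all of high ability.
candidates_a = [make_person("A") for _ in range(5000)]
employees = [p for p in candidates_a if p["ability"] > 0.5]

# "Model": accept anyone whose score is within one standard deviation
# below the average current employee's score, or higher.
employee_scores = [p["score"] for p in employees]
cutoff = statistics.mean(employee_scores) - statistics.stdev(employee_scores)

def hire(person):
    return person["score"] >= cutoff

# New applicants: 5000 from each group, equally able on average.
applicants = [make_person(g) for g in ("A", "B") for _ in range(5000)]

def hire_rate(group, min_ability=None):
    """Fraction of a group's applicants hired, optionally only those above min_ability."""
    pool = [
        p for p in applicants
        if p["group"] == group
        and (min_ability is None or p["ability"] > min_ability)
    ]
    return sum(hire(p) for p in pool) / len(pool)

rate_a = hire_rate("A")
rate_b = hire_rate("B")
print(f"hire rate, group A: {rate_a:.2f}")
print(f"hire rate, group B: {rate_b:.2f}")
print(f"hire rate among high-ability A: {hire_rate('A', 0.5):.2f}")
print(f"hire rate among high-ability B: {hire_rate('B', 0.5):.2f}")
```

The rule never looks at group membership, yet in this setup group B is hired less often than group A, even among applicants of comparably high ability, because the cutoff was learned from scores that carry the gap.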

We need to be proactive to keep such biases from influencing algorithmic decisions.


One thought on “NPR: Can Computers be Racist?”

  1. It’s an extreme oversimplification to suggest that an all-white company using current employees as training data will continue being all-white. If black people are just like white people in the relevant criteria, then training data from white people will generalize just fine.

    I.e., if job performance = 3*SAT + 5*work sample test, and this relationship holds for both white people and black people, then you can generalize from white people to black people with no problem.

    Problems will only occur if race is directly predictive – i.e., job performance = 3*SAT + 5*work sample test + 2*isBlack.

    This is of course easily fixable via algorithms – include isBlack as one of the variables in your regression. This has been studied in education quite a bit:

    http://ftp.iza.org/dp8733.pdf
    http://www.mindingthecampus.org/2010/09/the_underperformance_problem/
    https://randomcriticalanalysis.wordpress.com/2015/05/16/on-concentrated-poverty-and-its-effects-on-academic-outcomes/ (this post is awesome, links directly to data)

    However, the sign is actually negative – reality is more like academic performance = 3*SAT + 5*GPA – 2*isBlack (actual numbers are in the first paper). So if we wanted to correct for this bias, we’d be directly penalizing blacks, and by a large amount. In education at least, the bias you identify will make colleges *less white*!

    Remember, bias can have either sign. It’s the height of anthropomorphic reasoning to assume that algorithmic bias will have the same sign that humans do.


