Ever since we started thinking about algorithmic fairness and the general issue of data-driven decision-making, there’s always been this nagging issue of “well what if there are cues in data that seem racist/sexist/(–)-ist and yet provide a good signal for a decision?”
There’s no shortage of people willing to point this out: see for example my post on the standard tropes that appear whenever someone discovers bias in some algorithmic process. Most of the responses betray a unexamined belief in the truth of what algorithms discover in data, and that is not satisfying either.
So the problem we’ve faced is this. If you examine closely the computer science literature on fairness and bias, it becomes clear that people are talking at cross-purposes: essentially arguing about why your orange is not more like my apple. And it has become clear that this is because of different assumptions about the world (how biased it is, how unbiased certain features are, and so on).
Here’s the pitch:
Can we separate out assumptions and beliefs about fairness from mechanisms that we deploy to ensure it? And in doing so, can we provide a useful vocabulary for talking about these issues within a common framework?
Here’s the result of our two-year long quest:
On the (im)possibility of fairness
What does it mean for an algorithm to be fair? Different papers use different notions of algorithmic fairness, and although these appear internally consistent, they also seem mutually incompatible. We present a mathematical setting in which the distinctions in previous papers can be made formal. In addition to characterizing the spaces of inputs (the “observed” space) and outputs (the “decision” space), we introduce the notion of a construct space: a space that captures unobservable, but meaningful variables for the prediction.
We show that in order to prove desirable properties of the entire decision-making process, different mechanisms for fairness require different assumptions about the nature of the mapping from construct space to decision space. The results in this paper imply that future treatments of algorithmic fairness should more explicitly state assumptions about the relationship between constructs and observations.
This paper has been a struggle to write. It’s a strange paper in that the main technical contribution is mainly conceptual: establishing what we think are the right basic primitives that can be used to express (mathematically) concepts like fairness, nondiscrimination, and structural bias.
We owe a great debt to our many friends in the social sciences community, as well as the decades of research on this topic in the social sciences. Much of the conceptual development we outline has been laid out in prose form by the many theories of social justice starting with Rawls, but particularly by Roemer. Our main goal has been to mathematize some of these ideas so that we can apply them to algorithms.
There’s a great deal of trepidation with which we release this: it’s in many ways a preliminary work that raises more questions than it answers. But we’ve benefited from lots of feedback within CS and without, and hope that this might clarify some of the discussions swirling around algorithmic fairness.