The FAT* Conference is almost upon us, and I thought that instead of live-blogging from the conference (which is always exhausting) I’d do a preview of the papers. Thankfully we aren’t (yet) at 1000 papers in the proceedings, and I can hope to read and say something not entirely stupid (ha!) about each one.
I spent a lot of time pondering how to organize my posts, and then realized the PC chairs had already done the work for me, by grouping papers into sessions. So my plan is to do a brief (BRIEF!) review of each session, hoping to draw some general themes. (ed: paper links will yield downloadable PDFs starting Tuesday Jan 29)
And with that, let’s start with Session 1: Framing and abstraction.
Those who do not learn from history are doomed to repeat it. — George Santayana
Those who learn from history are doomed to repeat it. — twitter user, about machine learning.
Ben Hutchinson and Margaret Mitchell have a fascinating paper on the history of discourse on (un)fairness. It turns out that dating back to the 60s, and in the context of standardized testing, researchers have been worried about the same issues of bias in evaluation that we talk about today.
I’m actually understating the parallels. It’s uncanny how the discussion on fairness and unfairness evolved precisely in the way it’s evolving right now. Their paper has a beautiful table that compares measures of fairness then and now, and the paper is littered with quotes from early work in the area that mirror our current discussions about different notions of fairness, the subtleties in error rate management across racial categories, the concerns over construct validity, and so on. It was a humbling experience for me to read this paper and realize that indeed, everything old is new again.
Takeaways: Among the important takeaways from this paper are:
- The subtle switch from measuring UNfairness to measuring fairness caused the field to eventually wither away. How should we return to studying UNfairness?
- If we can’t get notions of fairness to align with public perceptions, it’s unlikely that we will be able to get public policy to align with our technical definitions.
- It’s going to be important to encode values explicitly in our systems.
which is a useful segue to the next paper:
Assume a spherical cow … or a rational man.
Samir Passi and Solon Barocas present the results of an ethnographic study into the (attempted) deployment of a predictive tool to decide which potential car buyers (leads) to send to dealers. The company building the tool sells these leads to dealers, and so wants to make sure that the quality of the leads is high.
This is a gripping read, almost like a thriller. They carefully outline the choices the team (comprising business analysts, data scientists and managers) makes at each stage, and how they go from the high level goal “send high quality leads to dealers” to a goal that is almost ML-ready: “find leads that are likely to have credit scores above 500”. As one likes to say, the journey is more important than the destination, and indeed the way in which the goals get concretized and narrowed based on technical, business and feasibility constraints is both riveting and familiar to anyone working in (or with) corporate data science.
Takeaways: The authors point out that no actual problem in automated decision-making is a pure classification or regression problem. Rather, people (yes, PEOPLE) make a series of choices that narrow the problem space down to something that is computationally tractable. And it’s a dynamic process where data constraints as well as logistical challenges constrain the modeling. At no point in time do ethical or normative concerns surface, but the sequence of choices made clearly has an overall effect that could lead to disparate impact of some kind. They argue, correctly, that we spend far too little time paying attention to these choices and to the larger role of the pipeline around the (tiny) ML piece.
which is an even nicer segue to the last paper:
Context matters, duh!
This is one of my papers, together with Andrew Selbst, danah boyd, Sorelle Friedler and Janet Vertesi. And our goal was to understand the failure modes of Fair-ML systems when deployed in a pipeline. The key thesis of our paper is: Abstraction is a core principle in a computer system, but it’s also the key point of failure when dealing with a socio-technical system.
We outline a series of traps that fair-ML papers fall into even while trying to design what look like enlightened decision systems. You’ll have to read the paper for all the details, but one key trap that I personally struggle with is the formalization trap: the desire to nail down a formal specification that can then be optimized. This is a trap because the nature of the formal specification can be contested and can evolve from context to context (even within a single organization, as the paper above shows), and a too-hasty formalization can freeze the goals in a way that might not be appropriate for the problem. In other words, don’t fix a fairness definition in stone (this is important: I’m constantly asked by fellow theoryCS people what the one true definition of fairness is, so that they can go away and optimize the heck out of it).
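To make the formalization trap a little more concrete, here is a minimal sketch on entirely made-up data. It is not from our paper: the records, the group names "A" and "B", and the helper functions are all hypothetical, and the two measures are just the textbook versions of demographic parity (equal selection rates across groups) and equal opportunity (equal true positive rates across groups). The point is only that the same set of predictions can look perfectly fair under one formal definition and quite unfair under another.

```python
from collections import defaultdict

# Hypothetical toy records: (group, true_label, predicted_label).
# These numbers are invented purely for illustration.
RECORDS = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 1),
]


def selection_rates(records):
    """Fraction of each group that receives a positive prediction."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, _label, pred in records:
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def true_positive_rates(records):
    """Fraction of each group's truly positive members predicted positive."""
    totals, hits = defaultdict(int), defaultdict(int)
    for group, label, pred in records:
        if label == 1:
            totals[group] += 1
            hits[group] += pred
    return {g: hits[g] / totals[g] for g in totals}


def gap(rates):
    """Largest difference between any two groups' rates."""
    return max(rates.values()) - min(rates.values())


parity_gap = gap(selection_rates(RECORDS))
opportunity_gap = gap(true_positive_rates(RECORDS))

print(f"demographic parity gap: {parity_gap:.2f}")
print(f"equal opportunity gap:  {opportunity_gap:.2f}")
```

On this toy data both groups are selected at the same rate, so the parity gap is zero, while the true positive rates differ sharply (1.0 for group A versus 1/3 for group B). Which of these numbers you decide to optimize is exactly the kind of choice that a too-hasty formalization freezes in place.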
When I read these three papers in sequence, I feel a black box exploding open, revealing its messy and dirty inner workings. Ever since Frank Pasquale’s The Black Box Society came out, I’ve felt a constant sentiment from non-technical people (mostly lawyers/policy people) that the goal should be to route policy around the black box “AI” or “ML”. My contention has always been that we can’t do that: that understanding the inner workings of the black box is crucial to understanding both what works and what fails in automated decision systems. Conversely, technical people have been loath to engage with the world OUTSIDE the black box, preferring to optimize our functions and throw them over the fence, Anathem-style.
I don’t think either approach is viable. Technical designers need to understand (as AOC clearly does!) that design choices that seem innocuous can have major downstream impact and that ML systems are not plug-and-play. But conversely, those who wish to regulate and manage such systems need to be willing to go into the nitty-gritty of how they are built and think about regulating those processes as well.
Happy families are all alike; every unhappy family is unhappy in its own way. — Tolstoy
There is one way to be fair, but many different ways of being unfair.
Every person with good credit looks the same: but people have bad credit for very different reasons.
There might be more to this idea of “looking at unfairness” vs “looking at fairness”. As I had remarked to Ben and Margaret a while ago, it has the feel of an NP vs co-NP question 🙂 – and we know that we don’t know if they’re the same.