I’ve made it to Session 2 of my series of posts on the FAT* conference.
If you build it they will come.
How should we build systems that incorporate all that we’ve learnt about fairness, accountability and transparency. How do we go from saying “this is a problem” to saying “Here’s a solution”?
Three of the four papers in this session seek (in different ways) to address this question, focusing on both the data and the algorithms that make up an ML model.
The paper by Meg Young and friends from UW makes a strong argument for the idea of a data trust. Recognizing that we need good data to drive good policy and to evaluate technology and also recognizing that there are numerous challenges — privacy, fairness, and accountabilty — around providing such data, not to mention issues with private vs public ownership, they present a case study of a data trust built with academics as the liaison between private and public entities that might want to share data and (other) entities that might want to make use of it.
They bring up a number of interesting technology ideas from the world of privacy and fairness: differential privacy to control data releases, causal modeling to generate real-ish data for use in analysis and so on. They argue that this is probably the best way to reconcile legal and commercial hurdles over data sharing and is in fact the responsible way to go.
Takeaway: I think this is an interesting proposal. To some extent the devil is in the details and the paper left me wanting more, but they have a great proof of concept to check out. While I might be quite happy with academics being the “trusted escrow agent”, I wonder if that’s always the best option?
Of course it’s not just data you need governance for. What about models?
This is one in a series of papers coming up right now that tackle the problem of model portability. How do I know if my model is being applied out of context with unpredictable results?
The solution from Margaret Mitchell et al (9 authors and counting…!) is to attach a model “spec sheet” to a trained model. The spec sheet would give you important documentation about the model — how it was trained, with what training regime, what data, what error rates and so on — in the hope that when the model is applied elsewhere, the spec sheet will prevent you from taking it out of context.
Takeaway: This is again a creative use of the idea of ‘user scripts‘ as a way to carry context. I wondered when I first read it whether it makes sense to talk about a model spec in the abstract without looking at a specific domain like some papers have done. I think the jury is still out (i.e “more research needed”) to see if model spec sheets can be useful in full generality or if we need the right “level of abstraction” to make usable, but this is an interesting direction to explore.
But instead of building data trusts or model specs, how about instrumenting your code itself with certificates of fairness that can be monitored while the code runs?
This paper by Albarghouthi and Vinitsky takes a programming-language perspective on building fair classifiers. Suppose we could annotate our code with specifications describing what properties a classifier must satisfy and then have tools that ensure that these specifications are satisfied while the code runs. That would be pretty neat!
That’s basically what they do in this paper, by taking advantage of Python decorations to encode desired fairness specs. They show how to capture notions like disparate impact and demographic parity and even weak forms of individual fairness. One thought I did have though: they are essentially trying to encode the ability to verify probabilistic statements, and I wonder if it might be easier to do this in one of the new and shiny probabilistic programming languages out there? Granted, Python is a more mainstream language (uh-oh, the PL police will be after me now).
I know who you are, but what am I?
It’s great to build systems that draw on the tech we’ve developed over the last many years. But there’s still more to learn about the ongoing shenanigans happening on the internet.
You don’t want to mess with Christo Wilson. He can sue you — literally. He’s part of an ACLU lawsuit against the DoJ regarding the CFAA and its potential misuse to harass researchers trying to audit web systems. In this paper Shan Jiang, John Martin and Christo turn their audit gaze to online A/B testing.
If you’ve ever compulsively reloaded the New York Times, you’ll notice that the headlines of articles will change from time to time. Or you’ll go to a website and not see the same layout as someone else. This is because (as the paper illustrates) major websites are running experiments… on you.. They are doing A/B testing of various kinds potentially to experiment with different layouts, or potentially even to show you different kinds of content depending on who you are.
The paper describes an ingenious mechanism to reveal when a website is using A/B testing and determine what factors appear to be going into the decision to show particular content. The experimental methodology is a lot of fun to read.
While the authors are very careful to point out repeatedly that they find no evidence of sinister motives behind the use of A/B testing in the wild, the fact remains that we are being experimented on constantly without any kind of IRB protection (*cough* facebook emotional contagion *cough*). It’s not too far a leap to realize that the quest for “personalization” might mean that we eventually have no shared experience of the internet and that’s truly frightening.
And that’s it for now. Stay tuned for more…