Black Box Algorithms

The drumroll of warnings about black box algorithms continues. The latest entry comes from Frank Pasquale, with a new book titled The Black Box Society.

The premise is one we’re now sadly familiar with: black box algorithms that have material impact on our lives are already hurting us. And without transparency and a way to make these algorithms accountable, we’re in for a rough ride.

On Privacy And Databuse

Privacy is dead, but not because Scott McNealy said so.

The idea of privacy has driven much of our concern over data for the last few years, and has motivated extremely successful research efforts, most notably in differential privacy.
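To give a concrete sense of what differential privacy looks like, here is a minimal sketch of the Laplace mechanism applied to a counting query. Everything here (the data, the epsilon value, the function names) is an invented illustration, not anything from the research cited above.

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two iid exponential samples with mean `scale`
    # is a Laplace(0, scale) sample.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float) -> float:
    """Answer "how many records satisfy predicate?" with epsilon-DP.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Illustrative data: did each (hypothetical) person opt in to research?
population = [{"opted_in": i % 3 == 0} for i in range(300)]
noisy = private_count(population, lambda p: p["opted_in"], epsilon=0.5)
```

The point is that each released answer is deliberately noisy, so no single answer reveals whether any one individual is in the data, yet the noisy count is still close enough to the true count (here, 100) to be useful.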

What we’re seeing now, though, is a pivot away from the undifferentiated and problematic notion of privacy in data, toward a more subtle and nuanced notion of how data is actually used, and how it should be used.

Retired NYPD officers suggest data manipulation

The NYPD’s Biggest Problem Might Actually Be an Overreliance on Numbers

A survey of retired NYPD officers suggests that crime data manipulation has increased as the department’s reliance on data has increased. Manipulation included changing the type of crime listed or discouraging the reporting of some crimes. The article frames this as an overreliance on data; to me, it instead points to a need for more oversight to ensure accountability, fairness, and transparency.

23andMe Sells Genetic Data

Genetic Data for Sale

23andMe, a company that sells individuals reports about their own genetic data, has just sold access to some of this data to Genentech. Genentech is using the data to do research on Parkinson’s. But it seems an easy jump to imagine the same data being used by an insurance company.

Genentech is asking for individual data, so 23andMe will need to have consent forms signed before the data is released. But the article also contains this gem: “Its privacy policy notes that it will share aggregated data to third parties (read: sell to pharma and biotech companies) for scientific research if customers sign a consent document. Wojcicki told the San Jose Mercury News that 85 to 90 percent of 23andMe’s customers do.”

And so I want to know: How was it aggregated? Is it anonymized in some way? How? Is it possible to determine someone’s race or gender from the given information (directly or indirectly)? What other information about their health is included (23andMe has customers fill out extensive surveys)?
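These questions matter because “aggregated” is not the same as “anonymized.” Here is a toy differencing attack (the cohort, names, and marker are entirely invented for illustration): an analyst who only ever sees aggregate counts can still recover one person’s individual genetic marker.

```python
# Toy example: the analyst gets only aggregate counts, never rows.
# The cohort and the marker are invented for illustration.

cohort = [
    {"name": "A", "has_marker": True},
    {"name": "B", "has_marker": False},
    {"name": "C", "has_marker": True},
]

def count_with_marker(people):
    """The only query the data holder answers: an aggregate count."""
    return sum(1 for p in people if p["has_marker"])

# Release 1: count over the full cohort.
before = count_with_marker(cohort)

# Later, person "C" withdraws consent; a new aggregate is released.
after = count_with_marker([p for p in cohort if p["name"] != "C"])

# Subtracting two "safe" aggregates reveals C's individual marker status.
c_has_marker = (before - after) == 1
```

This is exactly the kind of leakage that motivates asking how the aggregation was done, and it is the sort of attack that differential privacy is designed to defeat.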

If it’s possible to use genetic data to make decisions about people (e.g., insurance), then it’s possible to discriminate against people based on their immutable genetic information.