data science Archives - Joho the Blog

November 1, 2016

[liveblog][bkc] Paola Villarreal on Public Interest in Data Science

I’m at a Berkman Klein Center lunch time talk by Paola Villarreal [twitter: paw], a BKC fellow, on “Public Interest in Data Science.” (Paola points to a github page for her project info.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.


Public interest, she says, is the effecting of changes in social policies in the interest of the public, especially for the underdog. Data science extracts knowledge and insight from data in various forms, using math, statistics, research, info science and computer science. What happens if you put data and tech in the hands of civil liberties orgs, human rights activists, media outlets, and governments? How might this affect liberty, justice, equality, transparency, and accountability?


She is going to talk about the Data for Justice project, which is supported by the Ford Foundation, the ACLU, and the Mozilla Foundation. The aim is to empower lawyers and advocates to make data-supported cases for improving justice in their communities.


The process: get the data, normalize it, process it, analyze it, visualize it … and then socialize it, inform change, and make it last! She cautions that it is crucial to make sure that you’ve identified the affected communities and that they’re involved in generating a solution. All the stakeholders should be involved in co-designing the solution.


Paola talks about the Annie Dookhan case. Dookhan was a chemist at a Massachusetts crime lab, who falsified evidence, possibly affecting 24,000 cases. Paola shows a table of data: the percentage of adults and juveniles convicted in drug cases and those whose evidence went through Dookhan. It’s a very high number: in some counties, over 25% of the drug convictions used possibly falsified data from Dookhan.
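
A table like that comes down to a per-county share. Here is a minimal sketch of the arithmetic, not Paola's actual code: the function name `dookhan_share`, the record layout, and the sample rows are all made up for illustration and are not figures from the case.

```python
def dookhan_share(records):
    """Return {county: percent of drug convictions flagged as Dookhan-tested}.

    `records` is an iterable of (county, involved) pairs, where `involved`
    is True if the evidence in that conviction went through Dookhan.
    This layout is an assumption for illustration only.
    """
    totals = {}
    flagged = {}
    for county, involved in records:
        totals[county] = totals.get(county, 0) + 1
        if involved:
            flagged[county] = flagged.get(county, 0) + 1
    return {c: 100.0 * flagged.get(c, 0) / n for c, n in totals.items()}


# Made-up sample rows, NOT real case data.
sample = [
    ("County A", True),
    ("County A", False),
    ("County A", False),
    ("County A", False),
    ("County B", False),
    ("County B", False),
]
shares = dookhan_share(sample)
```

With these invented rows, one of County A's four convictions is flagged and none of County B's are.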


She shows a map of Boston indicating that marijuana-related police interactions occur mainly where people of color live. She plays a clip from marijuana.justiceos.org.


She lists her toolkit, which includes R, Stata, PostGIS, Ant (Augmented Narrative Toolkit), and Tableau.


But what counts is having an impact, she says. That means reaching out to journalists, community organizers, authorities, and lawmakers.


She concludes that data and tech do not do anything by themselves, and data scientists are only one part of a team with a common goal. The intersection of law and data is important. Her closing point: Data and tech in the hands of people working with and for the public interest can have an impact on people’s lives.


Q&A

Q: Why are communities not more often involved?


A: It’s hard. It’s expensive. And data scientists are often pretty far removed from community organizing.


Q: Much of the data you’re referring to are private. How do you manage privacy when sharing the data?


A: In the Dookhan case, the data was impounded, and I used security measures. The Boston maps showing where incidents occurred smudged the info across a grid of about half a mile.
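
One common way to smudge locations is to snap each point to the center of a grid cell, so that only the cell is published and not the exact spot. The talk did not say how the Boston maps did it, so the sketch below is an assumption, not her method. The function name `smudge`, the 69-miles-per-degree approximation, and the sample coordinates are illustrative.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # rough approximation, good enough for a sketch


def smudge(lat, lon, cell_miles=0.5):
    """Snap a point to the center of a grid cell roughly `cell_miles` wide."""
    # Latitude: one degree is about 69 miles everywhere.
    lat_step = cell_miles / MILES_PER_DEGREE_LAT
    lat_idx = math.floor(lat / lat_step)
    cell_lat = (lat_idx + 0.5) * lat_step

    # Longitude: degrees shrink with latitude, so scale by cos(latitude).
    # Using the cell's center latitude keeps the step constant within a row.
    lon_step = cell_miles / (
        MILES_PER_DEGREE_LAT * math.cos(math.radians(cell_lat))
    )
    lon_idx = math.floor(lon / lon_step)
    cell_lon = (lon_idx + 0.5) * lon_step

    return (cell_lat, cell_lon)


# Illustrative coordinates in downtown Boston, not an incident location.
center = smudge(42.3601, -71.0589)
```

Every point that falls in the same cell comes out as the same pair of coordinates, so a map built from the output cannot place an incident more precisely than the cell size.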


A: Kate Crawford talks about how important Paola’s research was in the Dookhan case. “It’s really valuable for the ACLU to have a scientist working on data like this.”


Q: What happened to the people who were tried with Dookhan evidence?


A: [ann] Special magistrates and special hearings were set up…


Q: [charlie nesson] A MOOC is considering Yes on 4 (marijuana legalization ballot question) and someone asked if there is a relationship between cannabis reform and Black Lives Matter. And you’ve answered that question. It’s remarkable that BLM hasn’t cottoned on to cannabis reform as a sister issue.


Q: I’ve been reading Cathy O’Neil’s Weapons of Math Destruction [me too!] and I’m wondering if you could talk about your passion for social justice as a data scientist.


A: I’m Mexican. I learned to code when I was 12 because I had access to the Internet. I started working as a web developer at 15, and a few years later I was director of IT for the president’s office. I reflected on how I got that opportunity, and the answer was that it was thanks to open source. That inspired me.


Q: We are not looking at what happens to black women. They get criminalized even more often than black men. Also, has anyone looked at questions of environmental justice?


Q: How can we tell if a visualization is valid or is propaganda? Are there organizations doing this?


A: Great question, and I don’t know how to answer it. We publish the code, but of course not everyone can understand it. I’m not using AI or Deep Learning; I’m keeping it simple.


Q: What’s the next big data set you’re going to work on?


A: (She shows a visualization tool she developed that explores police budgets.)


Q: How do you work with journalists? Do you bring them in early?


A: We haven’t had that much interaction with them yet.
