fairness Archives - Joho the Blog

April 5, 2018

[liveblog] Neil Gaikwad Human-AI Collaboration for Sustainable Market Design

I’m at a ThursdAI talk (Harvard’s Berkman Klein Center for Internet & Society and MIT Media Lab) being given by Neil Gaikwad (Twitter: @neilthemathguy), a Ph.D. at the Media Lab, in the Space Enabled Group.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Markets and institutions are parts of a complex ecosystem, Neil says. His research looks at data from satellites that show how the Earth is changing: crops, water, etc. Once you’ve gathered the data, you can use machine learning to visualize the changes. There are ecosystems, including of human behavior, that are affected by this. It affects markets and institutions. E.g., a drought may require an institutional response, and affect markets.

Traditional markets, financial markets, and gig economies all share characteristics. Farmers markets are complex ecosystems of people with differing information and different amounts of it, i.e. asymmetric info. Same for financial markets. Same for gig economies.

Indian markets have been failing; there have been 300,000 suicides in the last 30 years. Stock markets have crashed suddenly due to blackbox marketing; in some cases we still don’t know why. And London has banned Uber. So, it doesn’t matter which markets or institutions we look at, they’re losing our trust.

An article in New Scientist asked what we can do to regain this trust. For black box AI, there are questions of fairness and equity. But what would human-machine collaboration be like? Are there design principles for markets?

Neil stops for us to discuss.

Q: How do you define justice?

A: Good question. Fairness? Freedom? The designer has a choice about how to define it.

Q: A UN project created an IT platform that put together farmers and direct consumers. The pricing seemed fairer to both parties. So, maybe avoid intermediaries, as a design principle?

Neil continues. So, what is the concept of justice here?

1. Rawls and Kant: Transcendental institutionalism. It’s deontological: follow a principle for perfect justice. Use those principles to define a perfect institution. The properties are defined by a social contract. But it doesn’t work, as in the examples we just saw. What is missing? People and society. [I.e., you run the institution according to principles, but that doesn’t guarantee that the outcome will be fair and just. My example: Early Web enthusiasts like me thought the Web was an institution built on openness, equality, creative anarchy, etc., yet that obviously doesn’t ensure that the outcome will share those properties.]

2. Realized-focused institutionalism (Sen 2009): How to reverse this trend. It is consequentialist: what will be the consequences of the design of an institution? It’s a comparative assessment of different forms of institutions. Instead of asking for the perfectly just society, Sen asks how justice can be advanced. The most critical tool for evaluating any institution is to look at what it actually realizes: how people’s lives change.

Sen argues that principles are important. They can be expressed by “niti,” Sanskrit for rules and institutions. But you also need nyaya: a form of social arrangement that makes sure that those rules are obeyed. These rules come from social choice, not social contract.

Example: Gig economies. The data comes from Mechanical Turk, Upwork, CrowdFlower, etc. This creates employment for many people, but it’s tough. E.g., identifying images. Use supervised learning for this. The Turkers, etc., do the labelling to train the image recognition system. The Turkers make almost no money at this. This is the wicked problem of market design: The worker can have identifications rejected, sometimes with demeaning comments.

“The Market for Lemons” (Akerlof, 1970): all the cars started to look alike, and now all gig workers look alike to those who hire them: there’s no value given to what each worker brings to the labor.

So, who owns the data? Who has a stake in the models? In the intellectual property?

If you’re a gig worker, you’re working with strangers. You don’t know the reputation of the person giving you data. Or renting you the Airbnb apartment. So, let’s put a rule: reputation is the backbone. In sharing economies, most of the ratings are the highest. Reputation inflation. So, can we trust reputation? This happens because people have no incentive to rate. There’s social pressure to give a positive rating.

So, thinking about Sen, can we think about an incentive for honest reputation? Neil’s group has been thinking about a system [I thought he said Boomerang, but I can’t find that]. It looks at the workers’ incentives. It looks at the workers’ ratings of each other. If you’re a requester, you’ll see the workers you like first.

Does this help AI design?

MoralMachine has had 1.3M voters and 18M pairwise comparisons (i.e., people deciding to go straight or right). Can this be used as a voting based system for ethical decision making (AAAI 2018)? You collect the pairwise preferences, learn the model of preference, come to a collective preference, and have voting rules for collective decision.
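
The pipeline Neil describes (collect pairwise preferences, combine them, apply a voting rule) can be sketched roughly as follows. This is a minimal illustration, not the method of the AAAI 2018 paper: it skips the preference-learning step entirely and uses a simple Copeland-style rule, and the option names and votes are invented.

```python
from collections import Counter
from itertools import combinations

def copeland_winner(comparisons):
    """Pick a collective choice from pairwise votes.

    comparisons: list of (preferred, rejected) tuples, one per individual vote.
    An option earns one point for each rival it beats by majority.
    """
    wins = Counter(comparisons)  # (a, b) -> number of voters preferring a to b
    options = sorted({o for pair in comparisons for o in pair})
    score = {o: 0 for o in options}
    for a, b in combinations(options, 2):
        if wins[(a, b)] > wins[(b, a)]:
            score[a] += 1
        elif wins[(b, a)] > wins[(a, b)]:
            score[b] += 1
    # Highest score wins; ties are broken alphabetically for determinism.
    return min(options, key=lambda o: (-score[o], o))

# Hypothetical votes, not Moral Machine data.
votes = (
    [("swerve", "straight")] * 3
    + [("straight", "swerve")] * 1
    + [("brake", "swerve")] * 2
    + [("swerve", "brake")] * 1
    + [("brake", "straight")] * 3
)
print(copeland_winner(votes))
```

Note that a rule like this only aggregates what people say they prefer; it does nothing to settle whether those preferences are normatively acceptable.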

Q: Aren’t you collecting preferences, not normative judgments? The data says people would rather kill fat people than skinny ones.

A: You need the social behavior but also rules. For this you have to bring people into the loop.

Q: How do we differentiate between what we say we want and what we really want?

A: There are techniques, such as “Bayesian Truth Serum.”

Conclusion: The success of markets, institutions, or algorithms is highly dependent on how they actually affect people’s lives. This thinking should be central to the design and engineering of socio-technical systems.


December 2, 2017

[liveblog] Doaa Abu-Elyounes on "Bail or Jail? Judicial vs. Algorithmic decision making"

I’m at a weekly AI talk put on by Harvard’s Berkman Klein Center for Internet & Society and the MIT Media Lab. Doaa Abu-Elyounes is giving a talk called “Bail or Jail? Judicial vs. Algorithmic decision making”.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Doaa tells us that this talk is a work in progress.

We’ve all heard now about AI-based algorithms that are being used to do risk assessments in pretrial bail decisions. She thinks this is a good place to start using algorithms, although it’s not easy.

The pre-trial stage is supposed to be very short. The court has to determine if the defendant, presumed innocent, will be released on bail or jailed. The sole considerations are supposed to be whether the defendant is likely to harm someone else or flee. Preventive detention has many effects, mostly negative for the defendant.
(The US is a world leader in pre-trial detainees. Yay?)

Risk assessment tools have been used for more than 50 years. Actuarial tools have shown greater predictive power than clinical judgment, and can eliminate some of the discretionary powers of judges. Use of these tools has long been controversial. What type of factors to include in the tool? Is the use of demographic factors to make predictions fair to individuals?

Existing tools use regression analysis. Now machine learning can learn from much more data. Mechanical predictions [= machine learning] are more accurate than statistical predictions, but may not be explicable.

We think humans can explain their decisions and we want machines to be able to as well. But look at movie reviews. Humans can tell if a review is positive. We can teach which words are positive or negative, getting 60% accuracy. Or we can have a human label the reviews as positive or negative and let the machine figure out what the factors are, via machine learning, in which case we get 80% accuracy but may lose explicability.
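
A toy contrast between the two approaches might look like the sketch below. Everything in it is invented for illustration (the word lists, the four training reviews, the scoring rule), and it does not reproduce the 60% and 80% figures from the talk. It only shows how a hand-written rule stays easy to explain while learned weights pick up factors nobody chose.

```python
from collections import Counter

# Hypothetical hand-picked lexicon: every word's role is easy to explain.
POSITIVE = {"great", "wonderful", "moving"}
NEGATIVE = {"dull", "awful", "boring"}

def lexicon_label(review):
    """Explainable rule: count hand-picked positive and negative words."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score >= 0 else "neg"

def learn_weights(labeled_reviews):
    """Learn per-word weights from human labels: +1 per positive review, -1 per negative."""
    weights = Counter()
    for text, label in labeled_reviews:
        for w in set(text.lower().split()):
            weights[w] += 1 if label == "pos" else -1
    return weights

def learned_label(review, weights):
    """Sum the learned weights; unseen words contribute zero."""
    score = sum(weights[w] for w in review.lower().split())
    return "pos" if score >= 0 else "neg"

# Invented training data labeled by a (hypothetical) human.
training = [
    ("a great and moving film", "pos"),
    ("wonderful cast", "pos"),
    ("predictable plot and flat acting", "neg"),
    ("flat and predictable", "neg"),
]
weights = learn_weights(training)
review = "flat predictable story"
print(lexicon_label(review), learned_label(review, weights))
```

The learned model catches "flat" and "predictable" even though no one listed them, but it also assigns a weight to "and", which is the kind of factor that is hard to justify to a defendant or a judge.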

With pretrial situations, what is the automated task that the machine should be performing?

There’s a tension between accuracy and fairness. Computer scientists are trying to quantify these questions. What does a fair algorithm look like? Jon Kleinberg and colleagues did a study of this [this one?]. Their algorithms reduced violent crime by 25% with no change in jailing rates, without increasing racial disparities. In short, the algorithm seems to have done a more accurate job with less bias.

Doaa lists four assessment tools she will be looking at: the Pretrial Risk Assessment [this one?], the Public Safety Assessment, the Virginia Pretrial Risk Assessment Instrument, and the Colorado Pretrial Assessment Tool.

Doaa goes through questions that should be asked of these tools, beginning with: Which factors are considered in each? [She dives into the details for all four tools. I can’t capture it. Sorry.]

What are the sources of data? (3 out of 4 rely on interviews and databases.)

What is the quality of the data? “This is the biggest problem jurisdictions are dealing with when using such a tool.” “Criminal justice data is notoriously poor.” And, of course, if a machine learning system is trained on discriminatory data, its conclusions are likely to reflect those biases.

The tools need to be periodically validated using data from their own district’s population. Local data matters.

There should be separate scores for flight risk and public safety. All but the PSA provide only a single score. This is important because there are separate remedies for the two concerns. E.g., you might want to lock up someone who is a risk to public safety, but take away the passport of someone who is a flight risk.
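
As a rough sketch of why two scores matter, the snippet below keeps the risks separate and maps each to its own remedy. The score scale, the threshold, and the remedy names are all hypothetical; none of them comes from the PSA or any of the other tools.

```python
from dataclasses import dataclass

@dataclass
class PretrialScores:
    flight_risk: float  # hypothetical scale: 0.0 (low) to 1.0 (high)
    safety_risk: float  # hypothetical scale: 0.0 (low) to 1.0 (high)

def suggested_conditions(scores, threshold=0.5):
    """Map each risk to its own remedy; the threshold is illustrative only."""
    conditions = []
    if scores.safety_risk >= threshold:
        conditions.append("detention hearing")
    if scores.flight_risk >= threshold:
        conditions.append("surrender passport")
    return conditions or ["release on recognizance"]

print(suggested_conditions(PretrialScores(flight_risk=0.8, safety_risk=0.1)))
```

A single combined score could not make this distinction: a high number would not say which remedy fits.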

Finally, the systems should discriminate among reasons for flight risk. E.g., because the defendant can’t afford the cost of making it to court or because she’s fleeing?

Conclusion: Pretrial is the front door of the criminal justice system and affects what happens thereafter. Risk assessment tools should not replace judges, but they bring benefits. They should be used, and should be made as transparent as possible. There are trade offs. The tool will not eliminate all bias but might help reduce it.

Q&A

Q: Do the algorithms recognize the different situations of different defendants?

A: Systems do recognize this, but not in sophisticated ways. That’s why it’s important to understand why a defendant might be at risk of missing a court date. Maybe we could provide poor defendants with a Metro card.

Q: Could machine learning be used to help us be more specific in the types of harm? What legal theories might we draw on to help with this?

A: [The discussion got too detailed for me to follow. Sorry.]

Q: There are different definitions of recidivism. What do we do when there’s a mismatch between the machines and the court?

A: Some states give different weights to different factors based on how long ago the prior crimes were committed. I haven’t seen any difference in considering how far ahead the risk of a possible next crime is.

Q: [me] While I’m very sympathetic to allowing machine learning to be used without always requiring that the output be explicable, when it comes to the justice system, do we need explanations so not only is justice done, but we can have trust that it’s being done?

A: If we can say which factors are going into a decision — and it’s not a lot of them — if the accuracy rate is much higher than manual systems, then maybe we can give up on always being able to explain exactly how it came to its decisions. Remember, pre-trial procedures are short and there’s usually not a lot of explaining going on anyway. It’s unlikely that defendants are going to argue over the factors used.

Q: [me] Yes, but what about the defendant who feels that she’s being treated differently than some other person and wants to know why?

A: Judges generally don’t explain how they came to their decisions anyway. The law sets some general rules, and the comparisons between individuals is generally within the framework of those rules. The rules don’t promise to produce perfectly comparable results. In fact, you probably can’t easily find two people with such similar circumstances. There are no identical cases.

Q: Machine learning, multilevel regression, and human decision making all weigh data and produce an outcome. But ML has little human interaction, statistical analysis has some, and the human decision is all human. Yet all are in fact algorithmic: the judge looks at a bond schedule to set bail. Predictability as fairness is exacerbated by the human decisions since the human cannot explain her model.

Q: Did you find any logic about why jurisdictions picked which tool? Any clear process for this?

A: It’s hard to get that information about the procurement process. Usually they use consultants and experts. There’s no study I know of that looks at this.

Q: In NZ, the main tool used for risk assessment for domestic violence is a Canadian tool called ODARA. Do tools work across jurisdictions? How do you reconcile data sets that might be quite different?

A: I’m not against using the same system across jurisdictions — it’s very expensive to develop one from scratch — but they need to be validated. The federal tool has not been, as far as I know. (It was created in 2009.) Some tools do better at this than others.

Q: What advice would you give to a jurisdiction that might want to procure one? What choices did the tools make in terms of what they’re optimized for? Also: What about COMPAS?

A: (I didn’t talk about COMPAS because it’s notorious and not often used in pre-trial, although it started out as a pre-trial tool.) The trade off seems to be between accuracy and fairness. Policy makers should define more strictly where the line should be drawn.

Q: Who builds these products?

A: Three out of the four were built in house.

Q: PSA was developed by a consultant hired by the Arnold Foundation. (She’s from Luminosity.) She has helped develop a number of the tools.

Q: Why did you decide to research this? What’s next?

A: I started here because pre-trial is the beginning of the process. I’m interested in the fairness question, among other things.

Q: To what extent are the 100+ factors that the Colorado tool considers available publicly? Is their rationale for excluding factors public? Because they’re proxies for race? Because they’re hard to get? Or because back then 100+ seemed like too many? And what’s the overlap in factors between the existing systems and the system Kleinberg used?

A: Interviewing defendants takes time, so 100 factors can be too much. Kleinberg only looked at three factors. Another tool relied on six factors.

Q: Should we require private companies to reveal their algorithms?

A: There are various models. One is to create an FDA for algorithms. I’m not sure I support that model. I think private companies need to expose at least to the govt the factors that they’re including. Others would say I’m too optimistic about the government.

Q: In China we don’t have the pre-trial part, but there’s an article saying that they can make the sentencing more fair by distinguishing among crimes. Also, in China the system is more uniform so the data can be aggregated and the system can be made more accurate.

A: Yes, states are different because they have different laws. Exchanging data between states is not very common and may not even be possible.


November 19, 2017

[liveblog][ai] A Harm-reduction framework for algorithmic accountability

I’m at one of the weekly Harvard’s Berkman Klein Center for Internet & Society and MIT Media Lab talks. Alexandra Wood and Micah Altman are talking about “A harm reduction framework for algorithmic accountability over personal information” — a snapshot of their ongoing research at the Privacy Tools Project. The PTP is an interdisciplinary project that investigates tools for sharing info while preserving privacy.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Alexandra says they’ve been developing frameworks for assessing privacy risk when collecting personal data, and have been looking at the controls that can be used to protect individuals. They’ve found that privacy tools address a narrow slice of the problem; other types of misuse of data require other approaches.

She refers to some risk assessment algorithms used by the courts that have turned out to be racially biased, to have unbalanced error rates (falsely flagging black defendants as future criminals at twice the rate of white defendants), and to be highly inaccurate. What’s missing is an analysis of harm. “Current approaches to regulating algorithmic classification and decision-making largely elide harm,” she says. “The ethical norms in the law point to the broader responsibilities of the algorithms’ designers.”

Micah says there isn’t a lot of work mapping the loss of privacy to other harms. The project is taking an interdisciplinary approach to this. [I don’t trust my blogging here. Don’t get upset with Micah for my ignorance-based reporting!]

Social science treats harm pragmatically. It finds four main dimensions of well-being: wealth, health, life satisfaction and the meaningful choices available to people. Different schools take different approaches to this, some emphasizing physical and psychological health, others life satisfaction, etc.

But to assess harm, you need to look at people’s lives over time. E.g., how does going to prison affect people’s lives? Being sentenced decreases your health, life satisfaction, choices, and income. “The consequences of sentencing are persistent and individually catastrophic.”

He shows a chart from ProPublica based on Broward County data that shows that the risk scores for white defendants skew heavily toward lower scores while the scores for black defendants are more evenly distributed. This by itself doesn’t prove that it’s unfair. You have to look at the causes of the differences in those distributions.

Modern inference theory says something different about harm. A choice is harmful if the causal effect of that outcome is worse, and the causal effect is measured by potential outcomes. The causal impact of smoking is not simply that you may get cancer, but includes the impact of not smoking, such as possibly gaining weight. You have to look at the counterfactuals.

The COMPAS risk assessment tool that has been the subject of much criticism is affected by which training data you use, choice of algorithm, the application of it to the individual, and the use of the score in sentencing. Should you exclude information about race? Or exclude any info that might violate people’s privacy? Or make it open or not? And how to use the outcomes?

Can various protections reduce harm from COMPAS? Racial features were not explicitly included in the COMPAS model. But there are proxies for race. Removing the proxies could lead to less accurate predictions, and make it difficult to study and correct for bias. That is, removing that data (features) doesn’t help that much and might prevent you from applying corrective measures.

Suppose you throw out the risk score. Judges are still biased. “The adverse impact is potentially greater when the decision is not informed by an algorithm’s prediction.” A recent paper by Jon Kleinberg showed that algorithms predicting pre-trial assessments were less biased than decisions made by human judges. [I hope I got that right. It sounds like a significant finding.]

There’s another step: going from the outcomes to the burdens these outcomes put on people. “An even distribution of outcomes can produce disproportionate burdens.” E.g. juvenile defendants have more to lose (more of their years will be negatively affected by a jail sentence), so having the same false positives and negatives for adults and juveniles would impose a greater burden on the juveniles. When deciding if an algorithmic decision is unjust, you can’t just look at the equality of error rates.
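
The point about equal error rates and unequal burdens can be made concrete with a small sketch. The records and the per-error costs below are invented, and treating burden as "error rate times a fixed cost" is a simplifying assumption of mine, not the speakers’ model.

```python
def false_positive_rate(records):
    """records: list of (predicted_high_risk, reoffended) booleans."""
    flagged_wrongly = sum(1 for pred, actual in records if pred and not actual)
    did_not_reoffend = sum(1 for _, actual in records if not actual)
    return flagged_wrongly / did_not_reoffend if did_not_reoffend else 0.0

def expected_burden(records, cost_per_false_positive):
    """Average burden on people who did not reoffend: error rate times per-error cost."""
    return false_positive_rate(records) * cost_per_false_positive

# Invented data: both groups have identical predictions and outcomes.
adults = [(True, False), (False, False), (False, False), (False, False), (True, True)]
juveniles = [(True, False), (False, False), (False, False), (False, False), (True, True)]

# Hypothetical costs: a wrongful detention is assumed to cost a juvenile twice as much.
print(false_positive_rate(adults), false_positive_rate(juveniles))
print(expected_burden(adults, 1.0), expected_burden(juveniles, 2.0))
```

Both groups get the same false positive rate, yet the burden on juveniles comes out twice as high, which is why comparing error rates alone can miss the injustice.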

A decision is unjust when it is: 1. Dominated (all groups pay a higher burden for the same social benefit); 2. Unprogressive (higher relative burdens on members of classes who are less well off); 3. Individually catastrophic (wrong decisions are so harmful that they reduce the well-being of individual members of a known class); 4. Group punishment (an effect on an entire disadvantaged class).

For every decision, there are unavoidable constraints: a tradeoff between the individual and the social group; a privacy cost; can’t be equally accurate in all categories; can’t be fair without comparing utility across people; it’s impossible to avoid constraints by adding human judgment because the human is still governed by these constraints.

Micah’s summary for COMPAS: 1. Some protections would be irrelevant (inclusion of sensitive characteristics and protection of individual information). 2. Other protections would be insufficient (no intention to discriminate, open source/open data/FCRA).

Micah ends with a key question about fairness that has been too neglected: “Do black defendants bear a relatively higher cost than whites from bad decisions that prevent the same social harms?”


April 6, 2015

Culture is unfair

At Jonathan Zittrain’s awesome lecture upon the occasion of his ascending to the Bemis Chair at Harvard Law (although shouldn’t you really descend into a chair?), he made the point that through devices like Microsoft Kinect, our TVs are on the verge of knowing how many people are in the room watching. After all, your camera (= phone) already can identify the faces in a photo.

This will inevitably lead to the claim that if five people are watching a for-pay movie on a TV, we ought to be paying 5x what a single person does. After all, it’s delivering five times the value. What are you, a bunch of pirates?

There is some fairness to that claim. We’d pay for five tickets if we saw it in a theater.

But it also feels wrong. Very wrong. And not just because it costs us more.

For example, I’m told that if you buy a subscription to the NY Times it comes with one license for online access. So, if you’re having the old roll o’ stories thrown onto your porch every morning, your spouse is free to read it too, but you’re going to have to buy a separate online subscription if s/he wants to read it online. That doesn’t feel right.

The pay-per-use argument may be fair but it flies in the face of how we all know culture works. Culture only exists if we share what matters to us. There is no culture without this. That’s why it’s so important I can share a physical book with you, or can send you a copy of a magazine article that I think you’ll like. Culture is the sharing of creative works and the conversations we have about them.

That’s why the creators of the US Constitution put a time limit on copyright. Yes, it feels unfair if after fourteen years (the original length of copyright protection) someone publishes my book without my permission and doesn’t give me any of the profits. Sure. But fairness is not the only criterion.

Culture cannot flourish or perhaps even exist when everything has a fair price.
