Joho the Blog - Let's just see what happens

December 5, 2017

[liveblog] Conclusion of Workshop on Trustworthy Algorithmic Decision-Making

I’ve been at a two-day workshop sponsored by Michigan State University and the National Science Foundation: “Workshop on Trustworthy Algorithmic Decision-Making.” After multiple rounds of rotating through workgroups iterating on five different questions, each group presented its findings — questions, insights, areas of future research.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Seriously, I cannot capture all of this.

Conduct of Data Science

What are the problems?

  • Who defines and how do we ensure good practice in data science and machine learning?

Why is the topic important? Because algorithms are important. And they have important real-world effects on people’s lives.

Why is the problem difficult?

  • Wrong incentives.

  • It can be difficult to generalize practices.

  • Best practices may be good for one goal but not another, e.g., efficiency but not social good. Also: Lack of shared concepts and vocabulary.

How to mitigate the problems?

  • Change incentives

  • Increase communication via vocabularies, translations

  • Education through MOOCS, meetups, professional organizations

  • Enable and encourage resource sharing: an open source lesson about bias, code sharing, data set sharing

Accountability group

The problem: How to integratively assess the impact of an algorithmic system on the public good? “Integrative” = the impact may be positive and negative and affect systems in complex ways. The impacts may be distributed differently across a population, so you have to think about disparities. These impacts may well change over time.

We aim to encourage work that is:

  • Aspirationally causal: measuring outcomes causally but not always through randomized controlled trials.

  • The goal is not to shut down algorithms but to make positive contributions that generate solutions.

This is a difficult problem because:

  • Lack of variation in accountability, enforcements, and interventions.

  • It’s unclear what outcomes should be measured and how; this is context-dependent.

  • It’s unclear which interventions are the highest priority

Why progress is possible: There’s a lot of good activity in this space. And it’s early in the topic so there’s an ability to significantly influence the field.

What are the barriers for success?

  • Incomplete understanding of contexts. So, think of it in terms of socio-cultural approaches, and make it interdisciplinary.

  • The topic lies between disciplines. So, develop a common language.

  • High-level triangulation is difficult. Examine the issues at multiple scales, multiple levels of abstraction. Where you assess accountability may vary depending on what level/aspect you’re looking at.

Handling Uncertainty

The problem: How might we holistically treat and attribute uncertainty through data analysis and decision systems? Uncertainty exists everywhere in these systems, so we need to consider how it moves through a system. This runs from choosing data sources to presenting results to decision-makers and people impacted by these results, and beyond that to its incorporation into risk analysis and contingency planning. It’s always good to know where the uncertainty is coming from so you can address it.

Why difficult:

  • Uncertainty arises from many places

  • Recognizing and addressing uncertainties is a cyclical process

  • End users are bad at evaluating uncertain info and incorporating uncertainty in their thinking.

  • Many existing solutions are too computationally expensive to run on large data sets

Progress is possible:

  • We have sampling-based solutions that provide a framework. (A minimal sketch follows this list.)

  • Some app communities are recognizing that ignoring uncertainty is reducing the quality of their work
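
[To make the sampling idea concrete, here's a minimal Python sketch of my own, not anything presented at the workshop: propagate uncertainty in a measured input through a model by sampling the input many times and reporting the spread of the outputs. The model and the numbers are invented.]

import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Stand-in for any analysis step; a real pipeline would be far more complex.
    return 3.2 * x + 1.0

# Suppose an input is measured as 10.0 with a standard deviation of 0.5.
samples = rng.normal(loc=10.0, scale=0.5, size=100_000)
outputs = model(samples)

print("mean output:", outputs.mean())
print("std of output:", outputs.std())            # uncertainty propagated through the model
print("90% interval:", np.percentile(outputs, [5, 95]))

[The catch, as noted above, is cost: sampling multiplies the number of model runs, which is one reason these approaches can be too expensive for large data sets.]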

How to evaluate and recognize success?

  • A/B testing can show that decision making is better after incorporating uncertainty into analysis

  • Statistical/mathematical analysis

Barriers to success

  • Cognition: Train users.

  • It may be difficult to break this problem into small pieces and solve them individually

  • Gaps in theory: many of the problems cannot currently be solved algorithmically.

The presentation ends with a note: “In some cases, uncertainty is a useful tool.” E.g., it can make the system harder to game.

Adversaries, workarounds, and feedback loops

Adversarial examples: add a perturbation to a sample and it disrupts the classification. An adversary tries to find those perturbations to wreck your model. Sometimes this is used not to hack the system so much as to prevent the system from, for example, recognizing your face during a protest.
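
[A toy illustration of my own of what "find those perturbations" means, with invented numbers: for a linear classifier, nudging the input against the weight vector is enough to flip the label while barely changing the input. For deep models, the same move using the gradient is the idea behind FGSM.]

import numpy as np

w = np.array([1.5, -2.0, 0.5])   # weights of a toy linear classifier
b = 0.1

def predict(x):
    return 1 if x @ w + b > 0 else 0

x = np.array([0.4, 0.1, 0.3])    # a sample the classifier labels 1
print(predict(x))                # -> 1

eps = 0.2
x_adv = x - eps * np.sign(w)     # small step against the weights
print(np.abs(x_adv - x).max())   # perturbation is at most eps per feature
print(predict(x_adv))            # -> 0: the label flips once the nudge crosses the boundary

[The protest example is the same move applied to images: a perturbation crafted to push a face just across the recognizer's decision boundary.]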

Feedback loops: A recidivism prediction system says you’re likely to commit further crimes, which sends you to prison, which increases the likelihood that you’ll commit further crimes.

What is the problem: How should a trustworthy algorithm account for adversaries, workarounds, and feedback loops?

Who are the stakeholders?

System designers, users, non-users, and perhaps adversaries.

Why is this a difficult problem?

  • It’s hard to define the boundaries of the system

  • From whose vantage point do we define adversarial behavior, workarounds, and feedback loops?

Unsolved problems

  • How do we reason about the incentives users and non-users have when interacting with systems in unintended ways?

  • How do we think about oversight and revision in algorithms with respect to feedback mechanisms?

  • How do we monitor changes, assess anomalies, and implement safeguards?

  • How do we account for stakeholders while preserving rights?

How to recognize progress?

  • Mathematical model of how people use the system

  • Define goals

  • Find stable metrics and monitor them closely

  • Proximal metrics. Causality?

  • Establish methodologies and see them used

  • See a taxonomy of adversarial behavior used in practice

Likely approaches

  • Apply security methodology to anticipate unintended behaviors and adversarial interactions. Monitor and measure.

  • Record and taxonomize adversarial behavior in different domains

  • Test. Try to break things.

Barriers

  • Hard to anticipate unanticipated behavior

  • Hard to define the problem in particular cases.

  • Goodhart’s Law

  • Systems are born brittle

  • What constitutes adversarial behavior vs. a workaround is subjective.

  • Dynamic problem

Algorithms and trust

How do you define and operationalize trust?

The problem: What are the processes through which different stakeholders come to trust an algorithm?

Multiple processes lead to trust.

  • Procedural vs. substantive trust: are you looking at, e.g., the weights of the algorithm, or at the steps that got you there?

  • Social vs personal: did you see the algorithm at work, or are you relying on peers?

These pathways are not necessarily predictive of each other.

Stakeholders build trust through multiple lenses and priorities:

  • the builders of the algorithms

  • the people who are affected

  • those who oversee the outcomes

Mini case study: a child services agency that does not want to be identified. [All of the following is 100% subject to my injection of errors.]

  • The agency uses a predictive algorithm. The stakeholders range from the children needing a family to New Yorkers as a whole. The agency knew what went into the model. “We didn’t buy our algorithm from a black-box vendor.” They trusted the algorithm because they staffed a technical team who had credentials and experience with ethics…and who they trusted intuitively as good people. Few of these are the quantitative metrics that devs spend their time on. Note that FAT (fairness, accountability, transparency) metrics were not what led to trust.

Temporality:

  • Processes that build trust happen over time.

  • Trust can change or maybe be repaired over time.

  • “The timescales to build social trust are outside the scope of traditional experiments,” although you can perhaps find natural experiments.

Barriers:

  • Assumption of reducibility or transfer from subcomponents

  • Access to internal stakeholders for interviews and process understanding

  • Some elements are very long term

What’s next for this workshop

We generated a lot of scribbles, post-it notes, flip charts, Slack conversations, slide decks, etc. They’re going to put together a whitepaper that goes through the major issues, organizes them, and tries to capture the complexity while helping to make sense of it.

There are weak or no incentives to set appropriate levels of trust

Key takeaways:

  • Trust is irreducible to FAT metrics alone

  • Trust is built over time and should be defined in terms of the temporal process

  • Isolating the algorithm as an instantiation misses the socio-technical factors in trust.

December 4, 2017

Workshop: Trustworthy Algorithmic Decision-Making

I’m at a two-day inter-disciplinary workshop on “Trustworthy Algorithmic Decision-Making” put on by the National Science Foundation and Michigan State University. The 2-page whitepapers from the participants are online. (Here’s mine.) I may do some live-blogging of the workshops.

Goals:

– Key problems and critical questions?

– What to tell policy-makers and others about the impact of these systems?

– Product approaches?

– What ideas, people, training, infrastructure are needed for these approaches?

Excellent diversity of backgrounds: CS, policy, law, library science, a philosopher, more. Good diversity in gender and race. As the least qualified person here, I’m greatly looking forward to the conversations.

December 2, 2017

[liveblog] Doaa Abu-Elyounes on "Bail or Jail? Judicial vs. Algorithmic decision making"

I’m at a weekly AI talk put on by Harvard’s Berkman Klein Center for Internet & Society and the MIT Media Lab. Doaa Abu-Elyounes is giving a talk called “Bail or Jail? Judicial vs. Algorithmic decision making”.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Doaa tells us that this talk is a work in progress.

We’ve all heard now about AI-based algorithms that are being used to do risk assessments in pretrial bail decisions. She thinks this is a good place to start using algorithms, although it’s not easy.

The pre-trial stage is supposed to be very short. The court has to determine if the defendant, presumed innocent, will be released on bail or jailed. The sole considerations are supposed to be whether the def is likely to harm someone else or flee. Preventive detention has many effects, mostly negative for the defendant. (The US is a world leader in pre-trial detainees. Yay?)

Risk assessment tools have been used for more than 50 years. Actuarial tools have shown greater predictive power than clinical judgment, and can eliminate some of the discretionary powers of judges. Use of these tools has long been controversial. What types of factors should be included in the tool? Is the use of demographic factors to make predictions fair to individuals?

Existing tools use regression analysis. Now machine learning can learn from much more data. Mechanical predictions [= machine learning] are more accurate than statistical predictions, but may not be explicable.

We think humans can explain their decisions and we want machines to be able to as well. But look at movie reviews. Humans can tell if a review is positive. We can teach which words are positive or negative, getting 60% accuracy. Or we can have a human label the reviews as positive or negative and let the machine figure out what the factors are — via machine learning — in which case we get 80% accuracy but may lose explicability.
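
[The contrast Doaa is drawing looks roughly like this in code. This is a toy sketch of my own with made-up reviews, not her example; it assumes scikit-learn is installed.]

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["a wonderful, moving film", "dull and predictable", "great acting, great script",
           "a boring mess", "surprisingly delightful", "terrible pacing, awful dialogue"]
labels  = [1, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative

# Approach 1: a hand-written word list. Explicable, but brittle.
POSITIVE = {"wonderful", "great", "delightful", "moving"}
NEGATIVE = {"dull", "boring", "terrible", "awful", "predictable"}

def lexicon_score(text):
    words = set(text.lower().replace(",", "").split())
    return 1 if len(words & POSITIVE) >= len(words & NEGATIVE) else 0

# Approach 2: let a model learn which words matter from labeled examples.
# Typically more accurate, but the learned weights are harder to narrate.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(reviews, labels)

print([lexicon_score(r) for r in reviews])
print(list(model.predict(reviews)))

[With only six made-up reviews both approaches look perfect; the 60%-vs-80% gap she cites shows up on real corpora, where the learned model finds factors nobody hand-picked.]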

With pretrial situations, what is the automated task that the machine should be performing?

There’s a tension between accuracy and fairness. Computer scientists are trying to quantify these questions: What does a fair algorithm look like? Jon Kleinberg and colleagues did a study of this [this one?]. Their algorithms reduced violent crime by 25% with no change in jailing rates, without increasing racial disparities. In short, the algorithm seems to have done a more accurate job with less bias.

Doaa lists four assessment tools she will be looking at: the Pretrial Risk Assessment [this one?], the Public Safety Assessment, the Virginia Pretrial Risk Assessment Instrument, and the Colorado Pretrial Assessment Tool.

Doaa goes through questions that should be asked of these tools, beginning with: Which factors are considered in each? [She dives into the details for all four tools. I can’t capture it. Sorry.]

What are the sources of data? (3 out of 4 rely on interviews and databases.)

What is the quality of the data? “This is the biggest problem jurisdictions are dealing with when using such a tool.” “Criminal justice data is notoriously poor.” And, of course, if a machine learning system is trained on discriminatory data, its conclusions are likely to reflect those biases.

The tools need to be periodically validated using data from their own district’s population. Local data matters.

There should be separate scores for flight risk and public safety. All but the PSA provide only a single score. This is important because there are separate remedies for the two concerns. E.g., you might want to lock up someone who is a risk to public safety, but take away the passport of someone who is a flight risk.

Finally, the systems should discriminate among reasons for flight risk. E.g., because the defendant can’t afford the cost of making it to court or because she’s fleeing?

Conclusion: Pretrial is the front door of the criminal justice system and affects what happens thereafter. Risk assessment tools should not replace judges, but they bring benefits. They should be used, and should be made as transparent as possible. There are trade offs. The tool will not eliminate all bias but might help reduce it.

Q&A

Q: Do the algorithms recognize the different situations of different defendants?

A: Systems do recognize this, but not in sophisticated ways. That’s why it’s important to understand why a defendant might be at risk of missing a court date. Maybe we could provide poor defendants with a Metro card.

Q: Could machine learning be used to help us be more specific in the types of harm? What legal theories might we draw on to help with this?

A: [The discussion got too detailed for me to follow. Sorry.]

Q: There are different definitions of recidivism. What do we do when there’s a mismatch between the machines and the court?

A: Some states give different weights to different factors based on how long ago the prior crimes were committed. I haven’t seen any difference in considering how far ahead the risk of a possible next crime is.

Q: [me] While I’m very sympathetic to allowing machine learning to be used without always requiring that the output be explicable, when it comes to the justice system, do we need explanations so not only is justice done, but we can have trust that it’s being done?

A: If we can say which factors are going into a decision — and it’s not a lot of them — if the accuracy rate is much higher than manual systems, then maybe we can give up on always being able to explain exactly how it came to its decisions. Remember, pre-trial procedures are short and there’s usually not a lot of explaining going on anyway. It’s unlikely that defendants are going to argue over the factors used.

Q: [me] Yes, but what about the defendant who feels that she’s being treated differently than some other person and wants to know why?

A: Judges generally don’t explain how they came to their decisions anyway. The law sets some general rules, and the comparisons between individuals are generally made within the framework of those rules. The rules don’t promise to produce perfectly comparable results. In fact, you probably can’t easily find two people with such similar circumstances. There are no identical cases.

Q: Machine learning, multilevel regression level, and human decision making all weigh data and produce an outcome. But ML has little human interaction, statistical analysis has some, and the human decision is all human. Yet all are in fact algorithmic: the judge looks at a bond schedule to set bail. Predictability as fairness is exacerbated by the human decisions since the human cannot explain her model.

Q: Did you find any logic about why jurisdictions picked which tool? Any clear process for this?

A: It’s hard to get that information about the procurement process. Usually they use consultants and experts. There’s no study I know of that looks at this.

Q: In NZ, the main tool used for risk assessment for domestic violence is a Canadian tool called ODARA. Do tools work across jurisdictions? How do you reconcile data sets that might be quite different?

A: I’m not against using the same system across jurisdictions — it’s very expensive to develop one from scratch — but they need to be validated. The federal tool has not been, as far as I know. (It was created in 2009.) Some tools do better at this than others.

Q: What advice would you give to a jurisdiction that might want to procure one? What choices did the tools make in terms of what they’re optimized for? Also: What about COMPAS?

A: (I didn’t talk about COMPAS because it’s notorious and not often used in pre-trial, although it started out as a pre-trial tool.) The trade off seems to be between accuracy and fairness. Policy makers should define more strictly where the line should be drawn.

Q: Who builds these products?

A: Three out of the four were built in house.

Q: PSA was developed by a consultant hired by the Arnold Foundation. (She’s from Luminosity.) She has helped develop a number of the tools.

Q: Why did you decide to research this? What’s next?

A: I started here because pre-trial is the beginning of the process. I’m interested in the fairness question, among other things.

Q: To what extent are the 100+ factors that the Colorado tool considers available publicly? Is their rationale for excluding factors public? Because they’re proxies for race? Because they’re hard to get? Or because back then 100+ seemed like too many? And what’s the overlap in factors between the existing systems and the system Kleinberg used?

A: Interviewing defendants takes time, so 100 factors can be too much. Kleinberg only looked at three factors. Another tool relied on six factors.

Q: Should we require private companies to reveal their algorithms?

A: There are various models. One is to create an FDA for algorithms. I’m not sure I support that model. I think private companies need to expose at least to the govt the factors that they’re including. Others would say I’m too optimistic about the government.

Q: In China we don’t have the pre-trial part, but there’s an article saying that they can make the sentencing more fair by distinguishing among crimes. Also, in China the system is more uniform so the data can be aggregated and the system can be made more accurate.

A: Yes, states are different because they have different laws. Exchanging data between states is not very common and may not even be possible.

November 29, 2017

"The Walking Dead" is Negan

[SPOILERS??] There are no direct spoilers of the “So and So dies” sort in this post, but it assumes you are pretty much up to date on the current season of The Walking Dead.

The Walking Dead has become Negan. I mean the show itself.

Negan brings to the show a principle of chaos: you never know who he’s going to bash to death. This puts all the characters at risk, although perhaps some less so than others based on their fan-base attachments.

That adds some threat and tension of the sort that Game of Thrones used to have. But only if it’s a principle of chaos embedded within a narrative structure and set of characters that we care about. And for the prior season and the current one, there’s almost no narrative structure and, frankly, not that many characters who don’t feel like narrative artifices.

As a result, the main tension in the current season is exactly the same as it was at the beginning of last season when we waited to find out who Negan would choose to bash to death. Negan was so random that the viewer discussions generally were attempts to anticipate what the writers wanted to do to us. They had to kill someone significant or else the threat level would go down. But they couldn’t kill so-and-so because s/he was too popular, or whatever. There were no intrinsic reasons why Negan would choose one victim over another — Wild Card! — so the reasons had to have to do with audience retention.

This entire season is random in that bad way. The writers are now Negan, choosing randomly among Team Rick’s characters. They’re going to kill off someone for some ratings-based reason, and we’re just waiting for them to make up their mind.

The series didn’t start out this way. It had characters in conflict, and characters in arcs. Rick and The Punisher. Carol and her sister. Daryl and his other brother Daryl. Gingerbeard and The Mullet. Now there’s nothing, maybe because every character’s arc has been the same: S/he becomes an empowered action star.

There are still some things I like about the show. For example, it’s heartening to watch them work on female empowerment, although it’d be more interesting if they didn’t all become like Rick. And Negan is a pretty good villain. Sure, I could do with fewer predictable charming smiles, but he’s scary.

But I’ll be damned if in the last episode of this series [MADE-UP SPOILERS AHEAD] Team Rick (which will probably be Team Maggie by then) realizes that it has become Negan. I’ll be especially pissed off if the last shot is of the dying Jesus saying, “We are Negan.” Star wipe. Out. Puke.

November 25, 2017

Milkweed’s skeleton

Here are two photos of a milkweed plant taken within seconds of each other, at early dusk, using my Pixel mobile phone. One is with the flash and the other is without.

milkweed without flash

milkweed with flash

 

I think it broke my Pixel’s lighting algorithm.

November 19, 2017

[liveblog][ai] A Harm-reduction framework for algorithmic accountability

I’m at one of the weekly Harvard’s Berkman Klein Center for Internet & Society and MIT Media Lab talks. Alexandra Wood and Micah Altman are talking about “A harm reduction framework for algorithmic accountability over personal information” — a snapshot of their ongoing research at the Privacy Tools Project. The PTP is an interdisciplinary project that investigates tools for sharing info while preserving privacy.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Alexandra says they’ve been developing frameworks for assessing privacy risk when collecting personal data, and have been looking at the controls that can be used to protect individuals. They’ve found that privacy tools address a narrow slice of the problem; other types of misuse of data require other approaches.

She refers to some risk assessment algorithms used by the courts that have turned out to be racially biased, to have unbalanced error rates (falsely flagging black defendants as future criminals at twice the rate of white defendants), and to be highly inaccurate. What’s missing is an analysis of harm. “Current approaches to regulating algorithmic classification and decision-making largely elide harm,” she says. “The ethical norms in the law point to the broader responsibilities of the algorithms’ designers.”

Micah says there isn’t a lot of work mapping the loss of privacy to other harms. The project is taking an interdisciplinary approach to this. [I don’t trust my blogging here. Don’t get upset with Micah for my ignorance-based reporting!]

Social science treats harm pragmatically. It finds four main dimensions of well-being: wealth, health, life satisfaction and the meaningful choices available to people. Different schools take different approaches to this, some emphasizing physical and psychological health, others life satisfaction, etc.

But to assess harm, you need to look at people’s lives over time. E.g., how does going to prison affect people’s lives? Being sentenced decreases your health, life satisfaction, choices, and income. “The consequences of sentencing are persistent and individually catastrophic.”

He shows a chart from ProPublica based on Broward County data that shows that the risk scores for white defendants skew heavily toward lower scores while the scores for black defendants are more evenly distributed. This by itself doesn’t prove that it’s unfair. You have to look at the causes of the differences in those distributions.

Modern inference theory says something different about harm. A choice is harmful if the causal effect of that outcome is worse, and the causal effect is measured by potential outcomes. The causal impact of smoking is not simply that you may get cancer, but includes the impact of not smoking, such as possibly gaining weight. You have to look at the counterfactuals.
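
[A minimal sketch of my own of the potential-outcomes idea, with simulated numbers; in real data you only ever observe one of the two outcomes per person, which is the core difficulty.]

import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical well-being for each person under each choice.
y_if_detained = rng.normal(4.0, 1.0, n)
y_if_released = rng.normal(5.0, 1.0, n)

individual_effects = y_if_detained - y_if_released
print("average causal effect of detention on well-being:", round(individual_effects.mean(), 2))  # about -1.0

[Estimating that same quantity when only one outcome per person is observed is what the rest of the causal-inference machinery is for.]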

The COMPAS risk assessment tool that has been the subject of much criticism is affected by which training data you use, choice of algorithm, the application of it to the individual, and the use of the score in sentencing. Should you exclude information about race? Or exclude any info that might violate people’s privacy? Or make it open or not? And how to use the outcomes?

Can various protections reduce harm from COMPAS? Racial features were not explicitly included in the COMPAS model. But there are proxies for race. Removing the proxies could lead to less accurate predictions, and make it difficult to study and correct for bias. That is, removing that data (features) doesn’t help that much and might prevent you from applying corrective measures.

Suppose you throw out the risk score. Judges are still biased. “The adverse impact is potentially greater when the decision is not informed by an algorithm’s prediction.” A recent paper by Jon Kleinberg showed that algorithms predicting pre-trial assessments were less biased than decisions made by human judges. [I hope I got that right. It sounds like a significant finding.]

There’s another step: going from the outcomes to the burdens these outcomes put on people. “An even distribution of outcomes can produce disproportionate burdens.” E.g., juvenile defendants have more to lose — more of their years will be negatively affected by a jail sentence — so having the same false positives and negatives for adults and juveniles would impose a greater burden on the juveniles. When deciding whether an algorithmic decision is unjust, you can’t just look at the equality of error rates.
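
[To make the burden point concrete, here's a small sketch of my own; the labels, predictions, and "years lost per error" weights are invented. Two groups can have identical false positive rates while one bears a much larger burden per error.]

import numpy as np

def false_positive_rate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    negatives = y_true == 0
    return (y_pred[negatives] == 1).mean()

adults    = {"y": [0, 0, 0, 0, 1, 1], "pred": [1, 0, 0, 0, 1, 1], "years_lost_per_error": 2}
juveniles = {"y": [0, 0, 0, 0, 1, 1], "pred": [1, 0, 0, 0, 1, 1], "years_lost_per_error": 8}

for name, g in [("adults", adults), ("juveniles", juveniles)]:
    fpr = false_positive_rate(g["y"], g["pred"])
    burden = fpr * g["years_lost_per_error"]   # expected years lost per innocent person
    print(name, "FPR:", round(fpr, 2), "burden:", burden)

[Equal error rates, four times the burden: that asymmetry is why equality of error rates alone can't settle whether a decision is just.]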

A decision is unjust when it is: 1. Dominated (all groups pay a higher burden for the same social benefit); 2. Unprogressive (higher relative burdens on members of classes who are less well off); 3. Individually catastrophic (wrong decisions are so harmful that they reduce the well-being of individual members of a known class); 4. Group punishment (an effect on an entire disadvantaged class).

For every decision, there are unavoidable constraints: a tradeoff between the individual and the social group; a privacy cost; it can’t be equally accurate in all categories; it can’t be fair without comparing utility across people; and it’s impossible to avoid these constraints by adding human judgment, because the human is still governed by them.

Micah’s summary for COMPAS: 1. Some protections would be irrelevant (inclusion of sensitive characteristics, protection of individual information). 2. Other protections would be insufficient (no intention to discriminate; open source/open data/FCRA).

Micah ends with a key question about fairness that has been too neglected: “Do black defendants bear a relatively higher cost than whites from bad decisions that prevent the same social harms?”

November 5, 2017

[liveblog] Stefania Druga on how kids can help teach us about AI

Stefania Druga, a graduate student in the Personal Robots research group at the MIT Media Lab, is leading a discussion focusing on how children can help us to better understand and utilize AI. She’s going to talk about some past and future research projects.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

She shows two applications of AI developed for kids. The first is Cayla, a robotic doll. “It got hacked three days after it was released in Germany” and was banned there. The second is Aristotle, which was supposed to be an Alexa for kids. A few weeks ago Mattel decided not to release it, after parents worried about their kids’ privacy signed petitions.

Stefania got interested in what research was being done in this field. She found a couple of papers. One (Lovato & Piper 2015) showed that children mirrored how they interact with Siri, e.g., how angry or assertive they are. Another (McReynolds et al., 2017 [pdf]) found that how children and parents interact with smart toys revealed how little parents and children know about how much info is being collected by these toys, e.g., Hello Barbie’s privacy concerns. It also looked at how parents and children were being incentivized to share info on social media.

Stefania’s group did a pilot study, having parents and 27 kids interact with various intelligent agents, including Alexa, Julie Chatbot, Tina the T.Rex, and Google Home. Four or five children would interact with the agent at a time, with an adult moderator. Their parents were in the room.

Stefania shows a video about this project. After the kids interacted with the agent, they were asked if it was smarter than they are, if it’s a friend, and if it has feelings. Children anthropomorphize AIs in playful ways. Most of the older children thought the agents were more intelligent than they were, while the younger children weren’t sure. Two conclusions: Makers of these devices should pay more attention to how children interact with them, and we need more research.

What did the children think? They thought the agents were friendly and truthful. They thought two Alexa devices were separate individuals. The older children thought about these agents differently than the younger ones did. This latter may be because of how children start thinking about smartness as they progress through school. A question: do they think about artificial intelligence as being the same as human intelligence?

After playing with the agents, they would probe the nature of the device. “They are trying to place the ontology of the device.”

Also, they treated the devices as gender ambiguous.

The media glommed onto this pilot study. E.g., MIT Technology Review: “Growing Up with Alexa.” Or NYTimes: “Co-Parenting with Alexa.” Wired: Understanding Generation Alpha. From these articles, it seems that people are really polarized about the wisdom of introducing children to these devices.

Is this good for kids? “It’s complicated,” Stefania says. The real question is: How can children and parents leverage intelligent agents for learning, or for other good ends?

Her group did another study, this summer, that had 30 pairs of children and parents navigate a robot to solve a maze. They’d see the maze from the perspective of the robot. They also saw a video of a real mouse navigating a maze, and of another robot solving the maze by itself. Does changing the agent (themselves, mouse, robot) change their idea of intelligence? Kids and parents both did the study. Most of the kids mirrored their parents’ choices. They even mirrored the words the parents used…and the value placed on those words.

What next? Her group wants to know how to use these devices for learning. They build extensions using Scratch, including for an open source project called Poppy. (She shows a very cool video of the robot playing, collaborating, painting, etc.) Kids can program it easily. Ultimately, she hopes that this might help kids see that they have agency, and that while the robot is smart at some things, people are smart at other things.

Q&A

Q: You said you also worked with the elderly. What are the chief differences?

A: Seniors and kids have a lot in common. They were especially interested in the fact that these agents can call their families. (We did this on tablets, and some of the elderly can’t use them because their skin is too dry.)

Q: Did learning that they can program the robots change their perspective on how smart the robots are?

A: The kids who got the bot through the maze did not show a change in their perspective. When they become fluent in customizing it and understanding how it computes, it might. It matters a lot to have the parents involved in flipping that paradigm.

Q: How were the parents involved in your pilot study?

A: It varied widely by parent. It was especially important to have the parents there for the younger kids because the device sometimes wouldn’t understand the question, or what sorts of things the child could ask it about.

Q: Did you look at how the participants reacted to robots that have strong or weak characteristics of humans or animals.

A: We’ve looked at whether it’s an embodied intelligent agent or not, but not at that yet. One of our colleagues is looking at questions of empathy.

Q: [me] Do the adults ask their children to thank Siri or other such agents?

A: No.

Q: [me] That suggests they’re tacitly shaping them to think that these devices are outside of our social norms?

Q: In my household, the “thank you” extinguishes itself: you do it a couple of times, and then you give it up.

A: This indicates that these systems right now are designed in a very transactional way. You have to say the wake-up call with every single phrase. But these devices will advance rapidly. Right now it’s unnatural conversation. But with chatbots kids have a more natural conversation, and will say thank you. And kids want to teach it things, e.g., their names or favorite color. When Alexa doesn’t know what the answer is, the natural thing is to tell it, but that doesn’t work.

Q: Do the kids think these are friends?

A: There’s a real question around animism. Is it ok for a device to be designed to create a relationship with, say, a senior person and to convince them to take their pills? My answer is that people tend to anthropomorphize everything. Over time, kids will figure out the limitations of these tools.

Q: Kids don’t have genders for the devices? The speaking ones all have female voices. The doll is clearly a female.

A: Kids were interchanging genders because the devices are in a fluid space in the spectrum of genders. “They’re open to the fact that it’s an entirely new entity.”

Q: When you were talking about kids wanting to teach the devices things, I was thinking maybe that’s because they want the robot to know them. My question: Can you say more about what you observed with kids who had intelligent agents at home as opposed to those who do not?

A: Half already had a device at home. I’m running a workshop in Saudi Arabia with kids there. I’m very curious to see the differences. Also in Europe. We did one in Colombia among kids who had never seen an Alexa before and who wondered where the woman was. They thought there must be a phone inside. They all said good bye at the end.

Q: If the wifi goes down, does the device’s sudden stupidness concern the children? Do they think it died?

A: I haven’t tried that.

[me] Sounds like that would need to go through an IRB.

Q: I think your work is really foundational for people who want to design for kids.

October 29, 2017

Restoring photos’ dates from Google Photos download

Google Photos lets you download your photos, which is good since they’re your own damn photos. But when you do, every photo’s file will be stamped as having been created on the day you downloaded it. This is pretty much a disaster, especially since the photos have names like “IMG_20170619_153745.jpg.”

Ok, so maybe you noticed that the file name Google Photos supplies contains the date the photo was taken. So maybe you want to just munge the file name to make it more readable, as in “2017-06-19.” If you do it that way, you’ll be able to sort chronologically just by sorting alphabetically. But the files are still all going to be dated with the day you did the download, and that’s going to mean they won’t sort chronologically with any photos that don’t follow that exact naming convention.

So, you should adjust the file dates to reflect the day the photos were taken.

It turns out to be easy. JPGs come with a header of info (called EXIF data) that you can’t see but your computer can. There’s lots of metadata about your photo in that header, including the date it was taken. So, all you need to do is extract that date and re-set your file’s date to match it.
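
[If you're curious what that extract-and-reset step looks like in code, here's a minimal Python sketch of my own. It assumes the Pillow library is installed and ignores niceties like time zones; the jhead route below is easier.]

import os
from datetime import datetime
from PIL import Image   # pip install Pillow

def restore_date(path):
    exif = Image.open(path)._getexif() or {}   # Pillow's (private but long-standing) EXIF helper
    stamp = exif.get(36867) or exif.get(306)   # 36867 = DateTimeOriginal, 306 = DateTime
    if not stamp:
        return
    taken = datetime.strptime(stamp, "%Y:%m:%d %H:%M:%S")
    ts = taken.timestamp()
    os.utime(path, (ts, ts))                   # set access and modification times to the EXIF date

for name in os.listdir("."):
    if name.lower().endswith(".jpg"):
        restore_date(name)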

Fortunately, the good folks on the Net have done the heavy lifting for us.

Go to http://www.sentex.net/~mwandel/jhead/ and download the right version of jhead for your computer. Put it wherever you keep utilities. On my Mac I put it in /Applications/Utilities/, but it really doesn’t matter.

Open up a terminal. Log in as a superuser:

sudo -i

Enter the password you use to log into your computer and press Enter.

Change to the directory that contains the photos you want to update. You do this with the “cd” command, as in:

cd /Applications/Users/david/Downloads/GooglePhotos/

That’s a Mac-ish path. I’m going to assume you know enough about paths to figure out your own, how to handle spaces in directory names, etc. If not, my dear friend Google can probably help you.

You can confirm that you’ve successfully changed to the right directory by typing this into your terminal:

pwd

That will show you your current directory. Fix it if it’s wrong because the next command will change the file dates of jpgs in whatever directory you’re currently in.

Now for the brutal finishing move:

/Applications/Utilities/jpg-batch-file-jhead/jhead -ft *.jpg

Everything before the final forward slash is the path to wherever you put the jhead file. After that final slash the command is telling the terminal to run the jhead program, with a particular set of options (-ft) and to apply it to all the files in that directory that end with the extension “.jpg.”

That’s it.

If you want to run the program not just on the directory that you’re in but in all of its subdirectories, this post at StackExchange tells you how: https://photo.stackexchange.com/questions/27245/is-there-a-free-program-to-batch-change-photo-files-date-to-match-exif

Many thanks to Matthias Wandel for jhead and his other contributions to making life with bits better for us all.

October 28, 2017

Making medical devices interoperable

The screen next to a patient’s hospital bed that displays the heart rate, oxygen level, and other moving charts is the definition of a dumb display. How dumb is it, you ask? If the clip on a patient’s finger falls off, the display thinks the patient is no longer breathing and will sound an alarm…even though it’s displaying outputs from other sensors that show that, no, the patient isn’t about to die.

The problem, as explained by David Arney at an open house for MD PnP, is that medical devices do not share their data in open ways. That is, they don’t interoperate. MD PnP wants to fix that.

The small group was founded in 2004 as part of MIT’s CIMIT (Consortia for Improving Medicine with Innovation and Technology). Funded by grants, including from the NIH and CRICO Insurance, it currently has 6-8 people working on ways to improve health care by getting machines talking with one another.

The one aspect of hospital devices that manufacturers have generally agreed on is that they connect via serial ports. The FDA encourages this, at least in part because serial ports are electrically safe. So, David pointed to a small connector box with serial ports in and out and a small computer in between. The computer converts the incoming information into an open industry standard (ISO 11073). And now the devices can play together. (The “PnP” in the group’s name stands for “plug ‘n’ play,” as we used to say in the personal computing world.)
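
[As a rough illustration of the connector-box idea, here's a sketch of my own, not MD PnP's code: read raw lines from a device's serial port and re-emit them in a single shared structure. The port name, the "HR=72" line format, and the output fields are all made up; a real adapter parses the vendor's protocol and maps it to ISO 11073 nomenclature. Assumes the pyserial library.]

import json
import serial   # pip install pyserial

ser = serial.Serial("/dev/ttyUSB0", baudrate=9600, timeout=1)   # hypothetical port and settings

while True:
    raw = ser.readline().decode("ascii", errors="ignore").strip()
    if not raw:
        continue
    name, _, value = raw.partition("=")   # pretend the device emits lines like "HR=72"
    record = {"metric": name, "value": value, "source": "bedside-monitor-1"}
    print(json.dumps(record))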

David then demonstrated what can be done once the data from multiple devices interoperate.

  • You can put some logic behind the multiple signals so that a patient’s actual condition can be assessed far more accurately: no more sirens when an oxygen sensor falls off a finger.

  • You can create displays that are more informative and easier to read — and easier to spot anomalies on — than the standard bedside monitor.

  • You can transform data into other standards, such as the HL7 format for entry into electronic medical records.

  • If there is more than one sensor monitoring a factor, you can do automatic validation of signals.

  • You can record and perhaps share alarm histories.

  • You can create what is functionally an API for the data your medical center is generating: a database that makes the information available to programs that need it via publish and subscribe.

  • You can aggregate tons of data (while following privacy protocols, of course) and use machine learning to look for unexpected correlations.

MD PnP makes its stuff available under an open BSD license and publishes its projects on GitHub. This means, for example, that while PnP has created interfaces for 20-25 protocols and data standards used by device makers, you could program its connector to support another device if you need to.

Presumably not all the device manufacturers are thrilled about this. The big ones like to sell entire suites of devices to hospitals on the grounds that all those devices interoperate amongst themselves — what I like to call intraoperating. But beyond corporate greed, it’s hard to find a down side to enabling more market choice and more data integration.

October 27, 2017

[liveblog] Nathan Matias on The Social impact of real-time algorithm decisions

J. Nathan Matias is giving a talk at the weekly AI session held by MIT Media Lab and Harvard’s Berkman Klein Center for Internet & Society. The title is: Testing the social impact of real-time algorithm decisions. (SPOILER: Nate is awesome.) Nathan will be introducing CivilServant.io to us, a service for researching the effects of tech and how it can be better directed toward the social outcomes we (the civil society “we”) desire. (That’s my paraphrase.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

In 2008, the French government approved a law against Web sites that encourage anorexia and bulimia. In 2012, Instagram responded to pressure to limit hashtags that “actively promote self-harm.” Instagram had 40M users, almost as many as France’s 55M active Net users. Researchers at Georgia Tech several years later found that some self-harm sites on Instagram had higher engagement after Instagram’s actions. If your algorithm reliably detects people who are at risk of committing suicide, what next? If the intervention isn’t helpful, your algorithm is doing harm.

Nathan shows a two-axis grid for evaluating algorithms: fair-unfair and benefits-harms. Accuracy should be considered to be on the same axis as fairness because it can be measured mathematically. But you can’t test the social impact without putting it into the field. “I’m trying to draw attention to the vertical axis [harm-benefit].”

We often have in mind a particular pipeline: training > model > prediction > people. Sometimes there are rapid feedback loops where the decisions made by people feed back into the model. A judicial system’s prediction risk scores may have no such loop. But the AI that manages a news feed is probably getting the readers’ response as data that tunes the model.

We have organizations that check the quality of items we deal with: UL for electrical products, etc. But we don’t have that sort of consumer protection for social tech. The results are moral panics, bad policies, etc. This is the gap Nate is trying to fill with CivilServant.io, a project supported by the Media Lab and GlobalVoices.

Here’s an example of one of CivilServant’s projects:

Managing fake news is essential for democracy. The social sciences have been dealing with this for quite a while by doing research on individual perception and beliefs, on how social context and culture influence beliefs … and now on algorithms that make autonomous decisions that affect us as citizens, e.g., newsfeeds. Newsfeeds work this way: someone posts a link. People react to it, e.g., upvote, discuss, etc. The feed service watches that behavior and uses it to promote or demote the item. And then it feeds back in.
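
[A toy version of that loop, my own sketch and not any platform's actual formula: score items by engagement discounted by age, so early votes move an item up the feed, which earns it more votes.]

import math

def score(upvotes, downvotes, age_hours):
    # Toy ranking: net engagement, log-damped, decaying with age.
    net = max(upvotes - downvotes, 1)
    return math.log10(net) / (1 + age_hours / 12)

posts = [
    {"title": "carefully reported story", "up": 40, "down": 5, "age": 6},
    {"title": "dubious but early claim",  "up": 30, "down": 2, "age": 1},
]
for p in sorted(posts, key=lambda p: score(p["up"], p["down"], p["age"]), reverse=True):
    print(round(score(p["up"], p["down"], p["age"]), 3), p["title"])

[The newer, dubious item outranks the better-sourced one purely on early activity.]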

We’ve seen lots of examples of pernicious outcomes of this. E.g., at Reddit an early upvote can have dramatic impact on its ratings over time.

What can we do to govern online misinfo? We could surveill and censor. We could encourage counter-speech. We can imagine some type of algorithmic governance. We can use behavioral nudges, e.g. Facebook tagging articles as “disputed.” But all of these assume that these interventions change behaviors and beliefs. Those assumptions are not always tested.

Nate was approached by /r/worldnews at Reddit, a subreddit with 14M subscribers and 70 moderators. At Reddit, moderating can be a very time-consuming effort. (Nate spoke to a Reddit mod who had stopped volunteering at a children’s hospital in order to be a mod because she thought she could do more good that way.) This subreddit’s mods wanted to know if they could question the legitimacy of an item without causing it to surge on the platform. Fact-checking a post could nudge Reddit’s AI to boost its presence because of the increased activity.

So, they did an experiment asking people to fact check an article, or to fact check and downvote it if they couldn’t verify it. They monitored the ranking of the articles by Reddit for 3 months. [Nate now gives some math. Sorry I can’t capture (or understand) it.] The result: to his surprise, encouraging fact checking reduced the average rank position of an article. Encouraging fact checking and down-voting reduced the spread of inaccurate news by Reddit’s algorithms. [I’m not confident I’m getting that right.]

Why did encouraging fact checking reduce rankings, but fact checking and voting did not? The mods think this might be because it gave users a constructive way to handle articles from reviled sources, reducing the number of negative comments about them. [I hope I’m getting this right.] Also, “reactance” may have nudged people to upvote just to spite the instructions. Also, users may have mobilized friends to vote on the articles. Also, encouraging two tasks (fact check and then vote) rather than one may have influenced the timing of the algorithm, making the down-votes less impactful.

This is what Nate calls an “AI-Nudge”: a “second-order effect of influencing human behavior on the behavior of an algorithmic system.” It means you have to think about how humans interact with AI.

Often when people are working on AI, they’re starting from computer science and math. The question is: how can we use social science methods to research the effect of AI? Paluck and Cialdini see a cycle of Pilot/Lab experiments > qualitative methods > field experiences > theory / policy / design. In the Reddit example, Nathan spent considerable time with the community to understand their issues and how they interact with the AI.

Another example of a study: identifying and reducing side-effects of automated copyright law enforcement on Twitter. When people post something to Twitter, bots monitor it to see if it violates copyright, resulting in a DMCA takedown notice being issued. Twitter then takes it down. The Lumen Project from BKC archives these notices. The CivilServant project observes those notices in real time to study the effects. E.g., a user’s tweets per day tend to drop after they receive a takedown notice, and then continue dropping throughout the 42-day period they researched. Why this long-term decrease in posting? Maybe fear and risk. Maybe awareness of surveillance.

So, how can these chilling effects be reduced? The CivilServant project automatically sends users info about their rights and about surveillance. The results of this intervention are not in yet. The project hopes to find ways to lessen the public’s needless withdrawal from social media. The research can feed empirical legal studies. Policymakers might find it useful. Civil rights orgs as well. And the platforms themselves.

In the course of the Q&As, Nathan mentions that he’s working on ways to explain social science research that non-experts can understand. CivilServant’s work is with user communities, and it’s developed a set of ways for communicating openly with the users.

Q: You’re trying to make AI more fair…

A: I’m doing consumer protection, so as experts like you work on making AI more fair, we can see the social effects of interventions. But there are feedback loops among them.

Q: What would you do with a community that doesn’t want to change?

A: We work with communities that want our help. In the 1970s, Campbell wrote an essay: “The Experimenting Society.” He asked if by doing behavioral research we’re becoming an authoritarian society because we’re putting power in the hands of the people who can afford to do the research. He proposed enabling communities to do their own studies and research. He proposed putting data scientists into towns across the US, pool their research, and challenge their findings. But this was before the PC. Now it’s far more feasible.

Q: What sort of pushback have you gotten from communities?

A: Some decide not to work with us. In others, there’s contention about the shape of the project. Platforms have changed how they view this work. Three years ago, the platforms felt under siege and wounded. That’s why I decided to create an independent organization. The platforms have a strong incentive to protect their reputations.
