Joho the Blog: 2b2k Archives

July 31, 2018

[2b2k] Errata: Wrong about Wycliffe

I received this about Too Big to Know from Isaiah Hoogendyk, Biblical Data Engineer at Faithlife Corporation:

In chapter 9, “Building the New Infrastructure of Knowledge,” (sorry, don’t have a page number: read this in the Kindle app) you state:

“There was a time when we thought we were doing the common folk a favor by keeping the important knowledge out of their reach. That’s why the Pope called John Wycliffe a heretic in the fourteenth century for creating the first English-language translation of the Christian Bible.”

This is quite false, actually. There was in fact nothing heretical about translating the Scriptures into the vernacular; instead, Wycliffe was condemned for a multitude of heresies regarding rejection of Catholic belief on the Sacraments and the priesthood, among other things. Some of these beliefs were interpolated into the translation of the Scriptures attributed to him (which weren’t even entirely translated by him), but it was mostly his other writings that were censured by the Pope. You can read more about that here: https://plato.stanford.edu/archives/win2011/entries/wyclif/.

Thanks, Isaiah.


May 16, 2018

[liveblog] Aubrey de Grey

I’m at the CUBE Tech conference in Berlin. (I’m going to give a first keynote on the book I’m finishing.) Aubrey de Grey begins his keynote by changing the question from “Who wants to get old?” to “Who wants Alzheimer’s?” because we’ve been brainwashed into thinking that aging is somehow good for us: we get wiser, get to retire, etc. Now we are developing treatments for aging. Ambiguity about aging is now “hugely damaging” because it hinders the support of research. E.g., his SENS Research Foundation is going too slowly because of funding constraints.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

“The defeat of aging via medicine is foreseeable now.” He says he has to be credible because people have been saying this forever and have been wrong.

“Why is aging still a problem?” One hundred years ago, a third of babies would die before they were one year old. We fixed this in the industrialized world through simple advances, e.g., hygiene, mosquito control, antibiotics. So why are diseases of old age so much harder to control? People think it’s because so many things go wrong with us late in life, interacting with one another and creating incredible complexity. But that’s not the main answer.

“Aging is easy to define: it is a side effect of being alive.” “It’s a fact that the operation of the human body generates damage.” It accumulates. The body tolerates a certain amount. When you pass that amount, you get pathologies of old age. Our approach has been to develop geriatric medicine to counteract those pathologies. That’s where most of the research goes.

[Image: Aubrey de Grey’s metabolism diagram]

“Metabolism: The ultimate undocumented spaghetti code”

But that won’t work because the damage continues. Geriatric medicine bangs away at the pathologies, but will necessarily become less effective over time. “We make this mistake because of a misclassification we make.”

If you ask people to make categories of disease, they’ll come up with communicable, congenital, and chronic. Then most people add a fourth way of being sick: aging itself. It includes frailty, sarcopenia (loss of muscle), immunosenescence (aging of the immune system)…But that’s silly. Aging in a living organism is the same as aging in a machine. “Aging is the accumulation of damage that occurs as a side-effect of the body’s normal operation.” That means the categories are right, except aging covers columns 3 and 4. Column 3, specific diseases such as Alzheimer’s and cancer, is also part of aging. This means that aging isn’t a blessing in disguise, and that we can’t say that the diseases in column 3 are high priorities of medicine but those in 4 are not.

A hundred years ago a few people started to think about this and realized that if we tried to interfere with the process of aging earlier on, we’d do better. This became the field of gerontology. Some species age much more slowly than others. Maybe we can figure out the basis for that variation. But the metabolism is really really complicated. “This is the ultimate nightmare of uncommented spaghetti code.” We know so little about how the body works.

“There is another approach. And it’s completely bleeding obvious”: Periodically repair the damage. We don’t need to slow down the rate at which metabolism causes damage. We need to engineer a system we don’t understand. But “we don’t need to understand how metabolism causes damage.” Nor do we need to know what to do when the damage is too great, because we’re not going to let it get to that state. We do this with, say, antique cars. Preventive maintenance works. “The only question is, can we do it for a much more complicated machine like the human body?”

“We’re sidestepping our ignorance of metabolism and pathology. But we have to cope with the fact that damage is complicated.” All of the types of damage, from cell loss to extracellular matrix stiffening (there are 7 categories), can be repaired through a single approach: genetic repair. E.g., loss of cells can be repaired by replacing them using stem cells. Unfortunately, most of the funding is going only to this first category. SENS was created to enable research on the other six. Aubrey talks about SENS’ work on protecting cells from the bad effects of cholesterol.

He points to another group (unnamed) that has reinvented this approach and is getting a lot of notice.

He says longevity is not what people think it is. These therapies will let people stay alive longer, but they will also stay youthful longer. “Longevity is a side effect of health.”

Will this be only for the rich? Overpopulation? Boredom? Pensions collapse? We’re taking care of overpopulation by cleaning up its effects, he says. He says there are solutions to these problems. But there are choices we have to make. No one wants to get Alzheimer’s. We can’t have it both ways. Either we want to keep people healthy or not.

He says SENS has been successful enough that they’ve been able to spin out some of the research into commercial operations. But we need to carry on in the non-profit research world as well. Project 21 aims at human rejuvenation clinical trials.


February 11, 2018

The story of lead and crime, told in tweets

Patrick Sharkey [twitter: patrick_sharkey] uses a Twitter thread to evaluate the evidence about a possible relationship between exposure to lead and crime. The thread is a bit hard to get unspooled correctly, but it’s worth it as an example of:

1. Thinking carefully about complex evidence and data.

2. How Twitter affects the reasoning and its expression.

3. The complexity of data, which will only get worse (= better) as machine learning can scale up their size and complexity.

Note: I lack the skills and knowledge to evaluate Patrick’s reasoning. And, hat tip to David Lazer for the retweet of the thread.


The brain is not a computer and the world is not information

Robert Epstein argues in Aeon against the dominant assumption that the brain is a computer, that it processes information, stores and retrieves memories, etc. That we assume so comes from what I think of as the informationalizing of everything.

The strongest part of his argument is that computers operate on symbolic information, but brains do not. There is no evidence (that I know of, but I’m no expert. On anything) that the brain decomposes visual images into pixels and those pixels into on-offs in a code that represents colors.

In the second half, Epstein tries to prove that the brain isn’t a computer through some simple experiments, such as drawing a dollar bill first from memory and then while looking at it. Someone committed to the idea that the brain is a computer would probably just conclude that the brain just isn’t a very good computer. But judge for yourself. There’s more to it than I’m presenting here.

Back to Epstein’s first point…

It is of the essence of information that it is independent of its medium: you can encode it into voltage levels of transistors, magnetized dust on tape, or holes in punch cards, and it’s the same information. Therefore, a representation of a brain’s states in another medium should also be conscious. Epstein doesn’t make the following argument, but I will (and I believe I am cribbing it from someone else but I don’t remember who).

Because information is independent of its medium, we could encode it in dust particles swirling clockwise or counter-clockwise; clockwise is an on, and counter is an off. In fact, imagine there’s a dust cloud somewhere in the universe that has 86 billion motes, the number of neurons in the human brain. Imagine the direction of those motes exactly matches the on-offs of your neurons when you first spied the love of your life across the room. Imagine those spins shift but happen to match how your neural states shifted over the next ten seconds of your life. That dust cloud is thus perfectly representing the informational state of your brain as you fell in love. It is therefore experiencing your feelings and thinking your thoughts.

That by itself is absurd. But perhaps you say it is just hard to imagine. Ok, then let’s change it. Same dust cloud. Same spins. But this time we say that clockwise is an off, and the other is an on. Now that dust cloud no longer represents your brain states. It therefore is both experiencing your thoughts and feelings and is not experiencing them at the same time. Aristotle would tell us that that is logically impossible: a thing cannot simultaneously be something and its opposite.
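The flip in that thought experiment can be sketched in a few lines of Python. This is only a toy under stated assumptions: the five-mote `spins` list and the `decode` helper are hypothetical names made up for illustration, not anything from Epstein's article. The point it shows is that the same physical state yields opposite "information" depending on the reading convention we pick.

```python
# Hypothetical toy: five dust motes, each spinning clockwise ("cw")
# or counter-clockwise ("ccw"). The physical state never changes below.
spins = ["cw", "ccw", "cw", "cw", "ccw"]

def decode(spins, on_direction):
    """Read the spins as bits, given a convention for which direction means 'on'."""
    return [1 if s == on_direction else 0 for s in spins]

bits_a = decode(spins, on_direction="cw")   # convention 1: clockwise is an on
bits_b = decode(spins, on_direction="ccw")  # convention 2: counter-clockwise is an on

print(bits_a)  # [1, 0, 1, 1, 0]
print(bits_b)  # [0, 1, 0, 0, 1]
```

Nothing about the motes differs between the two calls. Only the convention passed in as `on_direction` changes, which is what the argument turns on.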

Anyway…

Toward the end of the article, Epstein gets to a crucial point that I was very glad to see him bring up: Thinking is not a brain activity, but the activity of a body engaged in the world. (He cites Anthony Chemero’s Radical Embodied Cognitive Science (2009) which I have not read. I’d trace it back further to Andy Clark, David Chalmers, Eleanor Rosch, Heidegger…). Reducing it to a brain function, and further stripping the brain of its materiality to focus on its “processing” of “information” is reductive without being clarifying.

I came into this debate many years ago already made skeptical of the most recent claims about the causes of consciousness by having some awareness of the series of failed metaphors we have used over the past couple of thousands of years. Epstein puts this well, citing another book I have not read (and another book I’ve consequently just ordered):

In his book In Our Own Image (2015), the artificial intelligence expert George Zarkadakis describes six different metaphors people have employed over the past 2,000 years to try to explain human intelligence.

In the earliest one, eventually preserved in the Bible, humans were formed from clay or dirt, which an intelligent god then infused with its spirit. That spirit ‘explained’ our intelligence – grammatically, at least.

The invention of hydraulic engineering in the 3rd century BCE led to the popularity of a hydraulic model of human intelligence, the idea that the flow of different fluids in the body – the ‘humours’ – accounted for both our physical and mental functioning. The hydraulic metaphor persisted for more than 1,600 years, handicapping medical practice all the while.

By the 1500s, automata powered by springs and gears had been devised, eventually inspiring leading thinkers such as René Descartes to assert that humans are complex machines. In the 1600s, the British philosopher Thomas Hobbes suggested that thinking arose from small mechanical motions in the brain. By the 1700s, discoveries about electricity and chemistry led to new theories of human intelligence – again, largely metaphorical in nature. In the mid-1800s, inspired by recent advances in communications, the German physicist Hermann von Helmholtz compared the brain to a telegraph.

Maybe this time our tech-based metaphor has happened to get it right. But history says we should assume not. We should be very alert to the disanalogies, which Epstein helps us with.

Getting this right, or at least not getting it wrong, matters. The most pressing problem with the informationalizing of thought is not that it applies a metaphor, or even that the metaphor is inapt. Rather it’s that this metaphor leads us to a seriously diminished understanding of what it means to be a living, caring creature.

I think.

 

Hat tip to @JenniferSertl for pointing out the Aeon article.


February 1, 2018

Can AI predict the odds on you leaving the hospital vertically?

A new research paper, published Jan. 24 with 34 co-authors and not peer-reviewed, claims better accuracy than existing software at predicting outcomes like whether a patient will die in the hospital, be discharged and readmitted, and their final diagnosis. To conduct the study, Google obtained de-identified data of 216,221 adults, with more than 46 billion data points between them. The data span 11 combined years at two hospitals,

That’s from an article in Quartz by Dave Gershgorn (Jan. 27, 2018), based on the original article by Google researchers posted at Arxiv.org.

…Google claims vast improvements over traditional models used today for predicting medical outcomes. Its biggest claim is the ability to predict patient deaths 24-48 hours before current methods, which could allow time for doctors to administer life-saving procedures.

Dave points to one of the biggest obstacles to this sort of computing: the data are in such different formats, from hand-written notes to the various form-based data that’s collected. It’s all about the magic of interoperability … and the frustration when data (and services and ideas and language) can’t easily work together. Then there’s what Paul Edwards, in his great book A Vast Machine calls “data friction”: “…the costs in time, energy, and attention required simply to collect, check, store, move, receive, and access data.” (p. 84)

On the other hand, machine learning can sometimes get past the incompatible expression of data in a way that’s so brutal that it’s elegant. One of the earlier breakthroughs in machine learning came in the 1990s when IBM analyzed the English and French versions of Hansard, the bi-lingual transcripts of the Canadian Parliament. Without the machines knowing the first thing about either language, the system produced more accurate results than software that was fed rules of grammar, bilingual dictionaries, etc.

Indeed, the abstract of the Google paper says “Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format.” It continues: “We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.”

The paper also says that their approach affords clinicians “some transparency into the predictions.” Some transparency is definitely better than none. But, as I’ve argued elsewhere, in many instances there may be tools other than transparency that can give us some assurance that AI’s outcomes accord with our aims and our principles of fairness.

 


 

I found this article by clicking on Dave Gershgorn’s byline on a brief article about the Wired version of the paper of mine I referenced in the previous paragraph. He does a great job explaining it. And, believe me, it’s hard to get a writer (well, me, anyway) to acknowledge that without having to insert even one caveat. Thanks, Dave!


December 16, 2017

[liveblog] Harri Ketamo on micro-learning

I’m at the STEAM ed Finland conference in Jyväskylä. Harri Ketamo is giving a talk on “micro-learning.” He recently won a prestigious prize for the best new ideas in Finland. He is interested in the use of AI for learning.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

We don’t have enough good teachers globally, so we have to think about ed in new ways, Harri says. Can we use AI to bring good ed to everyone without hiring 200M new teachers globally? If we paid teachers equivalent to doctors and lawyers, we could hire those 200M. But we are apparently not willing to do that.


One challenge: Career coaching. What do you want to study? Why? What are the skills you need? What do you need to know?


His company does natural language analysis — not word matches, but meaning. As an example he shows a shareholder agreement. Such agreements always have the same elements. After being trained on law, his company’s AI can create a map of the topic and analyze a block of text to see if it covers the legal requirements…the sort of work that a legal assistant does. For some standard agreements, we may soon not need lawyers, he predicts.


The system’s language model is a mess of words and relations. But if you zoom out from the map, the AI has clustered the concepts. At the Slush Shanghai conference, his AI could develop a list of the companies a customer might want to meet based on a text analysis of the companies’ web sites, etc. Likewise if your business is looking for help with a project.


Finland has a lot of public data about skills and openings. Universities’ curricula are publicly available. [Yay!] Unlike LinkedIn, all this data is public. Harri shows a map that displays the skills and competencies Finnish businesses want and the matching training offered by Finnish universities. The system can explore public information about a user and map that to available jobs and the training that is required and available for it. The available jobs are listed with relevancy expressed as a percentage. It can also look internationally to find matches.


The AI can also put together a course for a topic that a user needs. It can tell what the core concepts are by mining publications, courses, news, etc. The result is an interaction with a bot that talks with you in a WhatsApp-like way. (See his paper “Agents and Analytics: A framework for educational data mining with games based learning”.) It generates tests that show what a student needs to study if she gets a question wrong.


His newest project, in process: Libraries are the biggest collections of creative, educational material, so the AI ought to point people there. His software can find the common sources among courses and areas of study. It can discover the skills and competencies that materials can teach. This lets it cluster materials around degree programs. It can also generate micro-educational programs, curating a collection of readings.

His platform has an open API. See Headai.

Q&A


Q: Have you done controlled experiments?


A: Yes. We’ve found that people get 20-40% better performance when our software is used in a blended model, i.e., with a human teacher. It helps motivate people if they can see the areas they need to work on disappear over time.


Q: The sw only found male authors in the example you put up of automatically collated materials.


A: Small training set. Gender is not part of the metadata in Finland.


Q: Don’t you worry that your system will exacerbate bias?


A: Humans are biased. AI is a black box. We need to think about how to manage this.


Q: [me] Are the topics generated from the content? Or do you start off with an ontology?


A: It creates its ontology out of the data.


Q: [me] Are you committing to make sure that the results of your AI do not reflect the built in biases?


A: Our news system on the Web presents a range of views. We need to think about how to do this for gender issues with the course software.


December 5, 2017

[liveblog] Conclusion of Workshop on Trustworthy Algorithmic Decision-Making

I’ve been at a two-day workshop sponsored by Michigan State University and the National Science Foundation: “Workshop on Trustworthy Algorithmic Decision-Making.” After multiple rounds of rotating through workgroups iterating on five different questions, each group presented its findings: questions, insights, areas of future research.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Seriously, I cannot capture all of this.

Conduct of Data Science

What are the problems?

  • Who defines and how do we ensure good practice in data science and machine learning?

Why is the topic important? Because algorithms are important. And they have important real-world effects on people’s lives.

Why is the problem difficult?

  • Wrong incentives.

  • It can be difficult to generalize practices.

  • Best practices may be good for one goal but not another, e.g., efficiency but not social good. Also: Lack of shared concepts and vocabulary.

How to mitigate the problems?

  • Change incentives

  • Increase communication via vocabularies, translations

  • Education through MOOCS, meetups, professional organizations

  • Enable and encourage resource sharing: an open source lesson about bias, code sharing, data set sharing

Accountability group

The problem: How to integratively assess the impact of an algorithmic system on the public good? “Integrative” = the impact may be positive and negative and affect systems in complex ways. The impacts may be distributed differently across a population, so you have to think about disparities. These impacts may well change over time.

We aim to encourage work that is:

  • Aspirationally causal: measuring outcomes causally but not always through randomized control trials.

  • The goal is not to shut down algorithms but to make positive contributions that generate solutions.

This is a difficult problem because:

  • Lack of variation in accountability, enforcements, and interventions.

  • It’s unclear what outcomes should be measured and how. This is context-dependent.

  • It’s unclear which interventions are the highest priority

Why progress is possible: There’s a lot of good activity in this space. And it’s early in the topic so there’s an ability to significantly influence the field.

What are the barriers for success?

  • Incomplete understanding of contexts. So, think of it in terms of socio-cultural approaches, and make it interdisciplinary.

  • The topic lies between disciplines. So, develop a common language.

  • High-level triangulation is difficult. Examine the issues at multiple scales, multiple levels of abstraction. Where you assess accountability may vary depending on what level/aspect you’re looking at.

Handling Uncertainty

The problem: How might we holistically treat and attribute uncertainty through data analysis and decision systems? Uncertainty exists everywhere in these systems, so we need to consider how it moves through a system. This runs from choosing data sources to presenting results to decision-makers and people impacted by these results, and beyond that its incorporation into risk analysis and contingency planning. It’s always good to know where the uncertainty is coming from so you can address it.

Why difficult:

  • Uncertainty arises from many places

  • Recognizing and addressing uncertainties is a cyclical process

  • End users are bad at evaluating uncertain info and incorporating uncertainty in their thinking.

  • Many existing solutions are too computationally expensive to run on large data sets

Progress is possible:

  • We have sampling-based solutions that provide a framework.

  • Some app communities are recognizing that ignoring uncertainty is reducing the quality of their work
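As a rough illustration of what a sampling-based approach to moving uncertainty through a system can look like, here is a minimal Monte Carlo sketch. Everything in it is an assumption for illustration: the `propagate` helper, the linear `risk_score` model, and the Gaussian input are hypothetical, and the workshop group did not specify any particular method.

```python
import random
import statistics

def propagate(model, input_mean, input_sd, n_samples=10_000, seed=0):
    """Push an uncertain input through a model by sampling.

    Draws the input from a normal distribution (an assumption made here),
    runs the model on each draw, and summarizes the spread of the outputs.
    """
    rng = random.Random(seed)
    outputs = [model(rng.gauss(input_mean, input_sd)) for _ in range(n_samples)]
    return statistics.mean(outputs), statistics.stdev(outputs)

def risk_score(x):
    # Hypothetical stand-in for some downstream decision model.
    return 2.0 * x + 1.0

mean, sd = propagate(risk_score, input_mean=3.0, input_sd=0.5)
print(f"output mean ~ {mean:.2f}, output sd ~ {sd:.2f}")
```

For this linear toy the output spread should come out close to twice the input spread. The appeal of sampling is that the same loop works when the model is too messy for a closed-form answer, at the cost of the computational expense the group lists as a difficulty.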

How to evaluate and recognize success?

  • A/B testing can show that decision making is better after incorporating uncertainty into analysis

  • Statistical/mathematical analysis

Barriers to success

  • Cognition: Train users.

  • It may be difficult to break this problem into small pieces and solve them individually

  • Gaps in theory: many of the problems cannot currently be solved algorithmically.

The presentation ends with a note: “In some cases, uncertainty is a useful tool.” E.g., it can make the system harder to game.

Adversaries, workarounds, and feedback loops

Adversarial examples: add a perturbation to a sample and it disrupts the classification. An adversary tries to find those perturbations to wreck your model. Sometimes this is used not to hack the system so much as to prevent the system from, for example, recognizing your face during a protest.
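To make the idea of a perturbation concrete, here is a toy sketch against a hand-built linear classifier. The weights, the input, and the helper names are all hypothetical. The nudge follows the sign of each weight, in the spirit of the fast gradient sign method; real attacks target far more complex models, and nothing here comes from the workshop itself.

```python
def score(weights, bias, x):
    """Linear score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def classify(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def perturb(weights, x, label, epsilon):
    """Shift every feature by at most epsilon in the direction that
    pushes the score away from the current label."""
    direction = -1 if label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]

# Hypothetical model and sample, chosen so the sample sits near the boundary.
weights = [1.0, -2.0, 0.5]
bias = 0.0
x = [0.3, 0.1, 0.2]

label = classify(weights, bias, x)
x_adv = perturb(weights, x, label, epsilon=0.1)

print(label, classify(weights, bias, x_adv))  # 1 0
```

No feature moves by more than 0.1, yet the label flips, because the small shifts all push the score the same way.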

Feedback loops: A recidivism prediction system says you’re likely to commit further crimes, which sends you to prison, which increases the likelihood that you’ll commit further crimes.

What is the problem: How should a trustworthy algorithm account for adversaries, workarounds, and feedback loops?

Who are the stakeholders?

System designers, users, non-users, and perhaps adversaries.

Why is this a difficult problem?

  • It’s hard to define the boundaries of the system

  • From whose vantage point do we define adversarial behavior, workarounds, and feedback loops?

Unsolved problems

  • How do we reason about the incentives users and non-users have when interacting with systems in unintended ways?

  • How do we think about oversight and revision in algorithms with respect to feedback mechanisms?

  • How do we monitor changes, assess anomalies, and implement safeguards?

  • How do we account for stakeholders while preserving rights?

How to recognize progress?

  • Mathematical model of how people use the system

  • Define goals

  • Find stable metrics and monitor them closely

  • Proximal metrics. Causality?

  • Establish methodologies and see them used

  • See a taxonomy of adversarial behavior used in practice

Likely approaches

  • Security methodology for anticipating unintended behaviors and adversarial interactions. Monitor and measure.

  • Record and taxonomize adversarial behavior in different domains

  • Test. Try to break things.

Barriers

  • Hard to anticipate unanticipated behavior

  • Hard to define the problem in particular cases.

  • Goodhart’s Law

  • Systems are born brittle

  • What constitutes adversarial behavior vs. a workaround is subjective.

  • Dynamic problem

Algorithms and trust

How do you define and operationalize trust?

The problem: What are the processes through which different stakeholders come to trust an algorithm?

Multiple processes lead to trust.

  • Procedural vs. substantive trust: are you looking at the weights of the algorithms (e.g.), or what were the steps to get you there?

  • Social vs personal: did you see the algorithm at work, or are you relying on peers?

These pathways are not necessarily predictive of each other.

Stakeholders build trust through multiple lenses and priorities

  • the builders of the algorithms

  • the people who are affected

  • those who oversee the outcomes

Mini case study: a child services agency that does not want to be identified. [All of the following is 100% subject to my injection of errors.]

  • The agency uses a predictive algorithm. The stakeholders range from the children needing a family, to NYers as a whole. The agency knew what went into the model. “We didn’t buy our algorithm from a black-box vendor.” They trusted the algorithm because they staffed a technical team who had credentials and had experience with ethics…and who they trusted intuitively as good people. Few of these are the quantitative metrics that devs spend their time on. Note that FAT (fairness, accountability, transparency) metrics were not what led to trust.

Temporality:

  • Processes that build trust happen over time.

  • Trust can change or maybe be repaired over time.

  • “The timescales to build social trust are outside the scope of traditional experiments,” although you can perhaps find natural experiments.

Barriers:

  • Assumption of reducibility or transfer from subcomponents

  • Access to internal stakeholders for interviews and process understanding

  • Some elements are very long term

 


 

What’s next for this workshop

We generated a lot of scribbles, post-it notes, flip charts, Slack conversations, slide decks, etc. They’re going to put together a whitepaper that goes through the major issues, organizing them, and tries to capture the complexity while helping to make sense of it.

There are weak or no incentives to set appropriate levels of trust

Key takeaways:

  • Trust is irreducible to FAT metrics alone

  • Trust is built over time and should be defined in terms of the temporal process

  • Isolating the algorithm as an instantiation misses the socio-technical factors in trust.


December 4, 2017

Workshop: Trustworthy Algorithmic Decision-Making

I’m at a two-day inter-disciplinary workshop on “Trustworthy Algorithmic Decision-Making” put on by the National Science Foundation and Michigan State University. The 2-page whitepapers from the participants are online. (Here’s mine.) I may do some live-blogging of the workshops.

Goals:

– Key problems and critical questions?

– What to tell policy-makers and others about the impact of these systems?

– Product approaches?

– What ideas, people, training, infrastructure are needed for these approaches?

Excellent diversity of backgrounds: CS, policy, law, library science, a philosopher, more. Good diversity in gender and race. As the least qualified person here, I’m greatly looking forward to the conversations.


December 2, 2017

[liveblog] Doaa Abu-Elyounes on "Bail or Jail? Judicial vs. Algorithmic decision making"

I’m at a weekly AI talk put on by Harvard’s Berkman Klein Center for Internet & Society and the MIT Media Lab. Doaa Abu-Elyounes is giving a talk called “Bail or Jail? Judicial vs. Algorithmic decision making”.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Doaa tells us that this talk is a work in progress.

We’ve all heard now about AI-based algorithms that are being used to do risk assessments in pretrial bail decisions. She thinks this is a good place to start using algorithms, although it’s not easy.

The pre-trial stage is supposed to be very short. The court has to determine if the defendant, presumed innocent, will be released on bail or jailed. The sole considerations are supposed to be whether the defendant is likely to harm someone else or flee. Preventive detention has many effects, mostly negative for the defendant. (The US is a world leader in pre-trial detainees. Yay?)

Risk assessment tools have been used for more than 50 years. Actuarial tools have shown greater predictive power than clinical judgment, and can eliminate some of the discretionary powers of judges. Use of these tools has long been controversial. What type of factors should be included? Is the use of demographic factors to make predictions fair to individuals?

Existing tools use regression analysis. Now machine learning can learn from much more data. Mechanical predictions [= machine learning] are more accurate than statistical predictions, but may not be explicable.

We think humans can explain their decisions and we want machines to be able to as well. But look at movie reviews. Humans can tell if a review is positive. We can teach the machine which words are positive or negative, getting 60% accuracy. Or we can have a human label the reviews as positive or negative and let the machine figure out what the factors are via machine learning, in which case we get 80% accuracy but may lose explicability.
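To make the contrast concrete, here is a minimal sketch of the two approaches. Everything in it is an assumption for illustration: the four toy reviews and the word lists are invented, and naive Bayes is just one simple way a machine might learn word weights from labels, not necessarily the method the talk had in mind. It says nothing about the 60% and 80% figures.

```python
import math
from collections import Counter

# Hypothetical toy reviews, invented for illustration only.
TRAIN = [
    ("a wonderful moving film", "pos"),
    ("great acting and a great script", "pos"),
    ("dull plot and terrible pacing", "neg"),
    ("a boring terrible mess", "neg"),
]

# Approach 1: a human tells the machine which words are positive or negative.
POSITIVE_WORDS = {"wonderful", "great"}
NEGATIVE_WORDS = {"terrible", "boring"}


def lexicon_label(text):
    """Count hand-picked words; ties default to 'pos'."""
    words = text.split()
    score = sum(w in POSITIVE_WORDS for w in words)
    score -= sum(w in NEGATIVE_WORDS for w in words)
    return "pos" if score >= 0 else "neg"


# Approach 2: a human only labels whole reviews; the word weights are learned.
def train(examples):
    """Count how often each word appears under each label."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    vocab = set(counts["pos"]) | set(counts["neg"])
    return counts, vocab


def learned_label(text, counts, vocab):
    """Naive Bayes with add-one smoothing (class priors are equal here)."""
    scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values()) + len(vocab)
        scores[label] = sum(
            math.log((word_counts[w] + 1) / total)
            for w in text.split()
            if w in vocab
        )
    return max(scores, key=scores.get)


counts, vocab = train(TRAIN)
print(lexicon_label("dull pacing"))                 # hand-built list has no opinion
print(learned_label("dull pacing", counts, vocab))  # learned weights do
```

The learned version picks up on words like “dull” that nobody put on a list, but its reasons are now a table of counts rather than a rule a person wrote down, which is a small version of the explicability trade-off.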

With pretrial situations, what is the automated task that the machine should be performing?

There’s a tension between accuracy and fairness. Computer scientists are trying to quantify these questions: What does a fair algorithm look like? Jon Kleinberg and colleagues did a study of this [this one?]. Their algorithms reduced violent crime by 25% with no change in jailing rates, without increasing racial disparities. In short, the algorithm seems to have done a more accurate job with less bias.

Doaa lists four assessment tools she will be looking at: the Pretrial Risk Assessment [this one?], the Public Safety Assessment, the Virginia Pretrial Risk Assessment Instrument, and the Colorado Pretrial Assessment Tool.

Doaa goes through questions that should be asked of these tools, beginning with: Which factors are considered in each? [She dives into the details for all four tools. I can’t capture it. Sorry.]

What are the sources of data? (3 out of 4 rely on interviews and databases.)

What is the quality of the data? “This is the biggest problem jurisdictions are dealing with when using such a tool.” “Criminal justice data is notoriously poor.” And, of course, if a machine learning system is trained on discriminatory data, its conclusions are likely to reflect those biases.

Each tool needs to be periodically validated using data from its own district’s population. Local data matters.

There should be separate scores for flight risk and public safety. All but the PSA provide only a single score. This is important because there are separate remedies for the two concerns. E.g., you might want to lock up someone who is a risk to public safety, but take away the passport of someone who is a flight risk.
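A rough sketch of why two scores matter: with separate numbers, each concern can be matched to its own remedy, which a single combined score cannot do. The 0-to-1 scale, the 0.5 threshold, and the wording of the remedies are all invented here for illustration; none of this is taken from the PSA or any of the other tools.

```python
from dataclasses import dataclass


@dataclass
class RiskScores:
    flight: float         # hypothetical 0-1 score for failing to appear in court
    public_safety: float  # hypothetical 0-1 score for harming someone else


def suggested_conditions(scores, threshold=0.5):
    """Map each concern to its own remedy (threshold is an invented value)."""
    conditions = []
    if scores.public_safety >= threshold:
        conditions.append("consider detention")
    if scores.flight >= threshold:
        conditions.append("surrender passport")
    # Hypothetical default when neither score crosses the threshold.
    return conditions or ["release without conditions"]


# A defendant who is a flight risk but not a safety risk.
print(suggested_conditions(RiskScores(flight=0.8, public_safety=0.1)))
```

If the two numbers were averaged into one score of 0.45, this defendant would look the same as someone with moderate risk on both counts, and the passport remedy would never come up.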

Finally, the systems should discriminate among reasons for flight risk. E.g., because the defendant can’t afford the cost of making it to court or because she’s fleeing?

Conclusion: Pretrial is the front door of the criminal justice system and affects what happens thereafter. Risk assessment tools should not replace judges, but they bring benefits. They should be used, and should be made as transparent as possible. There are trade offs. The tool will not eliminate all bias but might help reduce it.

Q&A

Q: Do the algorithms recognize the different situations of different defendants?

A: Systems do recognize this, but not in sophisticated ways. That’s why it’s important to understand why a defendant might be at risk of missing a court date. Maybe we could provide poor defendants with a Metro card.

Q: Could machine learning be used to help us be more specific in the types of harm? What legal theories might we draw on to help with this?

A: [The discussion got too detailed for me to follow. Sorry.]

Q: There are different definitions of recidivism. What do we do when there’s a mismatch between the machines and the court?

A: Some states give different weights to different factors based on how long ago the prior crimes were committed. I haven’t seen any difference in considering how far ahead the risk of a possible next crime is.

Q: [me] While I’m very sympathetic to allowing machine learning to be used without always requiring that the output be explicable, when it comes to the justice system, do we need explanations so not only is justice done, but we can have trust that it’s being done?

A: If we can say which factors are going into a decision — and it’s not a lot of them — if the accuracy rate is much higher than manual systems, then maybe we can give up on always being able to explain exactly how it came to its decisions. Remember, pre-trial procedures are short and there’s usually not a lot of explaining going on anyway. It’s unlikely that defendants are going to argue over the factors used.

Q: [me] Yes, but what about the defendant who feels that she’s being treated differently than some other person and wants to know why?

A: Judges generally don’t explain how they came to their decisions anyway. The law sets some general rules, and the comparisons between individuals are generally within the framework of those rules. The rules don’t promise to produce perfectly comparable results. In fact, you probably can’t easily find two people with such similar circumstances. There are no identical cases.

Q: Machine learning, multilevel regression, and human decision making all weigh data and produce an outcome. But ML has little human interaction, statistical analysis has some, and the human decision is all human. Yet all are in fact algorithmic: the judge looks at a bond schedule to set bail. Predictability as fairness is exacerbated by the human decisions since the human cannot explain her model.

Q: Did you find any logic about why jurisdictions picked which tool? Any clear process for this?

A: It’s hard to get that information about the procurement process. Usually they use consultants and experts. There’s no study I know of that looks at this.

Q: In NZ, the main tool used for risk assessment for domestic violence is a Canadian tool called ODARA. Do tools work across jurisdictions? How do you reconcile data sets that might be quite different?

A: I’m not against using the same system across jurisdictions — it’s very expensive to develop one from scratch — but they need to be validated. The federal tool has not been, as far as I know. (It was created in 2009.) Some tools do better at this than others.

Q: What advice would you give to a jurisdiction that might want to procure one? What choices did the tools make in terms of what they’re optimized for? Also: What about COMPAS?

A: (I didn’t talk about COMPAS because it’s notorious and not often used in pre-trial, although it started out as a pre-trial tool.) The trade off seems to be between accuracy and fairness. Policy makers should define more strictly where the line should be drawn.

Q: Who builds these products?

A: Three out of the four were built in house.

Q: PSA was developed by a consultant hired by the Arnold Foundation. (She’s from Luminosity.) She has helped develop a number of the tools.

Q: Why did you decide to research this? What’s next?

A: I started here because pre-trial is the beginning of the process. I’m interested in the fairness question, among other things.

Q: To what extent are the 100+ factors that the Colorado tool considers available publicly? Is their rationale for excluding factors public? Because they’re proxies for race? Because they’re hard to get? Or because back then 100+ seemed like too many? And what’s the overlap in factors between the existing systems and the system Kleinberg used?

A: Interviewing defendants takes time, so 100 factors can be too much. Kleinberg only looked at three factors. Another tool relied on six factors.

Q: Should we require private companies to reveal their algorithms?

A: There are various models. One is to create an FDA for algorithms. I’m not sure I support that model. I think private companies need to expose at least to the govt the factors that they’re including. Others would say I’m too optimistic about the government.

Q: In China we don’t have the pre-trial part, but there’s an article saying that they can make the sentencing more fair by distinguishing among crimes. Also, in China the system is more uniform so the data can be aggregated and the system can be made more accurate.

A: Yes, states are different because they have different laws. Exchanging data between states is not very common and may not even be possible.


October 28, 2017

Making medical devices interoperable

The screen next to a patient’s hospital bed that displays the heart rate, oxygen level, and other moving charts is the definition of a dumb display. How dumb is it, you ask? If the clip on a patient’s finger falls off, the display thinks the patient is no longer breathing and will sound an alarm…even though it’s displaying outputs from other sensors that show that, no, the patient isn’t about to die.

The problem, as explained by David Arney at an open house for MD PnP, is that medical devices do not share their data in open ways. That is, they don’t interoperate. MD PnP wants to fix that.

The small group was founded in 2004 as part of MIT’s CIMIT (Consortia for Improving Medicine with Innovation and Technology). Funded by grants, including from the NIH and CRICO Insurance, it currently has 6-8 people working on ways to improve health care by getting machines talking with one another.

The one aspect of hospital devices that manufacturers have generally agreed on is that they connect via serial ports. The FDA encourages this, at least in part because serial ports are electrically safe. So, David pointed to a small connector box with serial ports in and out and a small computer in between. The computer converts the incoming information into an open industry standard (ISO 11073). And now the devices can play together. (The “PnP” in the group’s name stands for “plug ‘n’ play,” as we used to say in the personal computing world.)

David then demonstrated what can be done once the data from multiple devices interoperate.

  • You can put some logic behind the multiple signals so that a patient’s actual condition can be assessed far more accurately: no more sirens when an oxygen sensor falls off a finger.

  • You can create displays that are more informative and easier to read — and easier to spot anomalies on — than the standard bedside monitor.

  • You can transform data into other standards, such as the HL7 format for entry into electronic medical records.

  • If there is more than one sensor monitoring a factor, you can do automatic validation of signals.

  • You can record and perhaps share alarm histories.

  • You can create what is functionally an API for the data your medical center is generating: a database that makes the information available to programs that need it via publish and subscribe.

  • You can aggregate tons of data (while following privacy protocols, of course) and use machine learning to look for unexpected correlations.
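The first item in the list, putting logic behind multiple signals, might look something like the sketch below. This is not MD PnP’s code. The field names, the three sensors chosen, and especially the numeric ranges are assumptions made up for illustration, and they are not clinical guidance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vitals:
    spo2: Optional[float]            # oxygen saturation %, None if the finger clip has no signal
    ecg_heart_rate: Optional[float]  # beats per minute, from a separate ECG sensor
    resp_rate: Optional[float]       # breaths per minute, from a separate sensor


def assess(v: Vitals) -> str:
    """Combine signals instead of alarming on any single one.

    All ranges below are invented placeholders, not medical thresholds.
    """
    heart_ok = v.ecg_heart_rate is not None and 40 <= v.ecg_heart_rate <= 140
    breathing_ok = v.resp_rate is not None and 8 <= v.resp_rate <= 30

    if v.spo2 is None:
        # The oxygen reading vanished. If the other sensors look normal,
        # the clip probably fell off, so ask for a check rather than a siren.
        return "check sensor" if heart_ok and breathing_ok else "alarm"

    if v.spo2 < 90 or not (heart_ok and breathing_ok):
        return "alarm"
    return "ok"


# The finger clip falls off while the other sensors still read normally.
print(assess(Vitals(spo2=None, ecg_heart_rate=72, resp_rate=14)))
```

The point is only that once the readings sit side by side in one open format, a few lines of logic can tell a lost sensor from a lost patient.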

MD PnP makes its stuff available under an open BSD license and publishes its projects on GitHub. This means, for example, that while PnP has created interfaces for 20-25 protocols and data standards used by device makers, you could program its connector to support another device if you need to.

Presumably not all the device manufacturers are thrilled about this. The big ones like to sell entire suites of devices to hospitals on the grounds that all those devices interoperate amongst themselves — what I like to call intraoperating. But beyond corporate greed, it’s hard to find a down side to enabling more market choice and more data integration.

