
December 12, 2022

The Social Construction of Facts

To say that facts are social constructions doesn’t mean that everything put forward as a fact is a fact. Nor does it mean that facts don’t express truths, or that facts are not to be trusted, or that there’s some unconstructed fact behind facts. Social constructionists don’t want to leave us in a world in which it’s ok to say “No, it’s not raining” in the middle of a storm or to claim “Water boiled at 40C for me this morning under normal circumstances.”

Rather the critique, as I understand it, is that the fact-based disciplines we choose to pursue, the roles they play, who gets to participate, the forms of discourse and of proof, the equipment invented and the ways the materials are handled (the late Bruno Latour was brilliant on this point, among others), the commitment to an objective and consistent methodology (see Paul Feyerabend), all are the result of history, culture, economics, and social forces. Science itself is a social construct (as per Thomas Kuhn’s The Structure of Scientific Revolutions [me on that book]). (Added bonus: Here’s Richard Rorty’s review of Ian Hacking’s excellent book, The Social Construction of What?)

Facts as facts pretty clearly seem (to me) to be social constructions. As such, they have a history…

Facts as we understand them became a thing in western culture when Francis Bacon early in the 17th century started explicitly using them to ground theories, which was a different way of constructing scientific truths; prior to this, science was built on deductions, not facts. (Pardon my generalizations.)

You can see the movement from deductive truth to fact-based empirical evidence across the many editions of Thomas Malthus’ 1798 book, An Essay on the Principle of Population, which predicted global famine based on a mathematical formula but then became filled with facts and research from around the world. It went from a slim deductive volume to six volumes thick with facts and stats. Social construction added pounds to Malthus’ work.

This happened because statistics arrived in Britain, by way of Germany, in the early 19th century. Statistical facts became important at that time not only because they enabled the inductive grounding of theories (as per Bacon and Malthus), but because they could rebut people’s personal interests. In particular, they became an important way to break the sort of class-based assumptions that made it seem ok to clean rich people’s chimneys by shoving little boys up them. Against those assumptions were posed facts showing that it was in fact bad for the boys.

Compiling “blue books” of fact-based research became a standard part of the legislative process in England in the first half of the 19th century. By mid-century, the use of facts was so prevalent that in 1854 Dickens bemoaned society’s reliance on them in Hard Times on the grounds that facts kill imagination…yet another opposite to facts, and another social construction.

As the 19th century ended, we got our first fact-finding commissions, established in order to peacefully resolve international disputes. (Narrator: They rarely did.) This was again using facts as the boulder that stubs the toe of self-interest (please forget I ever wrote that phrase), but now those interests were cross-national and not as easily resolvable as when you pit the interests of lace-cuffed lords against the interests of children crawling through upper-class chimneys.

In the following century  we got (i.e., we constructed) an idea of science and human knowledge that focused on assembling facts as if they were bricks out of which one could build a firm foundation. This led to some moaning (in a famous 1963 letter to the editor)  that science was turning into a mere “brickyard” of unassembled facts.

I’m not a historian, and this is the best I can recall from a rabbit hole of specific curiosity I fell into about ten years ago when writing Too Big to Know. But the point is that the idea of the social construction of science and facts doesn’t mean that all facts — including “alternative facts” — are equal. Water really does boil at 100C. Rather, it’s the idea, role, use, importance, and control of facts that’s socially constructed.


Categories: philosophy, science, too big to know Tagged with: 2b2k • science Date: December 12th, 2022 dw


December 21, 2018

“I know tech better than anyone” isn’t a lie

The Democrats are trying to belittle the concept of a Wall, calling it old fashioned. The fact is there is nothing else’s that will work, and that has been true for thousands of years. It’s like the wheel, there is nothing better. I know tech better than anyone, & technology…..

— Donald J. Trump (@realDonaldTrump) December 21, 2018

This comes from a man who does not know how to close an umbrella.

Does Trump really believe that he knows more about tech than anyone? Even if we take away the hyperbole, does he think he’s an expert at technology? What could he mean by that? That he knows how to build a computer? What an Internet router does? That he can explain what an adversarial neural network is, or just the difference between machine learning and deep learning? That he can provide IT support when Jared can’t find the song he just downloaded to his iPhone? That he can program his VCR?

But I don’t think he means any of those things by his ridiculous claim.

I think it’s worse than that. The phrase is clearly intended to have an effect, not to mean anything. “Listen to me. Believe me.” is an assertion of authority intended to forestall questioning. A genuine expert might say something like that, and at least sometimes it’d be reasonable and acceptable; it’s also sometimes obnoxious. Either way, “I know more about x than anyone” is a conversational tool.

So, Trump has picked up a hammer. His hand is clasped around its handle. He swings his arm and brings the hammer squarely down on the nail. He hears the bang. He has wielded this hammer successfully.

Except the rest of us can see there is nothing — nothing — in his hand. We all know that. Only he does not.

Trump is not lying. He is insane.


Categories: politics, too big to know Tagged with: 2b2k • politics • trump Date: December 21st, 2018 dw


September 20, 2018

Coming to belief

I’ve written before about the need to teach The Kids (also: all of us) not only how to think critically so we can see what we should not believe, but also how to come to belief. That piece, which I now cannot locate, was prompted by danah boyd’s excellent post on the problem with media literacy. Robert Berkman, Outreach, Business Librarian at the University of Rochester and Editor of The Information Advisor’s Guide to Internet Research, asked me how one can go about teaching people how to come to belief. Here’s an edited version of my reply:

I’m afraid I don’t have a good answer. I actually haven’t thought much about how to teach people how to come to belief, beyond arguing for doing this as a social process (the ol’ “knowledge is a network” argument :) I have a pretty good sense of how *not* to do it: the way philosophy teachers relentlessly show how every proposed position can be torn down.

I wonder what we’d learn by taking a literature course as a model — not one that is concerned primarily with critical method, but one that is trying to teach students how to appreciate literature. Or art. The teacher tries to get the students to engage with one another to find what’s worthwhile in a work. Formally, you implicitly teach the value of consistency, elegance of explanation, internal coherence, how well a work clarifies one’s own experience, etc. Those are useful touchstones for coming to belief.

I wouldn’t want to leave students feeling that it’s up to them to come up with an understanding on their own. I’d want them to value the history of interpretation, bringing their critical skills to it. The last thing we need is to make people feel yet more unmoored.

I’m also fond of the orthodox Jewish way of coming to belief, as I, as a non-observant Jew, understand it. You have an unchanging and inerrant text that means nothing until humans interpret it. To interpret it means to be conversant with the scholarly opinions of the great Rabbis, who disagree with one another, often diametrically. Formulating a belief in this context means bringing contemporary intelligence to a question while finding support in the old Rabbis…and always always talking respectfully about those other old Rabbis who disagree with your interpretation. No interpretations are final. Learned contradiction is embraced.

That process has the elements I personally like (being moored to a tradition, respecting those with whom one disagrees, acceptance of the finitude of beliefs, acceptance that they result from a social process), but it’s not going to be very practical outside of Jewish communities if only because it rests on the acceptance of a sacred document, even though it’s one that literally cannot be taken literally; it always requires interpretation.

My point: We do have traditions that aim at enabling us to come to belief. Science is one of them. But there are others. We should learn from them.

TL;DR: I dunno.


Categories: philosophy, too big to know Tagged with: 2b2k • fake news • logic • philosophy Date: September 20th, 2018 dw


July 31, 2018

[2b2k] Errata: Wrong about Wycliffe

I received this about Too Big to Know from Isaiah Hoogendyk, Biblical Data Engineer at Faithlife Corporation:

In chapter 9, “Building the New Infrastructure of Knowledge,” (sorry, don’t have a page number: read this in the Kindle app) you state:

“There was a time when we thought we were doing the common folk a favor by keeping the important knowledge out of their reach. That’s why the Pope called John Wycliffe a heretic in the fourteenth century for creating the first English-language translation of the Christian Bible.”

This is quite false, actually. There was in fact nothing heretical about translating the Scriptures into the vernacular; instead, Wycliffe was condemned for a multitude of heresies regarding rejection of Catholic belief on the Sacraments and the priesthood, among other things. Some of these beliefs were interpolated into the translation of the Scriptures attributed to him (which weren’t even entirely translated by him), but it was mostly his other writings that were censured by the Pope. You can read more about that here: https://plato.stanford.edu/archives/win2011/entries/wyclif/.

Thanks, Isaiah.


Categories: too big to know Tagged with: 2b2k • errata Date: July 31st, 2018 dw


May 16, 2018

[liveblog] Aubrey de Grey

I’m at the CUBE Tech conference in Berlin. (I’m going to give my first keynote on the book I’m finishing.) Aubrey de Grey begins his keynote by changing the question from “Who wants to get old?” to “Who wants Alzheimer’s?” because we’ve been brainwashed into thinking that aging is somehow good for us: we get wiser, get to retire, etc. Now we are developing treatments for aging. Ambiguity about aging is now “hugely damaging” because it hinders the support of research. E.g., his SENS Research Foundation is going too slowly because of funding constraints.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

“The defeat of aging via medicine is foreseeable now.” He says he has to be credible because people have been saying this forever and have been wrong.

“Why is aging still a problem?” One hundred years ago, a third of babies would die before they were one year old. We fixed this in the industrialized world through simple advances, e.g., hygiene, mosquito control, antibiotics. So why are diseases of old age so much harder to control? People think it’s because so many things go wrong with us late in life, interacting with one another and creating incredible complexity. But that’s not the main answer.

“Aging is easy to define: it is a side effect of being alive.” It’s a fact that the operation of the human body generates damage. The damage accumulates. The body tolerates a certain amount. When you pass that amount, you get the pathologies of old age. Our approach has been to develop geriatric medicine to counteract those pathologies. That’s where most of the research goes.

[Slide: “Metabolism: The ultimate undocumented spaghetti code”]

But that won’t work because the damage continues. Geriatric medicine bangs away at the pathologies, but will necessarily become less effective over time. “We make this mistake because of a misclassification we make.”

If you ask people to make categories of disease, they’ll come up with communicable, congenital, and chronic. Then most people add a fourth way of being sick: aging itself. It includes frailty, sarcopenia (loss of muscle), immunosenescence (aging of the immune system)… But that’s silly. Aging in a living organism is the same as aging in a machine: “Aging is the accumulation of damage that occurs as a side-effect of the body’s normal operation.” That means the categories are right, except that aging covers columns 3 and 4. Column 3 — specific diseases such as Alzheimer’s and cancer — is also part of aging. This means that aging isn’t a blessing in disguise, and that we can’t say that the diseases in column 3 are high priorities for medicine but those in column 4 are not.

A hundred years ago a few people started to think about this and realized that if we tried to interfere with the process of aging earlier on, we’d do better. This became the field of gerontology. Some species age much more slowly than others. Maybe we can figure out the basis for that variation. But metabolism is really, really complicated. “This is the ultimate nightmare of uncommented spaghetti code.” We know so little about how the body works.

“There is another approach. And it’s completely bleeding obvious”: periodically repair the damage. We don’t need to slow down the rate at which metabolism causes damage. We need to engineer a system we don’t understand, but “we don’t need to understand how metabolism causes damage.” Nor do we need to know what to do when the damage is too great, because we’re not going to let it get to that state. We do this with, say, antique cars. Preventive maintenance works. “The only question is, can we do it for a much more complicated machine like the human body?”

“We’re sidestepping our ignorance of metabolism and pathology. But we have to cope with the fact that damage is complicated.” All of the types of damage, from cell loss to extracellular matrix stiffening — there are 7 categories — can be repaired through a single approach: genetic repair. E.g., loss of cells can be repaired by replacing them using stem cells. Unfortunately, most of the funding is going only to this first category. SENS was created to enable research on the other six. Aubrey talks about SENS’ work on protecting cells from the bad effects of cholesterol.

He points to another group (unnamed) that has reinvented this approach and is getting a lot of notice.

He says longevity is not what people think it is. These therapies will let people stay alive longer, but they will also stay youthful longer. “Longevity is a side effect of health.”

Will this be only for the rich? Overpopulation? Boredom? Pensions collapse? We’re taking care of overpopulation by cleaning up its effects, he says. He says there are solutions to these problems. But there are choices we have to make. No one wants to get Alzheimers. We can’t have it both ways. Either we want to keep people healthy or not.

He says SENS has been successful enough that they’ve been able to spin out some of the research into commercial operations. But we need to carry on in the non-profit research world as well. Project 21 aims at human rejuvenation clinical trials.


Categories: culture, liveblog Tagged with: 2b2k • aging Date: May 16th, 2018 dw


February 11, 2018

The story of lead and crime, told in tweets

Patrick Sharkey [twitter: patrick_sharkey] uses a Twitter thread to evaluate the evidence about a possible relationship between exposure to lead and crime. The thread is a bit hard to get unspooled correctly, but it’s worth it as an example of:

1. Thinking carefully about complex evidence and data.

2. How Twitter affects the reasoning and its expression.

3. The complexity of data, which will only get worse (= better) as machine learning lets us scale up its size and complexity.

Note: I lack the skills and knowledge to evaluate Patrick’s reasoning. And, hat tip to David Lazer for the retweet of the thread.


Categories: ai, science Tagged with: 2b2k • ai • complexity • machine learning Date: February 11th, 2018 dw


The brain is not a computer and the world is not information

Robert Epstein argues in Aeon against the dominant assumption that the brain is a computer, that it processes information, stores and retrieves memories, etc. That we assume so comes from what I think of as the informationalizing of everything.

The strongest part of his argument is that computers operate on symbolic information, but brains do not. There is no evidence (that I know of, but I’m no expert. On anything) that the brain decomposes visual images into pixels and those pixels into on-offs in a code that represents colors.

In the second half, Epstein tries to prove that the brain isn’t a computer through some simple experiments, such as drawing a dollar bill from memory and while looking at it. Someone committed to the idea that the brain is a computer would probably just conclude that the brain just isn’t a very good computer. But judge for yourself. There’s more to it than I’m presenting here.

Back to Epstein’s first point…

It is of the essence of information that it is independent of its medium: you can encode it into voltage levels of transistors, magnetized dust on tape, or holes in punch cards, and it’s the same information. Therefore, if the brain is just an information processor, a representation of a brain’s states in another medium should also be conscious. Epstein doesn’t make the following argument, but I will (and I believe I am cribbing it from someone else, but I don’t remember who).

Because information is independent of its medium, we could encode it in dust particles swirling clockwise or counter-clockwise; clockwise is an on, and counter is an off. In fact, imagine there’s a dust cloud somewhere in the universe that has 86 billion motes, the number of neurons in the human brain. Imagine the direction of those motes exactly matches the on-offs of your neurons when you first spied the love of your life across the room. Imagine those spins shift but happen to match how your neural states shifted over the next ten seconds of your life. That dust cloud is thus perfectly representing the informational state of your brain as you fell in love. It is therefore experiencing your feelings and thinking your thoughts.

That by itself is absurd. But perhaps you say it is just hard to imagine. Ok, then let’s change it. Same dust cloud. Same spins. But this time we say that clockwise is an off, and the other is an on. Now that dust cloud no longer represents your brain states. It therefore is both experiencing your thoughts and feelings and not experiencing them at the same time. Aristotle would tell us that that is logically impossible: a thing cannot simultaneously be something and its opposite.
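To make the role of convention concrete, here’s a toy sketch (purely illustrative, and mine, not Epstein’s): the very same physical spins decode into opposite bit strings depending on nothing but the mapping we happen to choose.

```python
# Toy illustration: what "information" a physical state carries depends
# entirely on the encoding convention we bring to it, not on the state itself.
spins = ["CW", "CCW", "CW", "CW", "CCW"]  # directions of five dust motes

decode_a = [1 if s == "CW" else 0 for s in spins]  # convention A: clockwise = on
decode_b = [0 if s == "CW" else 1 for s in spins]  # convention B: clockwise = off

print(decode_a)  # [1, 0, 1, 1, 0]
print(decode_b)  # [0, 1, 0, 0, 1] -- same cloud, opposite "brain states"
```

Same motes, same physics; whether the cloud represents anything at all is entirely up to the convention we bring to it.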

Anyway…

Toward the end of the article, Epstein gets to a crucial point that I was very glad to see him bring up: Thinking is not a brain activity, but the activity of a body engaged in the world. (He cites Anthony Chemero’s Radical Embodied Cognitive Science (2009) which I have not read. I’d trace it back further to Andy Clark, David Chalmers, Eleanor Rosch, Heidegger…). Reducing it to a brain function, and further stripping the brain of its materiality to focus on its “processing” of “information” is reductive without being clarifying.

I came into this debate many years ago already made skeptical of the most recent claims about the causes of consciousness by having some awareness of the series of failed metaphors we have used over the past couple of thousand years. Epstein puts this well, citing another book I have not read (and have consequently just ordered):

In his book In Our Own Image (2015), the artificial intelligence expert George Zarkadakis describes six different metaphors people have employed over the past 2,000 years to try to explain human intelligence.

In the earliest one, eventually preserved in the Bible, humans were formed from clay or dirt, which an intelligent god then infused with its spirit. That spirit ‘explained’ our intelligence – grammatically, at least.

The invention of hydraulic engineering in the 3rd century BCE led to the popularity of a hydraulic model of human intelligence, the idea that the flow of different fluids in the body – the ‘humours’ – accounted for both our physical and mental functioning. The hydraulic metaphor persisted for more than 1,600 years, handicapping medical practice all the while.

By the 1500s, automata powered by springs and gears had been devised, eventually inspiring leading thinkers such as René Descartes to assert that humans are complex machines. In the 1600s, the British philosopher Thomas Hobbes suggested that thinking arose from small mechanical motions in the brain. By the 1700s, discoveries about electricity and chemistry led to new theories of human intelligence – again, largely metaphorical in nature. In the mid-1800s, inspired by recent advances in communications, the German physicist Hermann von Helmholtz compared the brain to a telegraph.

Maybe this time our tech-based metaphor has happened to get it right. But history says we should assume not. We should be very alert to the disanalogies, which Epstein helps us with.

Getting this right, or at least not getting it wrong, matters. The most pressing problem with the informationalizing of thought is not that it applies a metaphor, or even that the metaphor is inapt. Rather it’s that this metaphor leads us to a seriously diminished understanding of what it means to be a living, caring creature.

I think.

 

Hat tip to @JenniferSertl for pointing out the Aeon article.


Categories: ai, infohistory, philosophy Tagged with: 2b2k • ai • consciousness • information Date: February 11th, 2018 dw


February 1, 2018

Can AI predict the odds on you leaving the hospital vertically?

A new research paper, published Jan. 24 with 34 co-authors and not peer-reviewed, claims better accuracy than existing software at predicting outcomes like whether a patient will die in the hospital, be discharged and readmitted, and their final diagnosis. To conduct the study, Google obtained de-identified data of 216,221 adults, with more than 46 billion data points between them. The data span 11 combined years at two hospitals…

That’s from an article in Quartz by Dave Gershgorn (Jan. 27, 2018), based on the original article by Google researchers posted at Arxiv.org.

…Google claims vast improvements over traditional models used today for predicting medical outcomes. Its biggest claim is the ability to predict patient deaths 24-48 hours before current methods, which could allow time for doctors to administer life-saving procedures.

Dave points to one of the biggest obstacles to this sort of computing: the data are in such different formats, from hand-written notes to the various form-based data that’s collected. It’s all about the magic of interoperability … and the frustration when data (and services and ideas and language) can’t easily work together. Then there’s what Paul Edwards, in his great book A Vast Machine calls “data friction”: “…the costs in time, energy, and attention required simply to collect, check, store, move, receive, and access data.” (p. 84)

On the other hand, machine learning can sometimes get past the incompatible expression of data in a way that’s so brutal that it’s elegant. One of the earlier breakthroughs in machine learning came in the 1990s when IBM analyzed the English and French versions of Hansard, the bi-lingual transcripts of the Canadian Parliament. Without the machines knowing the first thing about either language, the system produced more accurate results than software that was fed rules of grammar, bilingual dictionaries, etc.
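A toy sketch of the underlying idea (my illustration; IBM’s actual models were far more sophisticated, iterating over the counts with expectation-maximization): count which words co-occur across aligned sentence pairs and let the counts suggest translations, with no grammar and no dictionary.

```python
from collections import defaultdict

# Hypothetical stand-ins for aligned English/French sentence pairs; the real
# Hansard corpus has millions of them.
pairs = [
    ("the house voted", "la chambre a voté"),
    ("the house adjourned", "la chambre a ajourné"),
    ("the member voted", "le député a voté"),
]

counts = defaultdict(lambda: defaultdict(int))
for en, fr in pairs:
    for e in en.split():
        for f in fr.split():
            counts[e][f] += 1  # raw co-occurrence across aligned sentences

# The most frequent co-occurring French words are crude translation guesses.
# (Ties with function words like "la" are exactly what IBM's iterative
# models were designed to sort out.)
for e, fs in sorted(counts.items()):
    top = sorted(fs.items(), key=lambda kv: -kv[1])[:2]
    print(e, top)
```

Brutal, and with millions of sentence pairs behind it, surprisingly effective.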

Indeed, the abstract of the Google paper says “Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. ” It continues: “We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.”
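A crude sketch of the general idea as I read the abstract (illustrative only, not the paper’s actual pipeline): keep each patient’s raw, timestamped events as one ordered sequence rather than a hand-curated set of predictor variables, and let the model learn from the whole record.

```python
# Crude, hypothetical sketch: represent a patient's raw record as a single
# time-ordered sequence of event tokens instead of curated variables.
patient_events = [
    (300, "lab:lactate=3.9"),
    (0,   "encounter:admission"),
    (180, "medication:vancomycin"),
    (120, "observation:heart_rate=112"),
]

# Sort by timestamp and tokenize; a sequence model is then trained on one
# such sequence per patient, with no site-specific harmonization beyond
# the shared event format.
sequence = [f"t={t} {event}" for t, event in sorted(patient_events)]
print(sequence)
```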

The paper also says that their approach affords clinicians “some transparency into the predictions.” Some transparency is definitely better than none. But, as I’ve argued elsewhere, in many instances there may be tools other than transparency that can give us some assurance that AI’s outcomes accord with our aims and our principles of fairness.

 


 

I found this article by clicking on Dave Gershgorn’s byline on a brief article about the Wired version of the paper of mine I referenced in the previous paragraph. He does a great job explaining it. And, believe me, it’s hard to get a writer — well, me, anyway — to acknowledge that without having to insert even one caveat. Thanks, Dave!


Categories: ai, interop Tagged with: 2b2k • data • explanations • interop • machine learning Date: February 1st, 2018 dw


December 16, 2017

[liveblog] Harri Ketamo on micro-learning

I’m at the STEAM ed Finland conference in Jyväskylä. Harri Ketamo is giving a talk on “micro-learning.” He recently won a prestigious prize for the best new ideas in Finland. He is interested in the use of AI for learning.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

We don’t have enough good teachers globally, so we have to think about ed in new ways, Harri says. Can we use AI to bring good ed to everyone without hiring 200M new teachers globally? If we paid teachers equivalent to doctors and lawyers, we could hire those 200M. But apparently we are not willing to do that.


One challenge: Career coaching. What do you want to study? Why? What are the skills you need? What do you need to know?


His company does natural language analysis — not word matches, but meaning. As an example he shows a shareholder agreement. Such agreements always have the same elements. After being trained on law, his company’s AI can create a map of the topic and analyze a block of text to see if it covers the legal requirements…the sort of work that a legal assistant does. For some standard agreements, we may soon not need lawyers, he predicts.


The system’s language model is a mess of words and relations. But if you zoom out from the map, the AI has clustered the concepts. At the Slush Shanghai conference, his AI could develop a list of the companies a customer might want to meet based on a text analysis of the companies’ web sites, etc. Likewise if your business is looking for help with a project.


Finland has a lot of public data about skills and openings. Universities’ curricula are publicly available. [Yay!] Unlike LinkedIn, all this data is public. Harri shows a map that displays the skills and competencies Finnish businesses want and the matching training offered by Finnish universities. The system can explore public information about a user and map that to available jobs and the training that is required and available for it. The available jobs are listed with relevancy expressed as a percentage. It can also look internationally to find matches.
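To give a flavor of that kind of matching (a generic sketch, not Headai’s method; the profile, job titles, and skills are made up), here’s how a relevancy percentage could fall out of comparing a person’s skill profile against job descriptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical profile and postings -- purely illustrative.
profile = "python statistics machine learning data analysis"
jobs = {
    "Data analyst": "statistics reporting python data visualization",
    "Web developer": "javascript css html react frontend",
    "ML engineer": "machine learning python model deployment statistics",
}

vec = TfidfVectorizer()
matrix = vec.fit_transform([profile] + list(jobs.values()))
scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()

# Relevancy expressed as a percentage for each opening.
for title, score in zip(jobs, scores):
    print(f"{title}: {score:.0%}")
```

A real system would work from meanings rather than word matches, as Harri emphasizes, but the ranking-by-similarity shape is the same.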


The AI can also put together a course for a topic that a user needs. It can tell what the core concepts are by mining publications, courses, news, etc. The result is an interaction with a bot that talks with you in a WhatsApp-like way. (See his paper “Agents and Analytics: A framework for educational data mining with games based learning”.) It generates tests that show what a student needs to study if she gets a question wrong.


His newest project, in process: Libraries are the biggest collections of creative, educational material, so the AI ought to point people there. His software can find the common sources among courses and areas of study. It can discover the skills and competencies that materials can teach. This lets it cluster materials around degree programs. It can also generate micro-educational programs, curating a collection of readings.

His platform has an open API. See Headai.

Q&A


Q: Have you done controlled experiments?


A: Yes. We’ve found that people get 20-40% better performance when our software is used in a blended model, i.e., with a human teacher. It helps motivate people if they can see the areas they need to work on disappear over time.


Q: The software only found male authors in the example you put up of automatically collated materials.


A: Small training set. Gender is not part of the metadata in Finland.


Q: Don’t you worry that your system will exacerbate bias?


A: Humans are biased. AI is a black box. We need to think about how to manage this.


Q: [me] Are the topics generated from the content? Or do you start off with an ontology?


A: It creates its ontology out of the data.


Q: [me] Are you committing to make sure that the results of your AI do not reflect the built in biases?


A: Our news system on the Web presents a range of views. We need to think about how to do this for gender issues with the course software.


Categories: ai, education, liveblog, machine learning, too big to know Tagged with: 2b2k • ai • education • liveblog • machine learning Date: December 16th, 2017 dw


December 5, 2017

[liveblog] Conclusion of Workshop on Trustworthy Algorithmic Decision-Making

I’ve been at a two-day workshop sponsored by Michigan State University and the National Science Foundation: “Workshop on Trustworthy Algorithmic Decision-Making.” After multiple rounds of rotating through workgroups iterating on five different questions, each group presented its findings — questions, insights, areas of future research.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Seriously, I cannot capture all of this.

Conduct of Data Science

What are the problems?

  • Who defines and how do we ensure good practice in data science and machine learning?

Why is the topic important? Because algorithms are important. And they have important real-world effects on people’s lives.

Why is the problem difficult?

  • Wrong incentives.

  • It can be difficult to generalize practices.

  • Best practices may be good for one goal but not another, e.g., efficiency but not social good. Also: Lack of shared concepts and vocabulary.

How to mitigate the problems?

  • Change incentives

  • Increase communication via vocabularies, translations

  • Education through MOOCS, meetups, professional organizations

  • Enable and encourage resource sharing: an open source lesson about bias, code sharing, data set sharing

Accountability group

The problem: How to integratively assess the impact of an algorithmic system on the public good? “Integrative” = the impact may be positive and negative and affect systems in complex ways. The impacts may be distributed differently across a population, so you have to think about disparities. These impacts may well change over time.

We aim to encourage work that is:

  • Aspirationally causal: measuring outcomes causally, but not always through randomized control trials.

  • The goal is not to shut down algorithms but to make positive contributions that generate solutions.

This is a difficult problem because:

  • Lack of variation in accountability, enforcements, and interventions.

  • It’s unclear what outcomes should be measured and how. This is context-dependent.

  • It’s unclear which interventions are the highest priority

Why progress is possible: There’s a lot of good activity in this space. And it’s early in the topic so there’s an ability to significantly influence the field.

What are the barriers for success?

  • Incomplete understanding of contexts. So, think of it in terms of socio-cultural approaches, and make it interdisciplinary.

  • The topic lies between disciplines. So, develop a common language.

  • High-level triangulation is difficult. Examine the issues at multiple scales, multiple levels of abstraction. Where you assess accountability may vary depending on what level/aspect you’re looking at.

Handling Uncertainty

The problem: How might we holistically treat and attribute uncertainty through data analysis and decisions systems. Uncertainty exists everywhere in these systems, so we need to consider how it moves through a system. This runs from choosing data sources to presenting results to decision-makers and people impacted by these results, and beyond that its incorporation into risk analysis and contingency planning. It’s always good to know where the uncertainty is coming from so you can address it.

Why difficult:

  • Uncertainty arises from many places

  • Recognizing and addressing uncertainties is a cyclical process

  • End users are bad at evaluating uncertain info and incorporating uncertainty in their thinking.

  • Many existing solutions are too computationally expensive to run on large data sets

Progress is possible:

  • We have sampling-based solutions that provide a framework.

  • Some app communities are recognizing that ignoring uncertainty is reducing the quality of their work

How to evaluate and recognize success?

  • A/B testing can show that decision making is better after incorporating uncertainty into analysis

  • Statistical/mathematical analysis

Barriers to success

  • Cognition: Train users.

  • It may be difficult to break this problem into small pieces and solve them individually

  • Gaps in theory: many of the problems cannot currently be solved algorithmically.

The presentation ends with a note: “In some cases, uncertainty is a useful tool.” E.g., it can make the system harder to game.

Adversaries, workarounds, and feedback loops

Adversarial examples: add a perturbation to a sample and it disrupts the classification. An adversary tries to find those perturbations to wreck your model. Sometimes this is used not to hack the system so much as to prevent the system from, for example, recognizing your face during a protest.
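A minimal sketch of the idea with a toy linear classifier (my illustration; real attacks such as FGSM do the analogous thing to deep networks): nudge each feature slightly in whichever direction hurts the classifier most.

```python
import numpy as np

# Toy linear classifier: predict "positive" when w.x > 0.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.4, 0.1, 0.3])     # correctly classified: w.x = 0.35 > 0

# Adversarial perturbation: move each feature against the weight's sign,
# within a small budget epsilon, so the input barely changes.
epsilon = 0.2
x_adv = x - epsilon * np.sign(w)  # [0.2, 0.3, 0.1]

print(np.sign(w @ x))             # 1.0  (original classification)
print(np.sign(w @ x_adv))         # -1.0 (flipped by a tiny nudge)
```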

Feedback loops: A recidivism prediction system says you’re likely to commit further crimes, which sends you to prison, which increases the likelihood that you’ll commit further crimes.
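Here’s a toy simulation of that loop (illustrative only, not any real system): when the score influences the outcome it’s supposed to predict, two people with the same underlying propensity can be driven to different, self-confirming scores.

```python
# Toy feedback loop: the risk score influences the outcome it predicts.
def simulate(initial_risk, base_rate=0.3, years=10, threshold=0.5):
    risk = initial_risk
    for _ in range(years):
        incarcerated = risk > threshold            # high score -> prison
        # Assumption of the toy model: incarceration raises actual risk.
        observed = base_rate + (0.2 if incarcerated else 0.0)
        risk = 0.5 * risk + 0.5 * observed         # model updates on outcomes
    return round(risk, 2)

# Same underlying person (same base_rate), different starting scores:
print(simulate(initial_risk=0.6))   # 0.5 -- stays pinned above the threshold
print(simulate(initial_risk=0.4))   # 0.3 -- settles at the true base rate
```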

What is the problem: How should a trustworthy algorithm account for adversaries, workarounds, and feedback loops?

Who are the stakeholders?

System designers, users, non-users, and perhaps adversaries.

Why is this a difficult problem?

  • It’s hard to define the boundaries of the system

  • From whose vantage point do we define adversarial behavior, workarounds, and feedback loops?

Unsolved problems

  • How do we reason about the incentives users and non-users have when interacting with systems in unintended ways?

  • How do we think about oversight and revision in algorithms with respect to feedback mechanisms?

  • How do we monitor changes, assess anomalies, and implement safeguards?

  • How do we account for stakeholders while preserving rights?

How to recognize progress?

  • Mathematical model of how people use the system

  • Define goals

  • Find stable metrics and monitor them closely

  • Proximal metrics. Causality?

  • Establish methodologies and see them used

  • See a taxonomy of adversarial behavior used in practice

Likely approaches

  • Security methodology for anticipating unintended behaviors and adversarial interactions. Monitor and measure.

  • Record and taxonomize adversarial behavior in different domains

  • Test. Try to break things.

Barriers

  • Hard to anticipate unanticipated behavior

  • Hard to define the problem in particular cases.

  • Goodhart’s Law

  • Systems are born brittle

  • What constitutes adversarial behavior vs. a workaround is subjective.

  • Dynamic problem

Algorithms and trust

How do you define and operationalize trust?

The problem: What are the processes through which different stakeholders come to trust an algorithm?

Multiple processes lead to trust.

  • Procedural vs. substantive trust: are you looking at the weights of the algorithms (e.g.), or what were the steps to get you there?

  • Social vs personal: did you see the algorithm at work, or are you relying on peers?

These pathways are not necessarily predictive of each other.

Stakeholders build trust through multiple lenses and priorities:

  • the builders of the algorithms

  • the people who are affected

  • those who oversee the outcomes

Mini case study: a child services agency that does not want to be identified. [All of the following is 100% subject to my injection of errors.]

  • The agency uses a predictive algorithm. The stakeholders range from the children needing a family, to NYers as a whole. The agency knew what went into the model. “We didn’t buy our algorithm from a black-box vendor.” They trusted the algorithm because they staffed a technical team who had credentials and had experience with ethics… and whom they trusted intuitively as good people. Few of these are the quantitative metrics that devs spend their time on. Note that FAT (fairness, accountability, transparency) metrics were not what led to trust.

Temporality:

  • Processes that build trust happen over time.

  • Trust can change or maybe be repaired over time.

  • “The timescales to build social trust are outside the scope of traditional experiments,” although you can perhaps find natural experiments.

Barriers:

  • Assumption of reducibility or transfer from subcomponents

  • Access to internal stakeholders for interviews and process understanding

  • Some elements are very long term

 


 

What’s next for this workshop

We generated a lot of scribbles, post-it notes, flip charts, Slack conversations, slide decks, etc. They’re going to put together a whitepaper that goes through the major issues, organizes them, and tries to capture the complexity while helping to make sense of it.

There are weak or no incentives to set appropriate levels of trust

Key takeaways:

  • Trust is irreducible to FAT metrics alone

  • Trust is built over time and should be defined in terms of the temporal process

  • Isolating the algorithm as an instantiation misses the socio-technical factors in trust.


Categories: ai, culture, liveblog, philosophy Tagged with: 2b2k • ai • governance • machine learning Date: December 5th, 2017 dw

