Joho the Blog: AI Archives

September 14, 2018

Five types of AI fairness

Google PAIR (People + AI Research) has just posted my attempt to explain what fairness looks like when it’s operationalized for a machine learning system. It’s pegged around five “fairness buttons” on the new Google What-If tool, a resource for developers who want to try to figure out what factors (“features” in machine learning talk) are affecting an outcome.

Note that there are far more than five ways to operationalize fairness. The point of the article is that once we are forced to decide exactly what we’re going to count as fair — exactly enough that a machine learning system can implement it — we realize just how freaking complex fairness is. OMG. I broke my brain trying to figure out how to explain some of those ideas, and it took several Google developers (especially James Wexler) and a fine mist of vegetarian broth to restore it even incompletely. Even so, my explanations are less clear than I (or you, I’m sure) would like. But at least there’s no math in them :)
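To give a feel for how quickly this gets complicated, here's a toy sketch (mine, not Google's, and certainly not the What-If tool's actual code) that checks a single made-up classifier against two common operationalizations of fairness. With different base rates in the two groups, a single threshold generally can't satisfy both at once.

```python
# Toy illustration: two operationalizations of "fair" applied to the same
# classifier. All numbers are made up; this is not the What-If tool's code.

def rates(scores, labels, threshold):
    """Return (positive_rate, true_positive_rate) at a given threshold."""
    preds = [s >= threshold for s in scores]
    pos_rate = sum(preds) / len(preds)
    flagged_among_actual_pos = [p for p, y in zip(preds, labels) if y == 1]
    tpr = sum(flagged_among_actual_pos) / max(1, sum(labels))
    return pos_rate, tpr

# Group A and Group B: model scores and true outcomes (1 = repaid the loan).
group_a = ([0.2, 0.4, 0.6, 0.7, 0.9], [0, 0, 1, 1, 1])
group_b = ([0.1, 0.3, 0.5, 0.6, 0.8], [0, 1, 0, 1, 1])

threshold = 0.55
pr_a, tpr_a = rates(*group_a, threshold)
pr_b, tpr_b = rates(*group_b, threshold)

# "Demographic parity": both groups should be approved at the same rate.
print("Demographic parity gap:", abs(pr_a - pr_b))
# "Equal opportunity": qualified people in both groups should be approved
# at the same rate (equal true positive rates).
print("Equal opportunity gap:", abs(tpr_a - tpr_b))
# Moving the threshold to close one gap typically reopens the other.
```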

I’ll write more about this at some point, but for me the big take-away is that fairness has had value as a moral concept so far because it is vague enough to allow our intuition to guide us. Machine learning is going to force us to get very specific about it. But we are not yet adept enough at it — e.g., we don’t have a vocabulary for talking about the various varieties — plus we don’t agree about them enough to be able to navigate the shoals. It’s going to be a big mess, but something we have to work through. When we do, we’ll be better at being fair.

Now, about the fact that I am a writer-in-residence at Google. Well, yes I am, and have been for about 6 weeks. It’s a 6-month, part-time experiment. My role is to try to explain some of machine learning to people who, like me, lack the technical competence to actually understand it. I’m also supposed to be reflecting in public on what the implications of machine learning might be for our ideas. I am expected to be an independent voice, an outsider on the inside.

So far, it’s been an amazing experience. I’m attached to PAIR, which has developers working on very interesting projects. They are, of course, super-smart, but they have not yet tired of me asking dumb questions that do not seem to be getting smarter over time. So, selfishly, it’s been great for me. And isn’t that all that really matters, hmmm?


May 6, 2018

[liveblog][ai] Primavera De Filippi: An autonomous flower that merges AI and Blockchain

Primavera De Filippi is an expert in blockchain-based tech. She is giving a talk on Plantoid at ThursdAI, an event held by Harvard’s Berkman Klein Center for Internet & Society and the MIT Media Lab. Her talk is officially on operational autonomy vs. decisional autonomy, but it’s really about how weird things become when you build a computerized flower that merges AI and the blockchain. For me, a central question of her talk was: Can we have autonomous robots that have legal rights and can own and spend assets, without having to resort to conferring personhood on them the way we have with corporations?

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Autonomy and liability

She begins by pointing to the 3 industrial revolutions so far: Steam led to mechanized production; Electricity led to mass production; Electronics led to automated production. The fourth — AI — is automating knowledge production.

People are increasingly moving into the digital world, and digital systems are moving back into the physical world, creating cyber-physical systems. E.g., the Internet of Things senses, communicates, and acts. The Internet of Smart Things learns from the data the things collect, makes inferences, and then acts. The Internet of Autonomous Things creates new legal challenges. Various actors can be held liable: the manufacturer, the software developer, the user, or a third party. “When do we apply legal personhood to non-humans?”

With autonomous things, the user and third parties become less liable as the software developer takes on more of the liability: There can be a bug. Someone can hack into it. The rules that make inferences may be inaccurate. Or a bad moral choice may have led the car into an accident.

The software developer might have created bug-free software, but its interaction with other devices might lead to unpredictability; multiple systems operating according to different rules might be incompatible; it can be hard to identify the chain of causality. So, who will be liable? The manufacturers and owners are likely to have only limited liability.

So, maybe we’ll need generalized insurance: mandatory insurance that potentially harmful devices need to subscribe to.

Or, perhaps we will provide some form of legal personhood to machines so the manufacturers can be sued for their failings. Suing a robot would be like suing a corporation. The devices would be able to own property and assets. The EU is thinking about creating this type of agenthood for AI systems. This is obviously controversial. At least a corporation has people associated with it, while the device is just a device, Primavera points out.

So, when do we apply legal personhood to non-humans? In addition to people and corporations, some countries have assigned personhood to chimpanzees (Argentina, France) and to natural resources (NZ: Whanganui river). We do this so these entities will have rights and cannot be simply exploited.

If we give legal personhood to AI-based systems, can AI have property rights over their assets and IP? If they are legally liable, can they be held responsible for their actions and be sued for compensation? “Maybe they should have contractual rights so they can enter into contracts. Can they be rewarded for their work? Taxed?” [All of these are going to turn out to be real questions. … Wait for it …]

Limitations: “Most of the AI-based systems deployed today are more akin to slaves than corporations.” They’re not autonomous the way people are. They are owned, controlled, and maintained by people or corporations. They act as agents for their operators. They have no technical means to own or transfer assets. (Primavera recommends watching the Star Trek: The Next Generation episode “The Measure of a Man,” which asks, among other things, whether Data the android can be dismantled and whether he can resign.)

Decisional autonomy is the capacity to make a decision on your own, but it doesn’t necessarily bring what we think of as real autonomy. E.g., an AV can decide its route. For real autonomy we need operational autonomy: no one is maintaining the thing’s operation at a technical level. To take a non-random example, a blockchain runs autonomously because there is no single operator controlling it. E.g., smart contracts come with a guarantee of execution. Once a contract is registered with a blockchain, no operator can stop it. This is operational autonomy.

Blockchain meets AI. Object: Autonomy

We are getting the first examples of autonomous devices using blockchain. The most famous is the Samsung washing machine that can detect when the soap is empty and make a smart contract to order more. Autonomous cars could work with the same model; they could be owned by no one and collect money when someone uses them. They could be initially purchased by someone and then buy themselves off: “They’d have to be emancipated,” she says. Perhaps they and other robots can use the capital they accumulate to hire people to work for them. [Pretty interesting model for an Uber.]

She introduces Plantoid, a blockchain-based life form. “Plantoid is autonomous, self-sufficient, and can reproduce.” Real flowers use bees to reproduce. Plantoids use humans to collect capital for their reproduction. Their bodies are mechanical. Their spirit is an Ethereum smart contract. It collects cryptocurrency. When you feed it currency it says thank you; the Plantoid Primavera has brought nods its flower. When it gets enough funds to reproduce itself, it triggers a smart contract that activates a call for bids to create the next version of the Plantoid. In the “mating phase” it looks for a human to create the new version. People vote with micro-donations. Then it identifies a winner and hires that human to create the new one.

There are many Plantoids in the world. Each has its own “DNA,” and new artists can add to it. E.g., each artist has to decide on its governance, such as whether it will donate some funds to charity, with the aim of making it more attractive to contribute to. The most fit get the most money and reproduce themselves. Burning Man this summer is going to feature this.

Every time one reproduces, a small cut is given to the pattern that generated it, and some to the new designer. This flips copyright on its head: the artist has an incentive to make her design more visible and accessible and attractive.
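[To make the lifecycle concrete, here’s my own toy sketch of the logic as I understood it. It’s plain Python, not the actual Ethereum contract, and the threshold and percentages are invented.]

```python
# Toy model of the Plantoid lifecycle as described in the talk.
# This is NOT the real Ethereum contract; numbers and names are invented.

class Plantoid:
    def __init__(self, parent=None, reproduction_goal=10.0):
        self.parent = parent                  # the "pattern" that generated it
        self.balance = 0.0                    # donated cryptocurrency
        self.reproduction_goal = reproduction_goal
        self.bids = {}                        # artist -> micro-donation votes

    def donate(self, amount):
        self.balance += amount
        print("Thank you!")                   # the physical flower nods
        if self.balance >= self.reproduction_goal:
            self.enter_mating_phase()

    def enter_mating_phase(self):
        print("Calling for bids to build the next Plantoid...")

    def vote(self, artist, micro_donation):
        self.bids[artist] = self.bids.get(artist, 0.0) + micro_donation

    def reproduce(self):
        winner = max(self.bids, key=self.bids.get)
        # A cut goes back up the lineage, some to the new designer,
        # the rest funds the child's construction.
        ancestor_cut = 0.05 * self.balance
        designer_fee = 0.20 * self.balance
        child_funds = self.balance - ancestor_cut - designer_fee
        print(f"Hiring {winner}; ancestor gets {ancestor_cut:.2f}, "
              f"designer gets {designer_fee:.2f}")
        return Plantoid(parent=self), child_funds

# A quick run-through of the lifecycle.
p = Plantoid()
p.donate(4.0)
p.donate(7.0)            # crosses the goal, triggers the call for bids
p.vote("artist_a", 0.5)
p.vote("artist_b", 1.2)
child, funds = p.reproduce()
```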

So, why provide legal personhood to autonomous devices? We want them to be able to own their own assets, to assume contractual rights, and to have legal capacity so they can sue and be sued and limit their liability. “Blockchain lets us do that without having to declare the robot to be a legal person.”

The plant effectively owns the cryptofunds. The law cannot affect this. Smart contracts are enforced by code.

Who are the parties to the contract? The original author and new artist? The master agreement? Who can sue who in case of a breach? We don’t know how to answer these questions yet.

Can a plantoid sue for breach of contract? Not if the legal system doesn’t recognize it as a legal person. So who is liable if the plant hurts someone? Can we provide a mechanism for this without conferring personhood? “How do you enforce the law against autonomous agents that cannot be stopped and whose property cannot be seized?”

Q&A

Q: Could you do this with live plants? People would bioengineer them…

A: Yes. Plantoid has already been forked this way. There’s an idea for a forest offering trees to be cut down, with the compensation going to the forest which might eventually buy more land to expand itself.

My interest in this grew out of my interest in decentralized organizations. This enables a project to be an entity that assumes liability for its actions, and to reproduce itself.

Q: [me] Do you own this plantoid?

A: Hmm. I own the physical instantiation but not the code or the smart contract. If this one broke, I could make a new one that connects to the same smart contract. If someone gets hurt because it falls on them, I’m probably liable. If the smart contract is funding terrorism, I’m not the owner of that contract. The physical object is doing nothing but reacting to donations.

Q: But the aim of its reactions is to attract more money…

A: It will be up to the judge.

Q: What are the most likely scenarios for the development of these weird objects?

A: A blockchain can provide the interface for humans interacting with each other without needing a legal entity, such as Uber, to centralize control. But you need people to decide to do this. The question is how these entities change the structure of the organization.


April 27, 2018

[liveblog][ai] Ben Green: The Limits of "Fair" Algorithms

Ben Green is giving a ThursdAI talk on “The Limits, Perils, and Challenges of ‘Fair’ Algorithms for Criminal Justice Reform.”

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

In 2016, the COMPAS algorithm became a household name (in some households) when ProPublica showed that it predicted that black men were twice as likely as white men to jump bail. People justifiably got worried that algorithms can be highly biased. At the same time, we think that algorithms may be smarter than humans, Ben says. These have been the poles of the discussion. Optimists think that we can limit the bias to take advantage of the added smartness.

There have been movements to go toward risk assessments for bail, rather than using money bail. E.g., Rand Paul and Kamala Harris have introduced the Pretrial Integrity and Safety Act of 2017. There have also been movements to use risk scores only to reduce detention, not to increase it.

But are we asking the right questions? Yes, the criminal justice system would be better if judges could make more accurate and unbiased predictions, but it’s not clear that machine learning can do this. So, two questions: 1. Is ML an appropriate tool for this? 2. Is implementing ML algorithms an effective strategy for criminal justice reform?

#1 Is ML an appropriate tool to help judges make more accurate and unbiased predictions?

ML relies on data about the world. This can produce tunnel vision by causing us to focus on particular variables that we have quantified, and ignore others. E.g., when it comes to sentencing, a judge balances deterrence, rehabilitation, retribution, and incapacitating a criminal. COMPAS predicts recidivism, but none of the other factors. This emphasizes incapacitation as the goal of sentencing. This might be good or bad, but the ML has shifted the balance of factors, framing the decision without policy review or public discussion.

Q: Is this for sentencing or bail? Because incapacitation is a more important goal in sentencing than in bail.

A: This is about sentencing. I’ll be referring to both.

Data is always about the past, Ben continues. ML finds statistical correlations among inputs and outputs. It applies those correlations to the new inputs. This assumes that those correlations will hold in the future; it assumes that the future will look like the past. But if we’re trying to reform the judicial system, we don’t want the future to look like the past. ML can thus entrench historical discrimination.

Arguments about the fairness of COMPAS are often based on competing mathematical definitions of fairness. But we could also think about the scope of what we count as fair. ML tries to make a very specific decision: among a population, who recidivates? If you take a step back and consider the broader context of the data and the people, you would recognize that blacks recidivate at a higher rate than whites because of policing practices, economic factors, racism, etc. Without these considerations, you’re throwing away the context and accepting the current correlations as the ground truth. Even if the underlying conditions were to change, the algorithm wouldn’t reflect the change unless you retrained it.

Q: Who retrains the data?

A: It depends on the contract the court system has.
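[A numeric aside from me, not Ben, on the competing mathematical definitions he mentioned: the conflict isn’t a quirk of any one tool. If the recorded base rates differ between groups, a score that is equally well calibrated for both groups will mechanically produce different false positive rates, which is roughly the crux of the ProPublica dispute. A toy calculation with made-up numbers:]

```python
# Toy numbers, not COMPAS data: one equally "calibrated" score (same precision
# among people flagged high risk in both groups), two groups with different
# recorded re-arrest rates, and the false positive rates that follow.

def false_positive_rate(base_rate, flag_rate, precision):
    false_flags = flag_rate * (1 - precision)   # flagged people who don't re-offend
    non_recidivists = 1 - base_rate
    return false_flags / non_recidivists

# Group 1: 50% recorded recidivism, 55% flagged. Group 2: 30% recorded, 33% flagged.
# In both groups, 60% of flagged people actually recidivate (equal "calibration").
for name, base_rate, flag_rate in [("group 1", 0.50, 0.55), ("group 2", 0.30, 0.33)]:
    print(name, round(false_positive_rate(base_rate, flag_rate, 0.60), 2))

# Output: group 1 -> 0.44, group 2 -> 0.19. Equal precision, unequal false
# positive rates: you have to choose which definition of "fair" matters.
```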

Algorithms are not themselves a natural outcome of the world. Subjective decisions go into making them: which data to input, choosing what to predict, etc. The algorithms are brought into court as if they were facts. Their subjectivity is out of the frame. A human expert would be subject to cross examination. We should be thinking of algorithms that way. Cross examination might include asking how accurate the system is for the particular group the defendant is in, etc.

Q: These tools are used in setting bail or a sentence, i.e., before or after a trial. There may not be a venue for cross examination.

A: In the Loomis case, an expert witness testified that the algorithm was misused. That’s not exactly what I’m suggesting; they couldn’t get to all of it because of the trade secrecy of the algorithms.

Back to the framing question. If you can make the individual decision points fair, we sometimes think we’ve made the system fair. But technocratic solutions tend to sanitize rather than alter. You’re conceding the overall framework of the system, overlooking more meaningful changes. E.g., in NY, 71% of voters support ending pre-trial jail for misdemeanors and non-violent felonies. Maybe we should consider that. Or consider that cutting food stamps has been shown to increase recidivism. Or perhaps we should be reconsidering the wisdom of preventative detention, which was only introduced in the 1980s. Focusing on the tech de-focuses on these sorts of reforms.

Also, technocratic reforms are subject to political capture. E.g., NJ replaced money bail with a risk assessment tool. After some of the people released committed crimes, they changed the tool so that certain crimes were removed from bail. What is an acceptable risk level? How to set the number? Once it’s set, how is it changed?

Q: [me] So, is your idea that these ML tools drive out meaningful change, so we ought not to use them?

A: Roughly, yes.

[Much interesting discussion which I have not captured. E.g., Algorithms can take away the political impetus to restore bail as simply a method to prevent flight. But sentencing software is different, and better algorithms might help, especially if the algorithms are recommending sentences but not imposing them. And much more.]

2. Do algorithms actually help?

How do judges use algorithms to make a decision? Even if the algorithm were perfect, would it improve the decisions judges make? We don’t have much of an empirical answer.

Ben was talking to Jeremy Heffner at Hunch Lab. They make predictive policing software and are well aware of the problem of bias. (“If there’s any bias in the system it’s because of the crime data. That’s what we’re trying to address.” — Heffner) But all of the suggestions they give to police officers are called “missions,” which is in the military/jeopardy frame.

People are bad at incorporating quantitative data into decisions. And they filter info through their biases. E.g., the “ban the box” campaign to remove the tick box about criminal backgrounds on job applications actually increased racial discrimination because employers assumed the white applicants were less likely to have arrest records. (Agan and Starr 2016) Also, people have been shown to interpret police camera footage according to their own prior opinions about the police. (Sommers 2016)

Evidence from Kentucky (Stevenson 2018): mandatory risk assessments for bail produced only a small increase in pretrial release, and those changes eroded over time as judges returned to their previous habits.

So, we need to be asking the empirical question of how judges actually use these predictions. And should judges incorporate these predictions into their decisions?

Ben’s been looking at the first question: how do judges use algorithmic predictions? He’s running experiments on Mechanical Turk, showing people profiles of defendants — a couple of sentences about the crime, race, and previous arrest record. The Turkers have to give a prediction of recidivism. Ben knows which ones actually recidivated. Some are also given a recommendation based on an algorithmic assessment. That risk score might be the actual one, random, or biased; the Turkers don’t know which.

Q: It might be different if you gave this test to judges.

A: Yes, that’s a limitation.

Q: You ought to give some a percentage of something unrelated, e.g., it will rain, just to see if the number is anchoring people.

A: Good idea

Q: [me] Suppose you find that the Turkers’ assessment of risk is more racially biased than the algorithm…

A: Could be.

[More discussion until we ran out of time. Very interesting.]


April 5, 2018

[liveblog] Neil Gaikwad: Human-AI Collaboration for Sustainable Market Design

I’m at a ThursdAI talk (Harvard’s Berkman Klein Center for Internet & Society and the MIT Media Lab) being given by Neil Gaikwad (Twitter: @neilthemathguy), a Ph.D. at the Media Lab in the Space Enabled group.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Markets and institutions are parts of a complex ecosystem, Neil says. His research looks at data from satellites that show how the Earth is changing: crops, water, etc. Once you’ve gathered the data, you can use machine learning to visualize the changes. There are ecosystems, including of human behavior, that are affected by this. It affects markets and institutions. E.g., a drought may require an institutional response, and affect markets.

Traditional markets, financial markets, and gig economies all share characteristics. Farmers markets are complex ecosystems of people with differing information and different amounts of it, i.e. asymmetric info. Same for financial markets. Same for gig economies.

Indian agricultural markets have been failing; there have been 300,000 farmer suicides in the last 30 years. Stock markets have crashed suddenly due to black-box trading; in some cases we still don’t know why. And London has banned Uber. So, it doesn’t matter which markets or institutions we look at, they’re losing our trust.

An article in New Scientist asked what we can do to regain this trust. For black-box AI, there are questions of fairness and equity. But what would human-machine collaboration be like? Are there design principles for markets?

Neil stops for us to discuss.

Q: How do you define justice?

A: Good question. Fairness? Freedom? The designer has a choice about how to define it.

Q: A UN project created an IT platform that put together farmers and direct consumers. The pricing seemed fairer to both parties. So, maybe avoid intermediaries, as a design principle?

Neil continues. So, what is the concept of justice here?

1. Rawls and Kant: Transcendental institutionalism. It’s deontological: follow a principle for perfect justice. Use those principles to define a perfect institution. The properties are defined by a social contract. But it doesn’t work, as in the examples we just saw. What is missing? People and society. [I.e., you run the institution according to principles, but that doesn’t guarantee that the outcome will be fair and just. My example: Early Web enthusiasts like me thought the Web was an institution built on openness, equality, creative anarchy, etc., yet that obviously doesn’t ensure that the outcome will share those properties.]

2. Realization-focused comparison (Sen 2009): How to reverse this trend. It is consequentialist: what will be the consequences of the design of an institution? It’s a comparative assessment of different forms of institutions. Instead of asking for the perfectly just society, Sen asks how justice can be advanced. The most critical tool for evaluating any institution is to look at how it actually changes people’s lives.

Sen argues that principles are important. They can be expressed by “niti,” Sanskrit for rules and institutions. But you also need nyaya: a form of social arrangement that makes sure that those rules are obeyed. These rules come from social choice, not social contract.

Example: Gig economies. The data comes from Mechanical Turk, Upwork, CrowdFlower, etc. This creates employment for many people, but it’s tough. E.g., identifying images: supervised learning is used for this, and the Turkers, etc., do the labeling to train the image recognition system. The Turkers make almost no money at this. This is the wicked problem of market design: the worker can have identifications rejected, sometimes with demeaning comments.

“The Market for Lemons” (Akerlof 1970): all the cars started to look alike, and now all gig workers look alike to those who hire them: there’s no value given to bringing one’s individual skills to the labor.

So, who owns the data? Who has a stake in the models? In the intellectual property?

If you’re a gig worker, you’re working with strangers. You don’t know the reputation of the person giving you work, or renting you the Airbnb apartment. So, let’s put in a rule: reputation is the backbone. But in sharing economies, most ratings are the highest possible: reputation inflation. So, can we trust reputation? This happens because people have no incentive to rate, and there’s social pressure to give a positive rating.

So, thinking about Sen, can we think about an incentive for honest reputation? Neil’s group has been thinking about a system [I thought he said Boomerang, but I can’t find that]. It looks at the workers’ incentives. It looks at the workers’ ratings of each other. If you’re a requester, you’ll see the workers you like first.

Does this help AI design?

MoralMachine has had 1.3M voters and 18M pairwise comparisons (i.e., people deciding to go straight or right). Can this be used as a voting-based system for ethical decision making (AAAI 2018)? You collect the pairwise preferences, learn a model of preference, come to a collective preference, and have voting rules for the collective decision.
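[My own toy sketch of the aggregation step, not the method from the AAAI 2018 paper: tally the pairwise comparisons into a net-wins score and read off a collective ranking.]

```python
# Toy aggregation of pairwise comparisons into a collective preference.
# My own illustration of the general idea, not the AAAI 2018 paper's method.
from collections import Counter

# Each vote records which of two options a respondent preferred.
pairwise_votes = [
    ("swerve", "stay"), ("swerve", "stay"), ("stay", "swerve"),
    ("brake", "swerve"), ("brake", "stay"),
]

wins = Counter(winner for winner, _ in pairwise_votes)
losses = Counter(loser for _, loser in pairwise_votes)
options = set(wins) | set(losses)

# A simple net-wins score per option, then rank to get the collective preference.
scores = {option: wins[option] - losses[option] for option in options}
collective_ranking = sorted(options, key=scores.get, reverse=True)
print(collective_ranking)   # ['brake', 'swerve', 'stay'] with these votes
```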

Q: Aren’t you collecting preferences, not normative judgments? The data says people would rather kill fat people than skinny ones.

A: You need the social behavior but also rules. For this you have to bring people into the loop.

Q: How do we differentiate between what we say we want and what we really want?

A: There are techniques, such as the Bayesian Truth Serum.

Conclusion: The success of markets, institutions, or algorithms is highly dependent on how they actually affect people’s lives. This thinking should be central to the design and engineering of socio-technical systems.


April 2, 2018

"If a lion could talk" updated

“If a lion could talk, we could not understand him.”
— Ludwig Wittgenstein, Philosophical Investigations, 1953.

“If an algorithm could talk, we could not understand it.”
— Deep learning, Now.


March 19, 2018

[liveblog] Kate Zwaard, on the Library of Congress Labs

Kate Zwaard (Twitter: @kzwa), Chief of National Digital Strategies at the Library of Congress and leader of the LC Lab, is opening MIT Libraries’ Grand Challenge Summit. The next 1.5 days will be about the grand challenges in enabling scholarly discovery.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

For context she tells us that the LC is the largest library in the world, with 164M items. It has the world’s largest collection of film, maps, comic books, telephone directories, and more. [Too many for me to keep this post up with.]

  • You can walk for two football fields just in the maps section. The world’s largest collection of recorded sound. The largest collection …

  • Personal papers from Ben Franklin, Rosa Parks, Groucho Marx, Claude Shannon, and so many more.

  • Last year they circulated almost a million physical items.

  • Every week 11,000 tangible items come in through the Copyright office.

  • Last year, they digitized 4.7M items, as well as 730M documents crawled from the Web, plus much more. File count: 243M and growing every day.

These serve just one of the LC’s goals: “Acquire, preserve, and provide access to a universal collection of knowledge and the record of America’s creativity.” Not to mention serving Congress, and much more. [I can only keep up with a little of this. Kate’s a fantastic presenter and is not speaking too quickly. The LC is just too big!]

Kate thinks of the LC’s work as an exothermic reaction that needs an activation energy or catalyst. She leads the LC Labs, which started a year ago as a place of experimentation. The LC is a delicate machine, which makes it hard for it to change. The Labs enable experimentation. “Trying things that are easy and cheap is the only way forward.”

When thinking about what to do next, she thinks about what’s feasible and what the impact would be. One way of having impact: demonstrating that the collection has unexplored potential for research. She’s especially interested in how the Labs can help deal with the problem of scale at the LC.

She talks about some of the Lab’s projects.

If you wanted to make stuff with LC data, there used to be no way of doing that. Now there’s LC for Robots, with added documentation and Jupyter Notebooks: an open source Web app that lets you create docs containing code, running text, etc. It lets people play with the API without doing all the work from scratch.
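[To give a sense of what playing with the API looks like, here’s a minimal sketch of the kind of query a notebook might run. The endpoint, the fo=json parameter, and the field names are my guesses, not anything Kate showed; LC for Robots has the real documentation.]

```python
# Minimal sketch of querying the loc.gov JSON interface from a notebook.
# The endpoint, the fo=json parameter, and the field names are assumptions;
# check LC for Robots for the official documentation before relying on them.
import requests

resp = requests.get(
    "https://www.loc.gov/search/",
    params={"q": "baseball cards", "fo": "json"},  # fo=json requests JSON output
    timeout=30,
)
resp.raise_for_status()

for item in resp.json().get("results", [])[:5]:
    # Field names such as "date" and "title" may differ; inspect the payload.
    print(item.get("date", "n.d."), "-", item.get("title", "(untitled)"))
```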

But it’s not enough to throw some resources onto a Web page. The NEH data challenge asked people to create new things using the info about 12M newspapers in the collection. Now the Lab has the Congressional Data Challenge: do something with Congressional data.

Labs has an Innovator in Residence project. The initial applicants came from the LC to give it a try. One of them created “Beyond Words,” a crowdsourcing project that asks people to add data to resources.

Kate likes helping people find collections they otherwise would have missed. For ten years the LC has collaborated with the Flickr Commons. But they wanted to crowdsource a transcription project for any image of text. A repo will be going up on GitHub shortly for this.

In the second year of the Innovator in Residence, they got the artist Jer Thorp [Twitter: @blprnt] to come for 6 months. Kate talks about his work with the papers of Edward Lorenz, who coined the phrase “The Butterfly Effect.” Jer animated Lorenz’s attractor, which, he points out, looks a bit like a butterfly. Jer’s used the attractor on a collection of 3M words. It results in “something like a poem.” (Here’s Jer’s Artist in the Archive podcast about his residency.)
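[For anyone who wants to see the butterfly for themselves: the Lorenz system is just three coupled differential equations, easy to integrate crudely. A generic sketch with the classic textbook parameters, nothing to do with Jer’s actual code:]

```python
# The Lorenz system, integrated with a simple Euler step.
# Classic parameters; this is a generic sketch, not Jer Thorp's code.
sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
x, y, z = 1.0, 1.0, 1.0
dt, steps = 0.01, 10_000
points = []

for _ in range(steps):
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
    points.append((x, y, z))

# Plotting x against z traces the familiar butterfly-shaped attractor.
print(points[-1])
```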

Jer wonders how we can put serendipity back into the LC and into the Web. “How do we enable our users to be carried off by curiosity, not by a particular destination?” The LC is a closed-stack library, but it can help guide digital wanderers.

Last year the LC released 25M catalog records. Jer did a project that randomly pulls the first names of 20 authors in any particular year. It demonstrates, among other things, the changing demographics of authors. Another project, “Birthy Deathy,” displays birthplace info. Another looks for polymaths.

In 2018 the Lab will have their first open call for an Innovator in Residence. They’ll be looking for data journalists.

Kate talks about Laura Wrubel’s work with the Lab. “Library of Congress Colors” displays a graphic of the dominant colors in a collection.

Or Laura’s Photo Roulette: you guess the date of a photo.

Kate says she likes to think that libraries are not just “book holes.” One project: find links among items in the archives. But the WARC format is not amenable to that.

The Lab is partnering with lots of great groups, including JSONstor and WikiData.

They’re working on using machine learning to identify place names in their photos.

What does this have to do with scale, she asks, noting that the LC has done pretty well with scale. E.g., for the past seven years, the size of their digital collection has doubled every 32 months.

The Library also thinks about how to become a place of warmth and welcome. (She gives a shout out to MIT Libraries’ Future of Libraries report.) Right now, visitors and scholars go to different parts of the building. Visitors to the building see a monument to knowledge, but not a living, breathing place. “The Library is for you. It is a place you own. It is a home.”

She reads from a story by Anne Lamott.

How friendship relates to scale: “Everything good that has happened in my life has happened because of friendship.” The average length of employment of a current employee is thirty years — and that’s not the average retirement age. “It’s not just for the LC but for our field.” Good advice she got: “Pick your career by the kind of people you like to be around.” Librarians!

“We’ve got a tough road ahead of us. We’re still in the early days of the disruption that computation is going to bring to our profession.” “Friendship is what will get us through these hard times. We need to invite people into the tent.” “Everything we’ve accomplished has been through the generosity of our friends and colleagues.” This is 100% true of the Labs. It’s just 4 people, but everything they do is done in collaboration.

She concludes (paraphrasing badly): I don’t believe in geniuses, and I don’t believe in paradigm shifts. I believe in friendship and working together over the long term. [She put this far better.]

Q&A

Q: How does the Lab decide on projects?

A: Collaboratively

Q: I’m an archivist at MIT. The works are in closed stacks, which can mislead people about the scale. How do we explain the scale in an interesting way?

A: Funding is difficult because so much of the money that comes is to maintain and grow the collection and services. It can be a challenge to carve out funding for experimentation and innovation. We’ve been working hard on finding ways to help people wrap their heads around the place.

Q: Data science students are eager to engage, e.g., as interns. How can academic institutions help to make that happen?

A: We’re very interested in what sorts of partnerships we can create to bring students in. The data is so rich, and the place is so interesting.

Q: Moving from models that think about data as packages as opposed to unpacking and integrating: what do you think about the FAIR principles, making things Findable, Accessible, Interoperable, and Reusable? Also, we need to bring in professionals thinking about knowledge much more broadly.

A: I’m very interested in HathiTrust’s data capsules. Are there ways we can allow people to search through audio files that are not going to age into the commons until we’re gone? You’re right: the model of chunks coming in and out is not going to work for us.

Q: In academia, our focus has been to provide resources efficiently. How can we weave in serendipity without hurting efficiency?

A: That’s hard. Maybe we should just serve the person who has a specific purpose. You could give ancillary answers. And crowdsourcing could make a lot more available.

[Great talk.]


February 11, 2018

The story of lead and crime, told in tweets

Patrick Sharkey [twitter: patrick_sharkey] uses a Twitter thread to evaluate the evidence about a possible relationship between exposure to lead and crime. The thread is a bit hard to get unspooled correctly, but it’s worth it as an example of:

1. Thinking carefully about complex evidence and data.

2. How Twitter affects the reasoning and its expression.

3. The complexity of data, which will only get worse (= better) as machine learning scales up their size and complexity.

Note: I lack the skills and knowledge to evaluate Patrick’s reasoning. And, hat tip to David Lazer for the retweet of the thread.


The brain is not a computer and the world is not information

Robert Epstein argues in Aeon against the dominant assumption that the brain is a computer, that it processes information, stores and retrieves memories, etc. That we assume so comes from what I think of as the informationalizing of everything.

The strongest part of his argument is that computers operate on symbolic information, but brains do not. There is no evidence (that I know of, but I’m no expert. On anything) that the brain decomposes visual images into pixels and those pixels into on-offs in a code that represents colors.

In the second half, Epstein tries to prove that the brain isn’t a computer through some simple experiments, such as drawing a dollar bill from memory and while looking at it. Someone committed to the idea that the brain is a computer would probably just conclude that the brain just isn’t a very good computer. But judge for yourself. There’s more to it than I’m presenting here.

Back to Epstein’s first point…

It is of the essence of information that it is independent of its medium: you can encode it into voltage levels of transistors, magnetized dust on tape, or holes in punch cards, and it’s the same information. Therefore, a representation of a brain’s states in another medium should also be conscious. Epstein doesn’t make the following argument, but I will (and I believe I am cribbing it from someone else but I don’t remember who).

Because information is independent of its medium, we could encode it in dust particles swirling clockwise or counter-clockwise; clockwise is an on, and counter is an off. In fact, imagine there’s a dust cloud somewhere in the universe that has 86 billion motes, the number of neurons in the human brain. Imagine the direction of those motes exactly matches the on-offs of your neurons when you first spied the love of your life across the room. Imagine those spins shift but happen to match how your neural states shifted over the next ten seconds of your life. That dust cloud is thus perfectly representing the informational state of your brain as you fell in love. It is therefore experiencing your feelings and thinking your thoughts.

That by itself is absurd. But perhaps you say it is just hard to imagine. Ok, then let’s change it. Same dust cloud. Same spins. But this time we say that clockwise is an off, and the other is an on. Now that dust cloud no longer represents your brain states. It therefore is both experiencing your thoughts and feeling and is not experiencing them at the same time. Aristotle would tell us that that is logically impossible: a thing cannot simultaneously be something and its opposite.

Anyway…

Toward the end of the article, Epstein gets to a crucial point that I was very glad to see him bring up: Thinking is not a brain activity, but the activity of a body engaged in the world. (He cites Anthony Chemero’s Radical Embodied Cognitive Science (2009) which I have not read. I’d trace it back further to Andy Clark, David Chalmers, Eleanor Rosch, Heidegger…). Reducing it to a brain function, and further stripping the brain of its materiality to focus on its “processing” of “information” is reductive without being clarifying.

I came into this debate many years ago already made skeptical of the most recent claims about the causes of consciousness by having some awareness of the series of failed metaphors we have used over the past couple of thousand years. Epstein puts this well, citing another book I have not read (and have consequently just ordered):

In his book In Our Own Image (2015), the artificial intelligence expert George Zarkadakis describes six different metaphors people have employed over the past 2,000 years to try to explain human intelligence.

In the earliest one, eventually preserved in the Bible, humans were formed from clay or dirt, which an intelligent god then infused with its spirit. That spirit ‘explained’ our intelligence – grammatically, at least.

The invention of hydraulic engineering in the 3rd century BCE led to the popularity of a hydraulic model of human intelligence, the idea that the flow of different fluids in the body – the ‘humours’ – accounted for both our physical and mental functioning. The hydraulic metaphor persisted for more than 1,600 years, handicapping medical practice all the while.

By the 1500s, automata powered by springs and gears had been devised, eventually inspiring leading thinkers such as René Descartes to assert that humans are complex machines. In the 1600s, the British philosopher Thomas Hobbes suggested that thinking arose from small mechanical motions in the brain. By the 1700s, discoveries about electricity and chemistry led to new theories of human intelligence – again, largely metaphorical in nature. In the mid-1800s, inspired by recent advances in communications, the German physicist Hermann von Helmholtz compared the brain to a telegraph.

Maybe this time our tech-based metaphor has happened to get it right. But history says we should assume not. We should be very alert to the disanalogies, which Epstein helps us with.

Getting this right, or at least not getting it wrong, matters. The most pressing problem with the informationalizing of thought is not that it applies a metaphor, or even that the metaphor is inapt. Rather it’s that this metaphor leads us to a seriously diminished understanding of what it means to be a living, caring creature.

I think.

 

Hat tip to @JenniferSertl for pointing out the Aeon article.


February 1, 2018

Can AI predict the odds on you leaving the hospital vertically?

A new research paper, published Jan. 24 with 34 co-authors and not peer-reviewed, claims better accuracy than existing software at predicting outcomes like whether a patient will die in the hospital, be discharged and readmitted, and their final diagnosis. To conduct the study, Google obtained de-identified data of 216,221 adults, with more than 46 billion data points between them. The data span 11 combined years at two hospitals,

That’s from an article in Quartz by Dave Gershgorn (Jan. 27, 2018), based on the original article by Google researchers posted at Arxiv.org.

…Google claims vast improvements over traditional models used today for predicting medical outcomes. Its biggest claim is the ability to predict patient deaths 24-48 hours before current methods, which could allow time for doctors to administer life-saving procedures.

Dave points to one of the biggest obstacles to this sort of computing: the data are in such different formats, from hand-written notes to the various form-based data that’s collected. It’s all about the magic of interoperability … and the frustration when data (and services and ideas and language) can’t easily work together. Then there’s what Paul Edwards, in his great book A Vast Machine calls “data friction”: “…the costs in time, energy, and attention required simply to collect, check, store, move, receive, and access data.” (p. 84)

On the other hand, machine learning can sometimes get past the incompatible expression of data in a way that’s so brutal that it’s elegant. One of the earlier breakthroughs in machine learning came in the 1990s when IBM analyzed the English and French versions of Hansard, the bi-lingual transcripts of the Canadian Parliament. Without the machines knowing the first thing about either language, the system produced more accurate results than software that was fed rules of grammar, bilingual dictionaries, etc.

Indeed, the abstract of the Google paper says “Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. ” It continues: “We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.”

The paper also says that their approach affords clinicians “some transparency into the predictions.” Some transparency is definitely better than none. But, as I’ve argued elsewhere, in many instances there may be tools other than transparency that can give us some assurance that AI’s outcomes accord with our aims and our principles of fairness.

 


 

I found this article by clicking on Dave Gershgorn’s byline on a brief article about the Wired version of the paper of mine I referenced in the previous paragraph. He does a great job explaining it. And, believe me, it’s hard to get a writer — well, me, anyway — to acknowledge that without having to insert even one caveat. Thanks, Dave!


January 11, 2018

Artificial water (+ women at PC Gamer)

I’ve long wondered — like for a couple of decades — when software developers who write algorithms that produce beautiful animations of water will be treated with the respect accorded to painters who create beautiful paintings of water. Both require the creators to observe carefully, choose what they want to express, and apply their skills to realizing their vision. When it comes to artistic vision or merit, are there any serious differences?

In the January issue of PC Gamer, Philippa Warr [twitter: philippawarr] — recently snagged from Rock, Paper, Shotgun — points to v r 3, a museum of water animations put together by Pippin Barr. (It’s conceivable that Pippin Barr is Philippa’s hobbit name. I’m just putting that out there.) The museum is software you download (here) that displays 24 varieties of computer-generated water, from the complex and realistic, to simple textures, to purposefully stylized low-information versions.


Philippa also points to the Seascape page by Alexander Alekseev where you can read the code that procedurally produces an astounding graphic of the open sea. You can directly fiddle with the algorithm to immediately see the results. (Thank you, Alexander, for putting this out under a Creative Commons license.) Here’s a video someone made of the result:

Philippa also points to David Li’s Waves where you can adjust wind, choppiness, and scale through sliders.
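Under the hood, demos like these mostly play the same trick: sum a stack of sine (or Gerstner) waves whose amplitudes, frequencies, and directions are driven by parameters such as wind and choppiness. Here is a toy heightfield along those lines; it is my own sketch, not Alekseev’s shader or Li’s simulation, and the formulas are deliberately crude.

```python
# Toy procedural water: a sum-of-sines heightfield with wind, choppiness, and
# scale knobs. A generic sketch, not Seascape's shader or David Li's simulation.
import math
import random

def water_height(x, y, t, wind=1.0, choppiness=0.5, scale=1.0, n_waves=8):
    random.seed(7)                               # fixed wave set, so the field is consistent
    height = 0.0
    for i in range(n_waves):
        direction = random.uniform(0.0, 2.0 * math.pi)
        freq = (0.5 + i) / scale                 # later waves are shorter
        amp = wind * freq ** (choppiness - 2.0)  # more choppiness = more short-wave energy
        speed = math.sqrt(9.8 / freq)            # crude deep-water dispersion
        phase = (x * math.cos(direction) + y * math.sin(direction)) * freq + t * speed
        height += amp * math.sin(phase)
    return height

# Sample a small patch of "ocean" at t = 0.
for row in range(5):
    print(" ".join(f"{water_height(col, row, 0.0):6.2f}" for col in range(5)))
```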

More than ten years ago we got to the point where bodies of water look stunning in video games. (Falling water is a different question.) In ten years, perhaps we’ll be there with hair. In the meantime, we should recognize software designers as artists when they produce art.

 


 

Good work, PC Gamer, in increasing the number of women reviewers, and especially as members of your editorial staff. As a long-time subscriber I can say that their voices have definitely improved the magazine. More please!

