Joho the Blog: Berkman Archives

May 15, 2017

[liveblog][AI] AI and education lightning talks

Sara Watson, a BKC affiliate and a technology critic, is moderating a discussion at the Berkman Klein/Media Lab AI Advance.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Karthik Dinakar at the Media Lab points out that what we see in the night sky is in fact distorted by the way gravity bends light, what Einstein described as a gravitational lens. Same for AI: the distortion is often in the data itself. Karthik works on how to help researchers recognize that distortion. He gives an example of how to capture both the cardiologist's and the patient's lenses to better diagnose women's heart disease.

Chris Bavitz is the head of BKC's Cyberlaw Clinic. To help law students understand AI and tech, the Clinic encourages interdisciplinarity. They also help students think critically about the roles of the lawyer and the technologist. The Clinic prefers early relationships among them, although thinking too hard about law early on can diminish innovation.

He points to two problems that represent two poles. First, IP and AI: running AI against protected data. Second, issues of fairness, rights, etc.

Leah Plunkett is a professor at Univ. of New Hampshire Law School and a BKC affiliate. Her topic: How can we use AI to teach? She points out that if Tom Sawyer were real and alive today, he'd be arrested for what he does just in the first chapter. Yet we teach the book as a classic. We think we love a little mischief in our lives, but we apparently don't like it in our kids: we kick them out of schools. E.g., of 49M students in public schools in 2011, 3.45M were suspended and 130,000 were expelled. These punishments disproportionately affect children from marginalized segments.

Get rid of the BS safety justification: the govt ought to be teaching all our children without exception. So, maybe have AI teach them?

Sara: So, what can we do?

Chris: We’re thinking about how we can educate state attorneys general, for example.

Karthik: We are so far from getting users, experts, and machine learning folks together.

Leah: Some of it comes down to buy-in and translation across vocabularies and normative frameworks. It helps to build trust to make these translations better.

[I missed the Q&A from this point on.]


[liveblog] AI Advance opening: Jonathan Zittrain and lightning talks

I'm at a day-long conference/meet-up put on by the Berkman Klein Center's and MIT Media Lab's "AI for the Common Good" project.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Jonathan Zittrain gives an opening talk. Since we’re meeting at Harvard Law, JZ begins by recalling the origins of what has been called “cyber law,” which has roots here. Back then, the lawyers got to the topic first, and thought that they could just think their way to policy. We are now at another signal moment as we are in a frenzy of building new tech. This time we want instead to involve more groups and think this through. [I am wildly paraphrasing.]

JZ asks: What is it that we intuitively love about human judgment, and are we willing to insist on human judgments that are worse than what a machine would come up with? Suppose for utilitarian reasons we can cede autonomy to our machines — e.g., autonomous cars — shouldn’t we? And what do we do about maintaining local norms? E.g., “You are now entering Texas where your autonomous car will not brake for pedestrians.”

"Should I insist on being misjudged by a human judge because that's somehow artisanal?" when, ex hypothesi, an AI system might be fairer.

Autonomous systems are not entirely new. They’re bringing to the fore questions that have always been with us. E.g., we grant a sense of discrete intelligence to corporations. E.g., “McDonald’s is upset and may want to sue someone.”

[This is a particularly bad representation of JZ’s talk. Not only is it wildly incomplete, but it misses the through-line and JZ’s wit. Sorry.]

Lightning Talks

Finale Doshi-Velez is particularly interested in interpretable machine learning (ML) models. E.g., suppose you have ten different classifiers that give equally predictive results. Should you provide the most understandable one, all of them…?

Why is interpretability so “in vogue”? Suppose non-interpretable AI can do something better? In most cases we don’t know what “better” means. E.g., someone might want to control her glucose level, but perhaps also to control her weight, or other outcomes? Human physicians can still see things that are not coded into the model, and that will be the case for a long time. Also, we want systems that are fair. This means we want interpretable AI systems.
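[To make the ten-classifiers scenario concrete, here's a minimal sketch of my own, not Finale's code: several standard models score about the same on a synthetic dataset, so accuracy alone can't pick among them and interpretability can serve as the tie-breaker.]

```python
# Toy illustration: when several classifiers are equally predictive, the
# choice among them can rest on interpretability rather than accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "logistic regression (inspectable coefficients)": LogisticRegression(max_iter=1000),
    "shallow tree (human-readable rules)": DecisionTreeClassifier(max_depth=4),
    "random forest (opaque ensemble)": RandomForestClassifier(n_estimators=200),
}

for name, model in models.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
# If the scores are statistically indistinguishable, nothing about
# predictive performance forces the opaque choice.
```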

How do we formalize these notions of interpretability? How do we do so for science and beyond? E.g., what does a legal "right to explanation" mean? She is working with Sam Gershman on how to more formally ground AI interpretability in the cognitive science of explanation.

Vikash Mansinghka leads the eight-person Probabilistic Computing project at MIT. They want to build computing systems that can be our partners, not our replacements. We have assumed that the measure of success of AI is that it beats us at our own game, e.g., AlphaGo, Deep Blue, Watson playing Jeopardy! But games have clearly measurable winners.

His lab is working on augmented intelligence that gives partial solutions, guidelines, and hints that help us solve problems that neither humans nor machines could solve on their own. The need for these systems is most obvious in large-scale human interest projects, e.g., epidemiology, economics, etc. E.g., should a successful nutrition program in SE Asia be tested in Africa too? There are many variables (including cost). BayesDB, developed by his lab, is "augmented intelligence for public interest data science."
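[I don't know BayesDB's query language well enough to reproduce it, so here is a toy Beta-Binomial sketch, with invented numbers, of the kind of partial, uncertainty-aware answer Vikash means: a probability distribution a human partner can weigh against cost and local differences, rather than a bare yes/no.]

```python
# Toy model, not BayesDB: suppose a hypothetical trial found the nutrition
# program helped 78 of 100 villages in SE Asia.
from scipy import stats

posterior = stats.beta(1 + 78, 1 + 22)  # uniform prior + invented outcomes
lo, hi = posterior.ppf([0.05, 0.95])
print(f"estimated success rate {posterior.mean():.2f}, "
      f"90% interval [{lo:.2f}, {hi:.2f}]")
# The interval, not a verdict, is the output: deciding whether to test the
# program in Africa stays with the human partner.
```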

In traditional computer science, computing systems are built up from circuits to algorithms. Engineers can trade off performance for interpretability. Probabilistic systems have some of the same considerations. [Sorry, I didn't get that last point. My fault!]

John Palfrey is a former Exec. Dir. of BKC, chair of the Knight Foundation (a funder of this project) and many other things. Where can we, BKC and the Media Lab, be most effective as a research organization? First, we’ve had the most success when we merge theory and practice. And building things. And communicating. Second, we have not yet defined the research question sufficiently. “We’re close to something that clearly relates to AI, ethics and government” but we don’t yet have the well-defined research questions.

The Knight Foundation thinks this area is a big deal. AI could be a tool for the public good, but it also might not be. “We’re queasy” about it, as well as excited.

Nadya Peek is at the Media Lab and has been researching "machines that make machines." She points to the first computer-controlled machine ("Teaching Power Tools to Run Themselves") where the aim was precision. People controlled these CCMs: programmers, CAD/CAM folks, etc. That's still the case, but it looks different. Now the old jobs are being done by far fewer people. But the spaces between don't always work so well. E.g., Apple can define an automatable workflow for milling components, but if you're a student doing a one-off project, it can be very difficult to get all the integrations right. The student doesn't much care about a repeatable workflow.

Who has access to an Apple-like infrastructure? How can we make precision-based one-offs easier to create? (She teaches a course at MIT called “How to create a machine that can create almost anything.”)

Nathan Matias, MIT grad student with a newly-minted Ph.D. (congrats, Nathan!) and BKC community member, is facilitating the discussion. He asks how we conceptualize the range of questions that these talks have raised. And what are the tools we need to create? What are the social processes behind that? How can we communicate what we want to machines and understand what they "think" they're doing? Who can do what, and where does that raise questions about literacy, policy, and legal issues? Finally, how can we get to the questions we need to ask, how to answer them, and how to organize people, institutions, and automated systems? Scholarly inquiry, organizing people socially and politically, creating policies, etc.? How do we get there? How can we build AI systems that are "generative" in JZ's sense: systems that we can all contribute to on relatively equal terms and share with others?

Nathan: Vikash, what do you do when people disagree?

Vikash: When you include the sources, you can provide probabilistic responses.

Finale: When a system can’t provide a single answer, it ought to provide multiple answers. We need humans to give systems clear values. AI things are not moral, ethical things. That’s us.

Vikash: We’ve made great strides in systems that can deal with what may or may not be true, but not in terms of preference.

Nathan: An audience member wants to know what we have to do to prevent AI from repeating human bias.

Nadya: We need to include the people affected in the conversations about these systems. There are assumptions about the independence of values that just aren’t true.

Nathan: How can people not close to these systems be heard?

JP: Ethan Zuckerman, can you respond?

Ethan: One of my colleagues, Joy Buolamwini, is working on what she calls the Algorithmic Justice League, looking at computer vision algorithms that don't work on people of color. In part this is because the datasets used to train CV systems are 70% white male faces. So she's generating new sets of facial data that we can retest on. Overall, it'd be good to use test data that represents the real world, and to make sure a representative slice of humanity is working on these systems. So here's my question: we find co-design works well. Should we bring the affected populations in to talk with the system designers?

[Damn, I missed Yochai Benkler‘s comment.]

Finale: We should also enable people to interrogate AI when the results seem questionable or unfair. We need to be thinking about the processes for resolving such questions.

Nadya: It’s never “people” in general who are affected. It’s always particular people with agendas, from places and institutions, etc.


November 22, 2016

[liveblog][bkc] Scott Bradner: IANA: Important, but not for what they do

I'm at a Berkman Klein [twitter: BKCHarvard] talk by Scott Bradner about IANA, the Internet Assigned Numbers Authority. Scott is one of the people responsible for giving us the Internet. So, thanks for that, Scott!

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Scott begins by pointing to the "absurdity" of Ted Cruz's campaign to prevent the "Internet giveaway." The idea that "Obama gave away the Internet" is "hooey," says Scott.

IANA started with a need to coordinate information, not to control it, he says. It began with the Network Working Group in 1968. Then Requests for Comments (RFC) in 1969. The name "IANA" showed up in 1988, although the function had begun in 1972 with coordinating socket numbers. The Domain Name System made IP addresses easier to use, including the hierarchical clustering under .com, .org, etc.

Back to the beginning, computers were too expensive for every gov’t department to have one. So, ARPA wanted to share large and expensive computers among users. It created a packet-based network, which broke info up into packets that were then transmitted. Packet networking was the idea of Paul Baran at RAND who wanted a system that would survive a nuclear strike, but the aim of that network was to share computers. The packets had enough info to make it to their destinations, but the packet design made “no assumptions about the underlying transport network.” No service guarantees about packets making it through were offered. The Internet is the interconnection of the different networks, including the commercial networks that began showing up in the 1990s.
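[A toy sketch, mine and not Scott's, of that design decision: each packet carries its own addressing, and the network may drop or reorder packets without breaking any promise, so the intelligence sits at the endpoints.]

```python
# Illustration only: self-contained packets over a best-effort network.
# Addresses and payloads are invented.
from dataclasses import dataclass
import random

@dataclass
class Packet:
    src: str      # source address
    dst: str      # destination: all a router needs to forward the packet
    seq: int      # sequence number so the receiver can reassemble
    payload: bytes

def best_effort_network(packets):
    """Deliver packets out of order, and sometimes not at all."""
    delivered = [p for p in packets if random.random() > 0.1]  # ~10% loss
    random.shuffle(delivered)                                  # no ordering
    return delivered  # no guarantees: the endpoints must cope

message = b"share the computer"
packets = [Packet("10.0.0.1", "10.0.0.2", i, bytes([b]))
           for i, b in enumerate(message)]
received = best_effort_network(packets)
print(bytes(p.payload[0] for p in sorted(received, key=lambda p: p.seq)))
```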

No one cared about the Net for decades. To the traditional telecom and corporate networking people, it was just a toy—”No quality of service, no guarantees, no security, no one in charge.” IBM thought you couldn’t build a network out of this because their definition of a network — the minimal requirements — was different. “That was great because it meant the regulators ignored us.”

The IANA function went into steady state 1984-1995. It did some allocating of addresses. (When Scott asked Jon Postel for addresses for Harvard, Postel sent him some; Postel was the one-person allocation shop.) IANA did the same for the top-level domains.

"The Internet has few needs," Scott says. It's almost all done through collaboration and agreement. There are no requirements except at a very simple level. The only centralized functions: 1. We have to agree on what the protocol parameters are. Machines have to understand how to read the packet headers. 2. We have to allocate blocks of IP addresses and ASNs. 3. We have to have a single DNS, at least for now. IANA handles those three. "Everything else is distributed." Everything else is collaboration.
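[The "single DNS" function is visible from any networked machine: everyone resolving the same name is supposed to converge on the same mapping, because there is one shared namespace behind it. A two-line illustration with Python's standard library; the domain names are just examples.]

```python
# Anyone on the Internet resolving these names should get the same answers,
# because there is a single coordinated namespace: the IANA function.
import socket

for name in ("harvard.edu", "mit.edu"):
    print(name, "->", socket.gethostbyname(name))
```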

In 1993, Network Solutions was given permission to start selling domain names. A domain cost $100 for two years. There were about 100M names at that point, which added up to real money. Some countries even started selling off their TLDs (top-level domains), e.g., .tv.

IANA dealt with three topics, but DNS was the only one of interest to most people. There was pressure to create new TLDs, which Scott thinks doesn’t solve any real problems. That power was given to ISOC, which set up the International Ad-Hoc Committee in 1996. It set up 7 new TLDs, one of which (.web) caused Image Online Design to sue Postel because they said Postel had promised it to them. The Dept. of Commerce saw that it needed to do something. So they put out an RFC and got 400+ comments. Meanwhile, Postel worked on a plan for institutionalizing the IANA function, which culminated in a conference in Jan 1998. Postel couldn’t go, so Scott presented in his stead.

Shortly after that the Dept of Commerce proposed having a private non-profit coordinate and manage the allocation of the blocks to the registries, manage the file that determines TLDs, and decide which TLDs should exist…the functions of IANA. “There’s no Internet governance here, simply what IANA did.”

There were meetings around the world to discuss this, including one sponsored by the Berkman Center. Many of the people attending were there to discuss Internet governance, which was not the point of the meetings. One person said, “Why are we wasting time talking about TLDs when the Internet is going to destroy countries?” “Most of us thought that was a well-needed vacuum,” says Scott. We didn’t need Internet governance. We were better off without it.

Jon Postel submitted a proposal for an Internet Corporation for Assigned Names and Numbers (ICANN). He died of a heart attack shortly thereafter. The Dept. of Commerce accepted the proposal. In Oct 1998 ICANN had its first board meeting. It was a closed meeting “which anticipated much of what’s wrong with ICANN.”

The Dept of Commerce had oversight over ICANN, but its only power was to say yes or no to the file that lists the TLDs and the IP addresses of the nameservers for each of the TLDs. "That's the entirety of the control the US govt had over ICANN. In theory, the Dept of Commerce could have said 'Take Cuba out of that file,' but that's the most ridiculous thing they could have done and most of the world could have ignored them." The Dept of Commerce never said no to ICANN.

ICANN institutionalizes the IANA. But it also has to deal with trademark issues coming out of domain name registrations, and consults on DNS security issues. “ICANN was formed as a little organization to replace Jon Postel.”

It didn't stay little. ICANN's budget went from a few million bucks to over $100M. "That's a lot of money to replace a few competent geeks." It's also approved hundreds of TLDs. The bylaws went from 7,000 words to 37,000 words. "If you need 37,000 words to say what you're doing, there's something wrong."

The world started to change. Many govts see the Net as an intrinsic threat.

  • In Sept. 2011, India, Brazil, and South Africa proposed that the UN undertake governance of the Internet.

  • Oct. 2013: After Snowden, the Montevideo Statement on the Future of Internet Cooperation proposed moving away from US govt oversight of IANA.

  • Mar. 2014: NTIA announces its intent to transition key domain name functions.

  • Apr. 2014: NetMundial Initiative. "Self-appointed 25-member council to perform internet governance."

The NTIA proposal was supposed to involve all the stakeholders. But it also said that ICANN should continue to maintain the openness of the Internet…a function that ICANN never had. Openness arises from the technical nature of the Net. NTIA said it wouldn’t accept an inter-governmental solution (like the ITU) because it has to involve all the stakeholders.

So who holds ICANN accountable? They created a community process that is "incredibly strong." It can change the bylaws, and remove ICANN directors or the entire board.

Meanwhile, the US Congress got bent out of shape because the US is “giving away the Internet.” It blocked the NTIA from acting until Sept. 2016. On Oct. 1 IANA became independent and is under the control of the community. “This cannot be undone.” “If the transition had not happened, forces in the UN would likely have taken over” governance of the Internet. This would have been much more likely if the NTIA had not let it go. “The IANA performs coordination functions, not governance. There is no Internet governance.”

How can there be no governance? “Because nobody cared for long enough that it got away from them,” Scott says. “But is this a problem we have to fix?”

He leaves the answer hanging. [SPOILER: The answer is NO]

Q&A

Q: Under whom do the IRIs [Internationalized Resource Identifiers] operate?

A: Some Europeans offered to take over European domain names from Jon Postel. It's an open question whether they have the authority to do what they're doing. Each one has its own policy development process.

Q: Where’s research being done to make a more distributed Internet?

A: There have been many proposals ever since ICANN was formed to have some sort of distributed maintenance of the TLDs. But it always comes down to you seeing the same .com site as I do — the same address pointing to the same site for all Internet users. You still have to centralize or at least distribute the mapping. Some people are looking at geographic addressing, although it doesn’t scale.

Q: Do you think Trump could make the US more like China in terms of the Internet?

A: Trump signed on to Cruz's position on IANA. The security issue is a big one, very real. The gut reaction to recent DDoS attacks is to fix that rather than to look at the root cause, which was crappy devices. The Chinese government controls the Net in China by making everyone go through a central, national connection. Most countries don't do that. OTOH, England is imposing very strict content rules that all ISPs have to obey. We may be moving to a telephony model, which is a Westphalian idea of national Internets.

Q: The Net seems to need other things internationally controlled, e.g. buffer bloat. Peer pressure seems to be the only way: you throw people off who disagree.

A: IANA doesn't have agreements with service providers. Buffer bloat is a real issue, but it only affects the people who have it, unlike the IoT DDoS attack that affected us all. Are you going to kick off people whose home security cameras are insecure?

Q: Russia seems to be taking the opposite approach. It has lots of connections coming into it, perhaps for fear that someone would cut them off. Terrorist groups are cutting cables, botnets, etc.

A: Great question. It’s not clear there’s an answer.

Q: With IPv6 there are many more address spaces to give out. How does that change things?

A: The DNS is an amazing success story. It scales extremely well … although there are scaling issues with the backbone routing systems, which are big and expensive. “That’s one of the issues we wanted to address when we did IPv6.”

Q: You said that ICANN has a spotty history of transparency. What role do you think ICANN is going to play going forward? Can it improve on its track record?

A: I'm not sure that it's relevant. IANA's functions are not a governance function. The only thing like a governance issue is the TLDs, and ICANN has already blown that.


January 13, 2016

Perfect Eavesdropping

Suppose a laptop were found at the apartment of one of the perpetrators of last year’s Paris attacks. It’s searched by the authorities pursuant to a warrant, and they find a file on the laptop that’s a set of instructions for carrying out the attacks.

Thus begins Jonathan Zittrain's consideration of an all-too-plausible hypothetical. Should Google respond to a request to search everyone's gmail inboxes to find everyone to whom the to-do list was sent? As JZ says, you can't get a warrant to search an entire city, much less hundreds of millions of inboxes.

But, while this is a search that sweeps a good portion of the globe, it doesn’t “listen in” on any mail except for that which contains a precise string of words in a precise order. What happens next would depend upon the discretion of the investigators.

JZ points out that Google already does something akin to this when it searches for inboxes that contain known child pornography images.
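[Here's a simplified sketch, my toy version, of what such matching could look like. Real systems such as PhotoDNA use hashes robust to small alterations rather than exact cryptographic ones, and that robustness is where the privacy analysis gets harder; but the core point survives: the scanner learns nothing about messages that don't match.]

```python
# Toy fingerprint matching: the scanner never "reads" a message; it only
# checks whether the message's hash equals the hash of one known document.
import hashlib

def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical target document; the real one obviously isn't reproduced here.
TARGET = fingerprint("...the attackers' to-do list would go here...")

def flag(message_body: str) -> bool:
    # Nothing is retained or revealed unless the body is an exact copy.
    return fingerprint(message_body) == TARGET

print(flag("lunch at noon?"))  # False: the contents are never inspected further
```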

JZ's treatment is even-handed and clear. (He's a renowned law professor. He knows how to do these things.) He discusses the reasons pro and con. He comes to his own personal conclusion. It's a model of clarity of exposition and reasoning.

I like this article a lot on its own, but I find it especially fascinating because of its implications for the confused feeling of violation many of us have when it's a computer doing the looking. If a computer scans your emails looking for a terrorist to-do list, has it violated your sense of privacy? If a robot looks at you naked, should you be embarrassed? Our sense of violation is separable from the legal and moral question of our right to privacy, but the two meanings often get mixed up in such discussions. Not in JZ's, but often enough.


August 15, 2014

From Berkman: Zeynep and Ethanz on the Web We Want

This week there were two out-of-the-park posts by Berkman folk: Ethan Zuckerman on advertising as the Net's original sin, and Zeynep Tufekci on the power of the open Internet as demonstrated by coverage of the riots in Ferguson. Each provides a view on whether the Net is a failed promise. Each is brilliant and brilliantly written.

Zeynep on Ferguson

Zeynep, who has written with wisdom and insight on the role of social media in the Turkish protests (e.g., here and here), looks at how Twitter brought the Ferguson police riots onto the national agenda and how well Twitter “covered” them. But those events didn’t make a dent in Facebook’s presentation of news. Why? she asks.

Twitter is an open platform where anyone can post whatever they want. It therefore reflects our interests — although no medium is a mere reflection. FB, on the other hand, uses algorithms to determine what it thinks our interests are … except that its algorithms are actually tuned to get us to click more so that FB can show us more ads. (Zeynep made that point about an early and errant draft of my CNN.com commentary on the FB mood experiment. Thanks, Zeynep!) She uses this to make an important point about the Net’s value as a medium the agenda of which is not set by commercial interests. She talks about this as “Net Neutrality,” extending it from its usual application to the access providers (Comcast, Verizon and their small handful of buddies) to those providing important platforms such as Facebook.

She concludes (but please read it all!):

How the internet is run, governed and filtered is a human rights issue.

And despite a lot of dismal developments, this fight is far from over, and its enemy is cynicism and dismissal of this reality.

Don’t let anyone tell you otherwise.

What happens to #Ferguson affects what happens to Ferguson.

Yup yup yup. This post is required reading for all of the cynics who would impress us with their wake-up-and-smell-the-shitty-coffee pessimism.

Ethan on Ads

Ethan cites a talk by Maciej Ceglowski for the insight that “we’ve ended up with surveillance as the default, if not sole, internet business model.” Says Ethan,

I have come to believe that advertising is the original sin of the web. The fallen state of our Internet is a direct, if unintentional, consequence of choosing advertising as the default model to support online content and services.

Since Internet ads are more effective as a business model than as an actual business, companies are driven ever more frantically to gather customer data in order to hold out the hope of making their ads more effective. And there went out privacy. (This is a very rough paraphrase of Ethan’s argument.)

Ethan pays more than lip service to the benefits — promised and delivered — of the ad-supported Web. But he points to four rather devastating drawbacks, including the distortions caused by algorithmic filtering that Zeynep warns us about. Then he discusses what we can do about it.

I'm not going to try to summarize any further. You need to read this piece. And you will enjoy it. For example, betcha can't guess who wrote the code for the world's first pop-up ads. Answer: Ethan.

Also recommended: Jeff Jarvis’ response and Mathew Ingram’s response to both. I myself have little hope that advertising can be made significantly better, where “better” means being unreservedly in the interests of “consumers” and sufficiently valuable to the advertisers. I’m of course not confident about this, and maybe tomorrow someone will come up with the solution, but my thinking is based on the assumption that the open Web is always going to be a better way for us to discover what we care about because the native building material of the Web is in fact what we find mutually interesting.

Conclusion:

Read both these articles. They are important contributions to understanding the Web We Want.


August 1, 2014

[2b2k] Ethanz on Steve Jobs, genius, and CEOs

Ethan Zuckerman has a great post that begins with a recounting of his youthful discomfort with the way the CEO of his early social media company, Tripod, was treated by the media as if he had done it all by himself.

Hearing me rant about this one too many times, Kara Berklich, our head of marketing, pulled me aside and explained that the visionary CEO was a necessary social construct. With Bo as the single protagonist of our corporate story, we were far more marketable than a complex story with half a dozen key figures and a cast of thousands. When you’re selling a news story, it’s easier to pitch House than Game of Thrones.

This leads Ethan to discourse on the social nature of innovation, and to a brilliant critique of Steve Jobs the person and the book.

My personal TL;DR: Geniuses are networks. But, then, aren’t we all?

Bonus: Ethan includes this coverage from Nightline, 1997. This is what the Internet looked like — at its best — to the media back then. (Go to 2:36 for Ethan his own self.)


July 26, 2014

Municipal nets, municipal electric power, and learning from history

The debate over whether municipalities should be allowed to provide Internet access has been heating up. Twenty states ban it. Tom Wheeler, the chair of the FCC, has said he wants to “preempt” those laws. Congress is maneuvering to extend the ban nationwide.

Jim Baller, who has been writing about the laws, policies, and economics of network deployment for decades, has found an eerie precedent for this contemporary debate. Here's a scan of the table of contents of a 1906 (yes, 1906) issue of Moody's that features a symposium on "Municipal Ownership and Operation."

Scan of 1906 Moody's


The Moody’s articles are obviously not talking about the Internet. They’re talking about the electric grid.

In a 1994 (yes, 1994) article published just as the Clinton administration (yes, Clinton) was developing principles for the deployment of the “information superhighway,” Jim wrote that if we want the far-reaching benefits foreseen by the National Telecommunications and Information Administration (and they were amazingly prescient (but why can’t I find the report online??)), then we ought to learn four things from the deployment of the electric grid in the 1880s and 1890s:

First, the history of the electric power industry teaches that one cannot expect private profit-maximizing firms to provide “universal service” or anything like it in the early years (or decades) of their operations, when the allure of the most profitable markets is most compelling.

Second, the history of the electric power industry teaches that opening the doors to anyone willing to provide critical public services can be counterproductive and that it is essential to watch carefully the growth of private firms that enter the field. If such growth is left unchecked, the firms may become so large and complex that government institutions can no longer control or even understand them. Until government eventually catches up, the public may suffer incalculable injury.

Third, the history of the electric power industry teaches that monopolists will use all means available to influence the opinions of lawmakers and the public in their favor and will sometimes have frightening success.

Fourth, and most important, the history of the electric power industry teaches that the presence or threat of competition from the public sector is one of the best and surest ways to secure quality service and reasonable prices from private enterprises involved in the delivery of critical public services.

Learn from history? Repeat it? Or intervene as citizens to get the history we want? I’ll take door number 3, please.


Why I have not been blogging much: it’s my book’s fault and more

My blogging has gone way down in frequency and probably in quality. I think there are two reasons.

First, I’ve been wrapped up in trying to plot a new book. I’ve known for about three years the set of things I want to write about, but I’ve had my usual difficult time figuring out what the book is actually about. For example, when I was planning Everything is Miscellaneous, I knew that I wanted to write about the importance of metadata, but it took a couple of years to figure out that it wasn’t a book about metadata, or a book about the virtue of messiness, or two dozen other attempts at a top line.

I'm going through the same process now. The process itself consists of me writing a summary of each chapter. Except they're not summaries. They're like the article version of each chapter and usually work out to about 2,000 words. That's because a chapter is more like a path than a list, and I can't tell what's on the path until I walk it. Given that I work for a living, each complete iteration can take me 2-3 months. And then I realize that I have it all wrong.

I don’t feel comfortable going through this process in public. My investment of time into these book summaries is evidence of how seriously I take them, but my experience shows that nineteen times out of twenty, what I thought was a good idea is a very bad idea. It’s embarrassing. So, I don’t show these drafts even to the brilliant, warm and forgiving Berkman Book Club — a group of Berkfolk writing books — not only because it’s embarrassing but because I don’t want to inflict 10,000 words on them when I know the odds are that I’m going to do a thorough re-write starting tomorrow. The only people who see these drafts are my literary agents and friends David Miller and Lisa Adams, who are crucial critics in helping me to see what’s wrong and right in what I’ve done, and working out the next approach.

Anyway, I’ve been very focused for the past couple of months on figuring out this next book. I think I’m getting closer. But I always think that.

The second reason I haven’t been blogging much: I’ve been mildly depressed. No cause for alarm. It’s situational and it’s getting better. I’ve been looking for a new job because the Harvard Library Innovation Lab that I’ve co-directed, with the fabulous Kim Dulin, for almost five years has been given a new mission. I’m very proud of what we — mainly the amazing developers who are actually more like innovation fellows — have done, and I’m very sorry to leave. Facing unemployment hasn’t helped my mood. There have been some other stresses as well. So: somewhat depressed. And that makes it harder for me to post to my blog for some reason.

I thought you might want to know, not that anyone cares [Sniffles, idly kicks at a stone in the ground, waits for a hug].


March 5, 2014

[berkman] Karim Lakhani on disclosure policies and innovation

Karim Lakhani of Harvard Business School (and a Berkman associate, and a member of the Harvard Institute for Quantitative Social Science) is giving a talk called "How disclosure policies impact search in open innovation," a topic he has researched with Kevin Boudreau of the London Business School.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Karim has been thinking about how crowds can contribute to innovation for 17 years, since he was at GE. There are two ways this happens:

1. Competitions and contests at which lots of people work on the same problem. Karim has asked who wins and why, motives, how they behave, etc.

2. Communities/Collaboration. E.g., open source software. Here the questions are: Motives? Costs and benefits? Self-selection and joining scripts? Partner selection?

More fundamentally, he wants to know why both of these approaches work so well.

He works with NASA, using topcoder.com: 600K users worldwide [pdf]. He also works with Harvard Medical School [more] to see how collaboration works there, where (as with Open Source) people choose their collaborators rather than having them chosen top-down.

Karim shows a video about a contest to solve an issue with the International Space Station, having to do with the bending of bars (longerons) in the solar collectors when they are in the shadows. NASA wanted a sophisticated algorithm. (See www.topcoder.com/iss.) It was a two-week contest with a $30K prize. Two thousand people signed up for it; 459 submitted solutions. The winners came from around the globe. Many of the solutions replicated or slightly exceeded what NASA had developed with its contractors, but this was done in just two weeks simply for the price of the contest prize.

Karim says he'll begin by giving us the nutshell version of the paper he will discuss with us today. Innovation systems create incentives to exert innovative effort and encourage the disclosure of knowledge. The timing and the form of the disclosures differentiate systems. E.g., Open Science tends to publish when near done, while Open Source tends to be more iterative. The paper argues that intermediate disclosures (as in open source) dampen incentives and participation, yet lead to higher performance. There's more exploration and experimentation when there's disclosure only at the end.

Karim’s TL;DR: Disclosure isn’t always helpful for innovation, depending on the conditions.

There is a false debate between closed and open innovation. Rather, what differentiates regimes is when the disclosure occurs, and who has the right to use those disclosures. Intermediate disclosure [i.e., disclosure along the way] can involve a range of outputs. E.g., the Human Genome Project enshrined intermediate disclosure as part of an academic science project; you had to disclose discoveries within 24 hours.

Q: What constitutes disclosure? Would talking with another mathematician at a conference count as disclosure?

A: Yes. It would be intermediate disclosure. But there are many nuances.

Karim says that Allen, Meyer and Nuvolari have shown that historically, intermediate disclosure has been an important source of technological progress. E.g., the Wright brothers were able to invent the airplane because of a vibrant community. [I’m using the term “invent” loosely here.]

How do you encourage continued innovation while enabling early re-use of it? “Greater disclosure requirements will degrade incentives for upstream innovators to undertake risky investment.” (Green & Scotchmer; Bessen & Maskin.) We see compensating mechanisms under regimes of greater disclosure: E.g., priority and citations in academia; signing and authorship in Open Source. You may also attract people who have a sharing ethos; e.g., Linus Torvalds.

Research confirms that the more access you provide, the more reuse and sharing there will be. (Cf. Eric von Hippel.) Platforms encourage reuse of core components. (Cf. Boudreau 2010; Rysman and Simcoe 2008.) [I am not getting all of Karim's citations. Not even close.]

Another approach looks at innovation as a problem-solving process. And that entails search. You need to search to find the best solutions in an uncertain space. Sometimes innovators use “novel combinations of existing knowledge” to find the best solutions. So let’s look at the paths by which innovators come up with ideas. There’s a line of research that assumes that the paths are the essential element to understand the innovation process.

Mathematical formulations of this show you want lots of people searching independently. The broader the better for innovation outcomes. But there is a tendency of the researchers to converge on the initially successful paths. These are affected by decisions about when to disclose.
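[Here's a toy simulation of that dynamic, with all parameters invented; it illustrates the mechanism, not the paper's model. With intermediate disclosure, searchers pile onto the best approach found so far and probe its neighbors; without it, they keep sampling independently. It prints the best payoff found and how many distinct approaches were tried under each regime.]

```python
# Toy search model: 1,000 candidate "approaches" with random payoffs,
# 50 searchers, 20 rounds.
import random

random.seed(0)
landscape = [random.random() for _ in range(1000)]

def run(intermediate_disclosure: bool, searchers=50, rounds=20):
    visited = {random.randrange(1000) for _ in range(searchers)}
    for _ in range(rounds):
        best = max(visited, key=landscape.__getitem__)
        for _ in range(searchers):
            if intermediate_disclosure:
                nxt = (best + random.randint(-5, 5)) % 1000  # refine the leader
            else:
                nxt = random.randrange(1000)                 # explore independently
            visited.add(nxt)
    return max(landscape[i] for i in visited), len(visited)

for label, disclose in (("final disclosure only", False), ("intermediate disclosure", True)):
    best, breadth = run(disclose)
    print(f"{label}: best payoff {best:.3f}, distinct approaches tried {breadth}")
```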

So, Karim and Kevin Boudreau implemented a field experiment. They used TopCoder, offering $6K in prizes, to set up a Med School project involving computational biology. The project let them get fine-grained info about what was going on over the two weeks of the contest.

700 people signed up. They matched them on skills and randomized them into three different disclosure treatments. 1. Standard contest format, with a prize at the end of each week. (Submissions were automatically scored, and the first week prizes went to the highest at that time.) 2. Submitted code was instantly posted to a wiki where anyone could use it. 3. In the first week you work without disclosure, but in the second week submissions were posted to the wiki.
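[As I understand the design, the assignment step looks roughly like this; a sketch with invented skill ratings, not the authors' code. Sorting by skill and randomizing matched triples across the three treatments keeps skill balanced across regimes.]

```python
# Sketch of skill-matched randomization into the three disclosure treatments.
import random

random.seed(1)
participants = [{"id": i, "skill": random.randint(800, 2200)} for i in range(733)]
participants.sort(key=lambda p: p["skill"])          # match on skill rating

treatments = ["standard contest", "full disclosure", "mixed (week-2 disclosure)"]
assignment = {t: [] for t in treatments}

for i in range(0, len(participants), 3):             # triples of similar skill
    group = participants[i:i + 3]
    random.shuffle(group)
    for person, treatment in zip(group, treatments):
        assignment[treatment].append(person["id"])

for t in treatments:
    print(t, len(assignment[t]))
```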

For those whose work is disclosed: You can find and see the most successful submissions. You can get money if your code is reused. In the non-disclosure regime, you cannot observe solutions and all communications are barred. In both cases, you can see market signals and who the top coders are.

Of the 733 signups from 69 different countries, 122 coders submitted 654 submissions, with 89 different approaches. 44% were professionals; 56% were students. They skewed very young. 98% were men. They spent about 10 hours a week, which is typical of Open Source. (There's evidence that women choose not to participate in contests like this.) The results beat the NIH's approach to the problem, which was developed at great cost over years. "This tells me that across our economy there are lots of low-performing" processes in many institutions. "This works."

What motivated the participants? Extrinsic motives matter (cash, job market signals) and intrinsic motives do too (fun, etc.). But so do prosocial motives (community belonging, identity). Other research Karim has done shows that there’s no relation between skills and motives. “Remember that in contests most people are losing, so there have to be things other than money driving them.”

Results from the experiment: More disclosure meant lower participation. Also, more disclosure correlated with the hours worked going down. The incentives and efforts are lower when there's intermediate disclosure. "This is contrary to my expectations," Karim says.

Q: In the intermediate disclosure regime is there an incentive to hold your stuff back until the end when no one else can benefit from it?

A: One guy admitted to this, and said he felt bad about it. He won top prize in the second week, but was shamed in the forums.

In the intermediate disclosure regime, you get better performance (i.e., better submission score). In the mixed experiment, performance shot up in the second week once the work of others was available.

They analyzed the ten canonical approaches and had three Ph.D.s tag the submissions with those approaches. The solutions were combinations of those ten techniques.

With no intermediate disclosure, the search patterns are chaotic. With intermediate disclosure, there is more convergence and learning. Intermediate disclosure resulted in 30% fewer different approaches. The no-disclosure folks were searching in the lower-performance end of the pool. There was more exploration and experimentation in their searches when there was no intermediate disclosure, and more convergence and collaboration when there was.

Increased reuse comes at the cost of incentives. The overall stock of knowledge created is lower, although the quality is higher. More convergent behavior comes with intermediate disclosures, which rely on the stock of knowledge available. The fear is that with intermediate disclosure, people will get stuck on local optima — path dependence is a real risk in intermediate disclosure.

There are comparative advantages of the two systems. Where there is a broad stock of knowledge, intermediate disclosure works best. Plus the diversity of participants may overcome local optima lock-in. Final disclosure [i.e., disclosure only at the end] is useful where there’s broad-based experimentation. “Firms have figured out how to play both sides.” E.g., Apple is closed but also a heavy participant in Open Source.

Q&A

Q: Where did the best solutions come from?

A: From intermediate disclosure. The winner came from there, and then the next five were derivative.

Q: How about with the mixed?

A: The two weeks tracked the results of the final and intermediate disclosure regimes.

Q: [me] How confident are you that this applies outside of this lab?

A: I think it does, but even this platform is selecting on a very elite set of people who are used to competing. One criticism is that we're using a platform that attracts competitors who are not used to sharing. But rank-order platforms are endemic throughout our society: SATs, law school tests, and so on. In that sense we can argue that there's a generalizability here. Even in Wikipedia and Open Source there is status-based ranking.

Q: Can we generalize this to systems where the outputs of innovation aren’t units of code, but, e.g., educational systems or municipal govts?

Q: We study coders because we can evaluate their work. But I think there are generalizations about how to organize a system for innovation, even if the outcome isn't code. What inputs go into your search processes? How broad do you go?

Q: Does it matter that you have groups that are more or less skilled?

A: We used the Topcoder skill ratings as a control.

Q: The guy who held back results from the Intermediate regime would have won in real life without remorse.

A: Von Hippel’s research says that there are informal norms-based rules that prevent copying. E.g., chefs frown on copying recipes.

Q: How would you reform copyright/patent?

A: I don’t have a good answer. My law professor friends say the law has gone too far to protect incentives. There’s room to pull that back in order to encourage reuse. You can ask why the Genome Project’s Bermuda Rules (pro disclosure) weren’t widely adopted among academics. Academics’ incentives are not set up to encourage automatic posting and sharing.

Q: The Human Genome Project resulted in a splintering that set up a for-profit org that does not disclose. How do you prevent that?

A: You need the right contracts.

This was a very stimulating talk. I am a big fan of Karim and his work.


Afterwards Karim and I chatted briefly about whether the fact that 98% of Topcoder competitors are men raises issues about generalizing the results. Karim pointed to the general pervasiveness of rank-ordered systems like the one at TopCoder. That does suggest that the results are generalizable across many systems in our culture. Of course, there’s a risk that optimizing such systems might result in less innovation (using the same measures) than trying to open those systems up to people averse to them. That is, optimizing for TopCoder-style systems for innovation might create a local optima lock-in. For example, if the site were about preparing fish instead of code, and Japanese chefs somehow didn’t feel comfortable there because of its norms and values, how much could you conclude about optimizing conditions for fish innovation? Whereas, if you changed the conditions, you’d likely get sushi-based innovation that the system otherwise inadvertently optimized against.


[Note: 1. Karim’s point in our after-discussion was purely about the generalizability of the results, not about their desirability. 2. I’m trying to make a narrow point about the value of diversity of ideas for innovation processes, and not otherwise comparing women and Japanese chefs.]


October 16, 2013

[berkman] Zeynep Tufekci on the boom-and-bust cycle of social-media-fueled protests (with live reporting)

Zeynep Tufekci [twitter:zeynep] is giving a Berkman Tuesday Lunch talk titled "Gezi Park Protests & the Boom-Bust Cycle of Social Media Fueled Protest." She says that surveillance and social media + protest are two of her topics, so when protests broke out in her home country of Turkey, she felt she really had to study it. She is today presenting issues she is still working through.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

She says that on the positive side of the role of social media in politics, we see lower coordination costs, the ability to shape the narrative, and an ability to overcome internal prejudice. On the negative: slacktivism, surveillance, and propaganda. For her, the lower costs cause the boom-bust cycle in social media-fueled activism. There are many questions, she says, including why most of these social media-fueled protests fizzle out.

People usually argue about the wrong questions, Zeynep says. Instead, she suggests that we stop looking so much at the outputs of social media-fueled protests and instead at their capacity-building. Also, stop using offline or online as the important differentiation, and instead look at them in terms of what they signal.

She gives some background on Gezi, Turkey. The media focused on Taksim Square in Istanbul, but the action was actually in Gezi Park. Prime Minister Erdogan wanted to turn the park into a developed area with housing, a shopping mall, and a reconstructed Ottoman barracks. This was an unpopular plan, and was taken as a symbol for wider discontent. Neighborhood people held a small protest. Maybe 30 people. But it was met with overwhelming force, which raised fears of the gov't becoming authoritarian. People took to the streets. Turkish media are owned by large corporate conglomerates in cahoots with the gov't. CNN locally was running shows about penguins while CNN International was covering the protests. "So people got upset and took to Twitter and to the streets" (including an image of penguins in gas masks).


[Image: penguins in gas masks, via Turkish Press Review Blog]

After multiday clashes in the area, "coordinated and spread almost solely on social media," Gezi Park was occupied. (Zeynep stresses that Turkey, unlike other countries nearby, has a popularly-elected gov't.) Zeynep joined in, packing an audio recorder, a bike helmet, and a tear gas mask. And sunscreen because, statistically, she says, she felt most threatened by the Sun.

A single party had been in power in Turkey for 11 years. The country was polarized, but with an ineffective opposition. There are barriers to creating new parties (you have to get 10% to get any seats), which means the country is locked into an ineffective opposition.

At first the occupation was like a fair: clean, kitchens that were feeding 10K people, and like a carnival in the evenings because of the visitors. Occasionally, you’d get tear gassed. “Woodstock meets the Paris Commune.” She shows a picture of a Sufi whirler wearing a gas mask. People were finding politics.

There was “one no, many yes-es,” [an anti-globalization meme] which Zeynep argues is an Internet phenomenon. Turks who normally would never talk with one another found each other in the park.

There's the free-rider question. Even if the protest itself were a festival, the costs would be real: five people died, and thousands were injured by tear gas canisters, which can be lethal.

The protestors' main grievances were growing authoritarianism, media censorship, and police brutality. (Source: Zeynep formally interviewed 130 people.)

The Net’s role was to break the censorship, create a new narrative, and to coordinate. She looks at each of these:

The media censorship was incredible. CNN Turkey showed a soccer match as protestors were being chased down the city’s main street. Protestors used Twitter in part because there were too many family members on Facebook. “Ironically, Twitter became more essential because it was more public.” Twitter’s blue bird became the symbol of freedom, in part because people trusted Twitter not to turn over names. Also: lots of penguins.

Real-time coordination: Overall, the Net worked. People coordinated in real time via Twitter. Local businesses turned on open Wifi. People would text to others who then tweeted.

People learned new literacies, especially who to trust. One Twitter stream only tweeted citizen journalism if it came with a photo, to increase credibility.

Counter narrative: Very youth and humor oriented. People came because it was a great place to be, even with the tear gassings. People felt fairly confident that they wouldn’t get shot at, similar to Western Europe or the US.

Leadership: There were 130 organizations, but no central leadership. Much of it was ad hoc, which worked because of social media.

After a few weeks, the protest was brutally dispersed, and then it moved to local parks and neighborhoods. When it broke up, the govt mostly decided to treat the protestors the way GW Bush treated the anti-Iraq War protests: not as a threat, but merely as a focus group.

Capacity building: Look at capacity, not outcomes. E.g., look at literacy, not GDP (Amartya Sen). The Internet's capacity-building renders other forms of capacity-building less useful.

The online and offline are one ecology. (She's looking here at post-citizen protests, i.e., protests where the participants are already recognized as citizens.)

The Net lowers the barriers for the resources necessary for protest. No one planned the Gezi protests. They just arose.

So what do protest do? They grab attention, promote social interaction, reveal info, and signal capacity. Her thesis: Internet protests don’t signal the same way as pre-Internet. The Net gains attention without media mediating. Media dependency brings distortion, censorship, and counter propaganda — but also dominance, focus, and singular narrative. Media attention pre-Net often signaled elite dissent. With the Net, movements can get attention on their own terms, but can’t get a singular or dominant narrative. “Since there is no single elite voice, there is no reliable way to signal elite dissent.” Now you can’t get away from polarized narratives.

For social interaction capacity, it's a big win for the movements. It's much easier to find people like you on the Net. "The Internet is a homophily machine." Unfortunately, this doesn't work just for the movements you like; e.g., the anti-vaccine movement. It's a win for social movements, but there will be many more movements.

Info revelation: Pluralistic ignorance = you think you're the only one who is thinking something. The Net gets us past that, e.g., Facebook pages. But then there are bandwagon/cascade effects.

Signaling: Protests as "stotting." ("Stotting" = prey animals leaping conspicuously into the air.) One explanation: it signals how strong you are and thus how fast you can run. Before the Net, because there wasn't an easy way to organize, if you got a million people to DC, you were signaling that you had an infrastructural capacity far beyond those million. Now, getting lots of people into the street doesn't signal the threats that modern govts care about. Even when there are costs, those costs don't signal the capacity to hurt the govt in ways the govt cares about. So, slacktivism is a bad argument; it's not the cost of typing that's being signalled.

Network internalities for social media-fueled protests are weaker. The Left doesn't celebrate building network internalities because the Left sidesteps important tensions (leadership, representation, delegation). "Side-stepping those tensions means that after the street protests, things are more unclear for the Left." The Left is unable to negotiate, which is why so many movements are stuck at no. The Net allows them to sidestep developing ways to negotiate, etc. The Right, on the other hand (e.g., the Tea Party), is comfortable challenging primaries.

To sum up: Look at the building of capacities, not how many people show up. This explains why there’s a repeated cycle where the protests are unable to engage in effective negotiation, representation, pressure, and delegation.

Q&A

Q: What other than Twitter is being used?

A: In Gezi, people knew how to post to Twitter by texting. And Twitter gained the users’ trust. Facebook was important for longer conversations. People collected photos on Tumblr. A lot of blogging, etc. But Twitter was how protesters talked with one another. Turkey isn’t a client state and didn’t need to appeal to America. And hashtags were dropped, so analytics miss just how big it was.

Q: [me] Is the Left stuck forever not being able to get past protests to actual change?

A: In Egypt, Google's Wael Ghonim was identified as a leader, and he was picked up for questioning. But he couldn't have coerced a change even if he'd wanted to. I'm not saying this is great. At Gezi, the govt said, "Let's negotiate." But who do you send? They sent people from the traditional NGOs, but they had no representational capacity. They listened to the Prime Minister, but they weren't empowered to negotiate. The govt was genuinely frustrated that it couldn't find a negotiating partner. So after the negotiations, with some demands on the table, they came back to the park. It's 3 or 4am. They're trying to explain what happened. People were confused. There was no way to deal with it. The next day, the protestors formed little forums, but how do you decide which to listen to? Some people were ready to accept it and go. People wanted consensus. But consensus has meant "a lot of social pressure." That doesn't work in the modern city. So where do we go with this? It can't just be technology. There has to be a recognition among Left movements that if you can't ever delegate or negotiate, then you're stuck at No. The Right isn't like this. The Right is using social media to make really significant strides. They've blocked the President's agenda. They're getting elected in Europe. The Left is unable to get together enough to address the 30-40% unemployment in Spain. The big visible protests are Left wing. The big visible gains are Right wing.

Q: You said there were about 150 social groups involved in the movement. What was the relation between how they organized this protest and …?

A: The 150 groups didn't represent the people on the ground. The groups formed the leadership because they were there, but the people on the ground didn't think of themselves as being there as members of those groups. The traditional NGOs had no capacity to lead, and didn't understand that.

Q: I was a protestor in Ankara. I was tear gassed three times. Tastes good. How can we orient this approach to be an alternative to the traditional opposition structure? The classic opposition parties in Turkey do not represent the young people, the democratic-based people.

A: We have a huge crisis in opposition representation. The classic opposition parties do not represent the young generation. The young are big on pluralism, for example. There’s no party that represents the live-and-let-live ethic among the protestors. E.g., the young have no polarization around the head scarf issue: “They should if they want to, and not if they don’t want to.” That’s not represented in Parliament. The electoral system blocks the formation of new parties because of the 10% barrier. But, also, the young have a cultural allergy to representation because in traditional politics they see corruption, not representation.

Q: But there’s a trend in the Turkish community to do something. We have to find an alternative.

A: What motivates the existing govt is the prospect of losing office.

Q: How many companies offer Internet facilities in Turkey?

A: The backbone goes through one company, and then access is sold to companies that resell it. Great for surveillance. But it's not the same concern as elsewhere, which is why people felt safe tweeting. Turkey is probably more wired than the US, which isn't saying much. Smartphones are necessary just to coordinate meeting up. Much lateness.

Q: In India, we have two successful models. The protests against the rape case were done through FB. An anti-corruption movement was able to organize millions of people throughout the country. But how do you coalesce these energies, give it a shape? But a word of caution: Panic about people from the northeast of India spread throughout the country thanks to social media, leading to killings.

The biggest case of non-state terrorism happened in Pakistan because of a video. Is the Internet good or bad? Yes.

Q: Is protest never effective?

A: Numbers still matter. But it depends on what the protest is signaling, which also depends on context. If it signals that we're here and we're going to challenge you at your weak point, then yes…

