Joho the Blog » open source

February 26, 2015

[liveblog] Data & Technology in Government

I’m at a discussion at the Harvard Kennedy School listening to an awesome panel of Obama administration technologists. Part of the importance of this is that students at the Kennedy School are agitating for a much stronger technology component to their education on the grounds that these days policy makers need to be deeply cognizant of the possibilities technology offers, and of the culture of our new technology development environment. Tomorrow there is an afternoon of discussions sponsored by the student-led Technology for Change group. I believe that tonight’s panel is a coincidence, but it is extraordinarily well-timed.

Here are the participants:

  • Aneesh Chopra, the first federal CTO (and a current Shorenstein Center fellow)

  • Todd Park, White House Technology Advisor

  • DJ Patil, the first US Chief Data Scientist, five days into his tenure

  • Lynn Overmann – Deputy Chief Data Officer, US Dept. of Commerce

  • Nick Sinai – former US Deputy Chief Technology Officer (and a current Shorenstein Center fellow)


NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. PARTICULARLY LOOSE PARAPHRASING even when within quotes; these are geeks speaking quickly.
You are warned, people.

Todd Park: I’m now deeply involved in recruiting. The fundamental rule: “If you get the best people, you win.” E.g., the US Digital Service: “A network of elite technology development teams.” They want to address problems like improving veterans’ care, helping immigrants, etc. “If you go to the best talent in the country and ask them to serve, they will,” he says, pointing to DJ and Lynn.

DJ Patil: We’re building on the work of giants. I think of this as “mass times velocity.” The velocity is the support of the President who deeply believes in open data and technology. But we need more mass, more people. “The opportunity to have real world impact is massive.” Only a government could assemble such a talented set of people. And when people already in the govt are given the opportunity to act and grow, you get awesome results. Data scientists are force multipliers.

Lynn Overmann: “I’m a serial public servant.” She was a public defender at first. “There is literally no serious problem you’re concerned about that you can’t tackle from within the federal government.” The Commerce Dept. has huge amounts of data and needs help unlocking it. [In a previous session, Lynn explained that Commerce offers almost no public-facing services except gathering and releasing data.]

Nick Sinai: Todd, you were the brains behind the Presidential Innovation Fellows Program.

Todd: The government is not a lean startup, but that approach applied to many problems works much better than the traditional waterfall approach to computing. Round 1 went well. In Round 2, they brought in about 40 people. There was a subset of the Round 2 fellows who found the program “addictive.” So the White House used 18F, a digital consulting service provided by the GSA. Demand has now gone off the chart for this new style of consultant. Some of those folks then helped grow the new US Digital Service. It all started with the Innovation Fellows and grew organically. “The more we attract people who are amazing into government, the more we energize amazing people already in government, the more air cover we give them,” the more awesomeness there will be. Let them create results at 10x what anyone expected. “That methodology is the only replicable, reliable way to change government at scale, at speed, in a way that’s permanent.” “I can’t tell you how much fun this is.”

DJ Patil: My first encounters with the CIOs of existing agencies and departments have been amazing. They’re so open, so eager for disruption.

Aneesh Chopra: The line between public and private sector is becoming very porous. That means that the products of the teams being described are a new form of information that in the hands of entrepreneurs and innovators can be transformative. E.g., Uber wanted their drivers to make better healthcare choices. There’s now a hose of data about the healthcare signups. A startup, Stride Health, took that hose and customized it for Uber drivers; maybe drivers want better back care options. There’s an increasing portfolio of institutions extending these services. A handshake makes more and more of this data interoperable, and there’s a hand off to entrepreneurs and innovators. “They may not be stamped .gov” but they’ll be powered by data from the govt.

Nick: We have an opportunity to do smart wholesaling of data, as well as retailing it: Great services, but also enabling non-governmental groups to build great end-user services.

Lynn: At Commerce we’re trying to do Open Data 2.0. How do we get our data experts out into the world to talk with users? How do we share data better? How do we create partnerships with the public sector? E.g., Uber shared its data on traffic patterns with the city of Boston.

Lynn: In the departments Todd has led, he has worked on the gender balance. Women were in the majority by the end of his HHS appointment. [I couldn’t hear all of this.]

Todd: I’ve learned that the more diverse the team is, the better the team is. We made it a real priority for the US Digital Service to have a team that looks like America. It’s also our hope that we’ll be minting people who become superstars in the tech world and will encourage more youths to enter STEM.

Aneesh: There were a few places we thought we could have done better. 1. Rethinking the role and nature of the infrastructure. Human capital is the infrastructure for the digital economy. 2. We make rules of the road — e.g., Net Neutrality today — that give people a more fair shot to compete. There are foundational investments to be made in the infrastructure and creating rules of the road. That’s part of how we affect policy.

Nick: What about the President’s new precision medicine initiative?

Todd: It’s a new way of thinking about how you get medical service. Increasingly Web sites provide tailored experiences. Why not with science? Should your aspirin dose be the same as that of someone with different genetics, exposed to different things in their environment, etc.? Where it gets really phenomenal: The cost of genetic sequencing is dropping quickly. And tons of data are coming from sensors (e.g., FitBit). How do you start getting a handle on that to start getting better treatment? Another side of it: Bioinformatics has been amazing at understanding genes. Combine that with clinical knowledge and we can begin to see that maybe people who live near docks with diesel fumes have particular symptoms. We’ll be able to provide cohorts for test studies that look like America.

Nick: Aneesh and Todd, you both quote Joy’s Law: Most of the smartest people in the world work for someone else.

Aneesh: In many ways, the lessons learned from the innovation philosophy have had great effect in the public sector. The CEO of P&G said 50% of ideas will come from outside of P&G. This liberated him to find innovations in the military that resulted in $1B in cash flow for P&G. Also, we’ve learned from platform effects and what the team at Facebook has done. Sheryl Sandberg: There are 3,000 developers at FB, but a query at Google found 35,000 people with the title “FB developer,” because other companies were using the FB platform.

Todd: It’s important to remember Joy’s Law, and the more you can get those people in the world to care about what you do, the more successful you’ll be. I was asked what I would do with the vast amount of data that the govt has. My first thought was to build some services. But about 17 seconds later I realized that’s entirely the wrong approach. Rather, open it up in machine readable form. We invited four innovators into a room. At first they were highly skeptical. But then we showed them the data, and they got excited. Ninety days later we had a health care datapalooza, and it caught fire. Data owners were there who thought that opening up their data could only result in terribleness. At the end of the datapalooza they flipped. Within two years, the Health Datapalooza became a 2,000 person event, with thousands of people who couldn’t get tickets. Hundreds of new applications that could help individuals, hospitals, healthcare providers were created. But you have to have the humility to acknowledge that you don’t know the answer. And you have to embrace the principle that the answer is likely to come from someone who isn’t you. That’s the recipe for awesomeness to be released.

Aneesh: When Secretary Sebelius saw the very first presentation, her jaw dropped. The question was what are the worst communities in America for obesity and who can they talk to about improving it. In seven minutes they had an answer. She said that when she was Governor, it would have taken her staff seven months to come up with that answer.

Q: [a self-identified Republican technologist] President Obama got the right team together. What you do is awesome. How can we make sure what you’ve built stays a permanent part of the government?

Aneesh: Eric Cantor was doing much the same in Congress. These ideas of opening up data and engaging entrepreneurs, lean startups, open innovation have been genuinely bipartisan.

Todd: Mike Bracken from the UK Digital service says: The strategy is delivery. What will change govt is a growing set of precedents about how govt really should work. I could write an essay, but it’s more effective if I point to datapalooza and show the apps that were written for free. We have to create more and more examples. These examples are done in partnership with career civil servants who are now empowered to kick butt.

DJ: We can’t meet the demand for data scientists. Every agency needs them. We have to not only train those people up, but also slot them into the whole stack. A large part of our effort will be how to train them, find them great homes at work, and give them ways to progress.

Nick: It’s really hard to roll back transparency. There are constituencies for it, whether it’s accountability orgs, the press, etc.

Lynn: Civil servants are the most mission driven people I’ve met. They won’t stop.

Q: Everyone has talked about the need for common approaches. We need identities that are confidential and interoperable. I see lots of activities, but not a plan. You could do a moonshot here in the time you have left. It’d be a key part of the infrastructure.

Aneesh: When the precision medical provision was launched, a critical provision was that they’ll use every regulatory tool they have to connect consumers to their own data. In 2010 there was a report recommending that we move to healthcare APIs. This led to a privately funded initiative called Project Argonaut. Two days ago we held a discussion here at Harvard and got commitments for public-private efforts to create an open source solution in healthcare. Under Nick, the same went on for connecting consumers to their energy info. [I couldn’t capture all this. I’m not sure the above is right. And Aneesh was clear that he was speaking “as an outsider.”]

DJ: If you check the update to the Podesta Big Data report, it outlines the privacy aspects that we’ll be pushing on. Energy is going into these issues. These are thorny problems.

Q: Cybersecurity has become a high profile issue. How is the govt helping the private sector?

Aneesh: Early on the President offered a framework for a private-public partnership for recognizing digital fingerprints, etc. This was the subject of a bipartisan effort. Healthcare has uniform data-breach standards. (The most common cause of breaches: bad passwords.) We need an act of Congress to [he went too fast … sorry].

Lynn: Cybersecurity requires an international framework for privacy and data security. That’s a major challenge.

Q: You talked about the importance of STEM. Students in astronomy and astrophysics worry about getting jobs. What can I say to them?

DJ: I was one of those people. A lot of people I went to school with went on to Wall Street. If you look at the programs that train data scientists, the ones who are super successful in it are people who worked with a lot of messy data: astrophysicists, oceanographers, etc. They’re used to the ambiguity that the data starts with. But there’s a difference in the vocabulary so it’s hard for people to hit the ground running. With 4-6 weeks of training, these people crush it. Tell your students that there are great opportunities and they shouldn’t be dissuaded by having to pound the pavement and knock on doors. Tell them that they have the ability to be game changers.

Q: How many of us are from the college? [surprisingly few hands go up] Your msg about joining the govt sounds like it’s tailored for young professionals, not for students. The students I know talk about working for Google or FB, but not for the govt.

Todd: You’re right. The US Digital Service people are young professionals who have had some experience. We will get to recruiting in college. We just haven’t gotten there yet.

Lynn: If you’re interested in really hard problems and having a direct impact on people’s lives, govt service is the best thing you can do.

Q: When you hire young tech people, what skills do they typically not have that they need?

Lynn: Problem solving. Understanding the problems and having the tech skills to solve them. Understanding how people are navigating our systems now and asking how we can leverage tech to make that process much much easier.

DJ: In Silicon Valley, we’re training people via internships, teaching them what they don’t learn from an academic environment. We have to figure out how govt can do this, and how to develop the groups that can move you forward when you don’t know how to do something.

Aneesh: There is a mindset of product development, which is a muscle that we haven’t worked enough in the policy arena. Policy makers too often specify what goal they want and allocate money for it. But they don’t think about the product that would achieve that goal. (Nice shout out to Karim Lakhani. “He’s in the mind set.”)

Q: [leaders of the Kennedy School Tech for Change] Tech for Change has met with administrators, surveyed students, etc. Students care about this. There’s a summit tomorrow. [I’m going!] What are the three most important things a policy school could do to train students for this new ecosystem. How can HKS be the best in this field?

DJ: Arts and humanities, ethics, and humility.

Todd: One expression of humility is to learn the basics of lean startup innovation. These principles apply broadly.

DJ: There’s nothing more humbling than putting your first product out there and watching what people say on Twitter.

Lynn: We should be moving to a world in which technology and policy aren’t separate. It’s a problem when the technologists are not at the table. E.g., we need to be able to track the data we need to measure the results of programs. This is not a separate thing. This is a critical thing that everyone in the school should learn about.

Todd: It’s encouraging that the geeks are being invited into the rooms, even into rooms where no one can imagine why tech would be possibly relevant. But that’s a short term hack. The whole idea that policy makers don’t need to know about tech is incredibly dangerous. Just like policy makers need a basic understanding of economics; they don’t have to be economists. If you don’t have that tech knowledge, you don’t graduate. There will be a direct correlation between the geek quotient and the efficiency of policy.

Nick: Panel, what’s your quick actionable request of the Harvard JFK community?

Lynn: We need to make our laws easier to understand.

Todd: If you are an incredibly gifted, patriotic, high EQ designer, dev, devops, data scientist, or you know someone who is, go to whitehouse.gov/usds where you can learn about the Digital Service and apply to join this amazing band.

DJ: Step up by stepping in. And that doesn’t have to be at the federal level. Share ideas. Contribute. Help rally people to the cause.


April 25, 2014

[nextweb] The Open Source Bank of Brewster

I’m at the Next Web conference in Amsterdam. A large cavern is full of entrepreneurs and Web marketing folks, mainly young. (From my end of the bell curve, most crowds are young.) 2,500 attendees. The opening music is overwhelmingly loud; I can feel the bass as an extra beat in my heart, which from my end of the bell curve is not a good feeling. But the message is of Web empowerment, so I’ll stop my whinging.

Boris Veldhuijzen van Zanten recaps the conference’s 30-hour hackathon. 28 apps. One plays music the tempo of which is based upon how fast you’re driving.

First up is Brewster Kahle [twitter: brewster_kahle], founder of the Internet Archive. [I am a huge Brewster fan, of course.]


NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Brewster begins by saying that the tech world is in a position to redefine how the economy works.

We are now in a position to talk about all of things. We can talk about all species, or all books, etc. Can we make universal access to all knowledge? “That’s the Internet dream I signed on for.” A lot of material isn’t on the Internet yet. Internet Archive is a non-profit “but it’s probably the most successful business I’ve run.” IA has all programs for the Apple II, the Atari, Commodore, etc. IA has 1.5M physical books. “Libraries are starting to throw away books at a velocity.” They’re aiming for 10M books. They have about 1.5M moving images online. “A lot of the issues are working through the rights issues and keeping everyone calm.” 2M audio recordings, mainly live music collections, not CDs that have been sold. Since 2000 they’ve been recording live TV, 24×7, multiple channels, international. 3M hours of television. They’re making US TV news searchable. “We want to enable everyone to be a Jon Stewart research department.” 3.7M ebooks, growing by 1,500/day. When they digitize a copy that is under copyright, they lend it to one person at a time. “And everyone’s stayed calm.” Brewster thinks 20th century books will never be widely available. And 400B pages available through the Wayback Machine.

So for knowledge, “We’re getting there.”

“We have an opportunity to build on earlier ideas in the software area to build societies that work better.” E.g., the 0.1% in the US sees its wealth grow but it’s flat for everyone else. Our political and economic systems aren’t working for most people. So, we have to “invent around it.” We have “over-propertized” (via Pam Samuelson). National parks pull back from this. The Nature Conservancy is a private effort to protect land from over-propertization. The NC has more acres than the National Park system.

Brewster wants to show us how to build on free and open software. Brewster worked with Richard Stallman on the LISP Machine. “People didn’t even sign code. That was considered arrogant.” In 1976 Congress made copyright opt out rather than opt in: everything written became copyrighted for life + 50. “These community projects suddenly became property.” MIT therefore sold the LISP Machine to Symbolics, forking the code. Stallman tried to keep the open code feature-compatible, but it couldn’t be done. Instead, he created the Free Software GNU system. It was a community license, a distributed system that anyone could participate in just by declaring their code to be free software. “I don’t think that has happened before. It’s building law structure based on licenses. It’s licenses rather than law.”

It was a huge win, but where do we go from there? Corporate fanaticism about patents, copyright, etc., locked down everything. Open Source doesn’t work well there. We ended up with high tech non-profits supporting the new sharing infrastructure. The first were about administrating free software: E.g., Free Software Foundation, Linux Foundation, LibreOffice, Apache. Then there were advocacy organizations, e.g., EFF. Now we’re seeing these high-tech non-profits going operational, e.g., Wikipedia ($50M), Mozilla ($300M), Internet Archive ($12M), PLoS ($45M). This model works. They give away their product, and they use a community structure under 501(c)(3) so that it can’t be bought.

This works. They’ve lasted for more than 20 years, whereas even successful tech companies get mashed and mangled if they last 20 years. So, can we build a free and open ecosystem that works better than the current one? Can we define new rules within it?

At Internet Archive, the $12M goes largely to people. The people at IA spend most of their salaries on housing, up to 60%. Housing costs so much because of debt: two-thirds of the rent you pay goes to pay off the mortgage of the owner. So, how can we make debt-free housing? Then IA wouldn’t have to raise as much money. So, they’ve made a non-profit that owns an apartment building to provide affordable housing for non-profit workers. The housing has a community license so the building can’t be sold again. “It pulls it out of the market, like stamping software as Open Source.”

Now he’s trying it for banking. About 40% of corporate profits in the US go to financial services. So, they built the Internet Credit Union, a non-profit credit union. They opened bitcoin accounts and were immediately threatened by the government. The credit union closed those accounts but the government is still auditing them every month. The Internet Credit Union is non-profit, member-run, it helps foundation housing, and it’s not acquirable.

In sum: We can use communities that last via licenses rather than the law.

Q&A

Boris: If you’re a startup, how do you apply this?

A: Many software companies push hard against the status quo. The days are gone when you can just write code and sell it. You have to hack the system. Think about doing non-profit structures. They’ll trust you more.


March 5, 2014

[berkman] Karim Lakhani on disclosure policies and innovation

Karim Lakhani of Harvard Business School (and a Berkman associate, and a member of the Harvard Institute for Quantitative Social Science) is giving a talk called “How disclosure policies impact search in open innovation,” a topic he has researched with Kevin Boudreau of the London Business School.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Karim has been thinking about how crowds can contribute to innovation for 17 years, since he was at GE. There are two ways this happens:

1. Competitions and contests at which lots of people work on the same problem. Karim has asked who wins and why, motives, how they behave, etc.

2. Communities/Collaboration. E.g., open source software. Here the questions are: Motives? Costs and benefits? Self-selection and joining scripts? Partner selection?

More fundamentally, he wants to know why both of these approaches work so well.

He works with NASA, using topcoder.com: 600K users world wide [pdf]. He also works with Harvard Medical School [more] to see how collaboration works there where (as with Open Source) people choose their collaborators rather than having them chosen top-down.

Karim shows a video about a contest to solve an issue with the International Space Station, having to do with the bending of bars (longerons) in the solar collectors when they are in the shadows. NASA wanted a sophisticated algorithm. (See www.topcoder.com/iss.) It was a two week contest, $30K prize. Two thousand signed up for it; 459 submitted solutions. The winners came from around the globe. Many of the solutions replicated or slightly exceeded what NASA had developed with its contractors, but this was done in just two weeks simply for the price of the contest prize.

Karim says he’ll begin by giving us the nutshell version of the paper he will discuss with us today. Innovation systems create incentives to exert innovative effort and encourage the disclosure of knowledge. The timing and the form of the disclosures differentiates systems. E.g., Open Science tends to publish when near done, while Open Source tends to be more iterative. The paper argues that intermediate disclosures (as in open source) dampen incentives and participation, yet lead to higher performance. There’s more exploration and experimentation when there’s disclosure only at the end.

Karim’s TL;DR: Disclosure isn’t always helpful for innovation, depending on the conditions.

There is a false debate between closed and open innovation. Rather, what differentiates regimes is when the disclosure occurs, and who has the right to use those disclosures. Intermediate disclosure [i.e., disclosure along the way] can involve a range of outputs. E.g., the Human Genome Project enshrined intermediate disclosure as part of an academic science project; you had to disclose discoveries within 24 hours.

Q: What constitutes disclosure? Would talking with another mathematician at a conference count as disclosure?

A: Yes. It would be intermediate disclosure. But there are many nuances.

Karim says that Allen, Meyer and Nuvolari have shown that historically, intermediate disclosure has been an important source of technological progress. E.g., the Wright brothers were able to invent the airplane because of a vibrant community. [I’m using the term “invent” loosely here.]

How do you encourage continued innovation while enabling early re-use of it? “Greater disclosure requirements will degrade incentives for upstream innovators to undertake risky investment.” (Green & Scotchmer; Bessen & Maskin.) We see compensating mechanisms under regimes of greater disclosure: E.g., priority and citations in academia; signing and authorship in Open Source. You may also attract people who have a sharing ethos; e.g., Linus Torvalds.

Research confirms that the more access you provide, the more reuse and sharing there will be. (Cf. Eric von Hippel.) Platforms encourage reuse of core components. (Cf. Boudreau 2010; Rysman and Simcoe 2008.) [I am not getting all of Karim’s citations. Not even close.]

Another approach looks at innovation as a problem-solving process. And that entails search. You need to search to find the best solutions in an uncertain space. Sometimes innovators use “novel combinations of existing knowledge” to find the best solutions. So let’s look at the paths by which innovators come up with ideas. There’s a line of research that assumes that the paths are the essential element to understand the innovation process.

Mathematical formulations of this show you want lots of people searching independently. The broader the better for innovation outcomes. But there is a tendency of the researchers to converge on the initially successful paths. These are affected by decisions about when to disclose.

So, Karim and Kevin Boudreau implemented a field experiment. They used TopCoder, offering $6K, to set up a Med School project involving computational biology. The project let them get fine-grained info about what was going on over the two weeks of the contest.

700 people signed up. They matched them on skills and randomized them into three different disclosure treatments. 1. Standard contest format, with a prize at the end of each week. (Submissions were automatically scored, and the first week prizes went to the highest at that time.) 2. Submitted code was instantly posted to a wiki where anyone could use it. 3. In the first week you work without disclosure, but in the second week submissions were posted to the wiki.
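[The talk didn’t go into how the skill matching and randomization were actually done. One common way to do that kind of assignment is blocked randomization: rank people by skill, cut the ranking into blocks of three, and shuffle the three treatments within each block. Here is a minimal sketch under that assumption. The coder names, the ratings, and the treatment labels are all made up for illustration.]

```python
import random
from collections import Counter

# Labels for the three disclosure treatments described above (my names, not the paper's).
TREATMENTS = ("no_disclosure", "intermediate_disclosure", "mixed")


def assign_treatments(ratings, seed=0):
    """Blocked randomization: rank coders by skill, then shuffle the
    treatments within each consecutive block of three coders."""
    rng = random.Random(seed)
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    assignment = {}
    for start in range(0, len(ranked), len(TREATMENTS)):
        block = ranked[start:start + len(TREATMENTS)]
        arms = list(TREATMENTS)
        rng.shuffle(arms)
        # A short final block simply takes the first arms of the shuffled order.
        for coder, arm in zip(block, arms):
            assignment[coder] = arm
    return assignment


# Hypothetical signups with invented skill ratings.
rating_rng = random.Random(42)
ratings = {f"coder_{i:03d}": rating_rng.randint(800, 2400) for i in range(733)}

assignment = assign_treatments(ratings)
arm_sizes = Counter(assignment.values())
print(arm_sizes)
```

Because every block of similarly rated coders is spread across all three treatments, the arms end up nearly equal in size and similar in skill mix, so differences in outcomes are less likely to be an artifact of who landed where.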

For those whose work is disclosed: You can find and see the most successful. You can get money if your code is reused. In the non-disclosure regime you cannot observe solutions and all communications are barred. In both cases, you can see market signals and who the top coders are.

Of the 733 signups from 69 different countries, 122 coders submitted 654 submissions, with 89 different approaches. 44% were professionals; 56% were students. They skewed very young. 98% men. They spent about 10 hours a week, which is typical of Open Source. (There’s evidence that women choose not to participate in contests like this.) The results beat the NIH’s approach to the problem which was developed at great cost over years. “This tells me that across our economy there are lots of low-performing” processes in many institutions. “This works.”
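[To put those counts in proportion, here is some quick arithmetic using only the figures above. The variable names are mine, and these ratios were not presented in the talk.]

```python
# Figures as reported in the talk.
signups = 733
submitters = 122
submissions = 654
approaches = 89

# Derived ratios (not from the talk; simple division of the figures above).
participation_rate = submitters / signups
submissions_per_submitter = submissions / submitters
submissions_per_approach = submissions / approaches

print(f"{participation_rate:.1%} of signups submitted at least once")
print(f"{submissions_per_submitter:.1f} submissions per submitting coder")
print(f"{submissions_per_approach:.1f} submissions per distinct approach")
```

So roughly one signup in six actually submitted, and those who did submitted a little over five times each on average.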

What motivated the participants? Extrinsic motives matter (cash, job market signals) and intrinsic motives do too (fun, etc.). But so do prosocial motives (community belonging, identity). Other research Karim has done shows that there’s no relation between skills and motives. “Remember that in contests most people are losing, so there have to be things other than money driving them.”

Results from the experiment: More disclosure meant lower participation. Also, more disclosure correlated with the hours worked going down. The incentives and efforts are lower when there’s intermediate disclosure. “This is contrary to my expectations,” Karim says.

Q: In the intermediate disclosure regime is there an incentive to hold your stuff back until the end when no one else can benefit from it?

A: One guy admitted to this, and said he felt bad about it. He won top prize in the second week, but was shamed in the forums.

In the intermediate disclosure regime, you get better performance (i.e., better submission score). In the mixed experiment, performance shot up in the second week once the work of others was available.

They analyzed the ten canonical approaches and had three Ph.D.s tag the submissions with those approaches. The solutions were combinations of those ten techniques.

With no intermediate disclosures, the search patterns are chaotic. With intermediate disclosures, there is more convergence and learning. Intermediate disclosure resulted in 30% fewer different approaches. The no-disclosure folks were searching in the lower-performance end of the pool. There was more exploration and experimentation in their searches when there was no intermediate disclosure, and more convergence and collaboration when there was.

Increased reuse comes at the cost of incentives. The overall stock of knowledge created is low, although the quality is higher. More convergent behavior comes with intermediate disclosures, which relies on the stock of knowledge available. The fear is that with intermediate disclosure, people will get stuck on local optima; path dependence is a real risk in intermediate disclosure.

There are comparative advantages of the two systems. Where there is a broad stock of knowledge, intermediate disclosure works best. Plus the diversity of participants may overcome local optima lock-in. Final disclosure [i.e., disclosure only at the end] is useful where there’s broad-based experimentation. “Firms have figured out how to play both sides.” E.g., Apple is closed but also a heavy participant in Open Source.

Q&A

Q: Where did the best solutions come from?

A: From intermediate disclosure. The winner came from there, and then the next five were derivative.

Q: How about with the mixed?

A: The two weeks tracked the results of the final and intermediate disclosure regimes.

Q: [me] How confident are you that this applies outside of this lab?

A: I think it does, but even this platform is selecting on a very elite set of people who are used to competing. One criticism is that we’re using a platform that attracts competitors who are not used to sharing. But rank-order based platforms are endemic throughout society. SATs, law school tests: rank order is endemic in our society. In that sense we can argue that there’s a generalizability here. Even in Wikipedia and Open Source there is status-based ranking.

Q: Can we generalize this to systems where the outputs of innovation aren’t units of code, but, e.g., educational systems or municipal govts?

A: We study coders because we can evaluate their work. But I think there are generalizations about how to organize a system for innovation, even if the outcome isn’t code. What inputs go into your search processes? How broad do you go?

Q: Does it matter that you have groups that are more or less skilled?

A: We used the Topcoder skill ratings as a control.

Q: The guy who held back results from the Intermediate regime would have won in real life without remorse.

A: Von Hippel’s research says that there are informal norms-based rules that prevent copying. E.g., chefs frown on copying recipes.

Q: How would you reform copyright/patent?

A: I don’t have a good answer. My law professor friends say the law has gone too far to protect incentives. There’s room to pull that back in order to encourage reuse. You can ask why the Genome Project’s Bermuda Rules (pro disclosure) weren’t widely adopted among academics. Academics’ incentives are not set up to encourage automatic posting and sharing.

Q: The Human Genome Project resulted in a splintering that set up a for-profit org that does not disclose. How do you prevent that?

A: You need the right contracts.

This was a very stimulating talk. I am a big fan of Karim and his work.


Afterwards Karim and I chatted briefly about whether the fact that 98% of Topcoder competitors are men raises issues about generalizing the results. Karim pointed to the general pervasiveness of rank-ordered systems like the one at TopCoder. That does suggest that the results are generalizable across many systems in our culture. Of course, there’s a risk that optimizing such systems might result in less innovation (using the same measures) than trying to open those systems up to people averse to them. That is, optimizing for TopCoder-style systems for innovation might create a local optima lock-in. For example, if the site were about preparing fish instead of code, and Japanese chefs somehow didn’t feel comfortable there because of its norms and values, how much could you conclude about optimizing conditions for fish innovation? Whereas, if you changed the conditions, you’d likely get sushi-based innovation that the system otherwise inadvertently optimized against.


[Note: 1. Karim’s point in our after-discussion was purely about the generalizability of the results, not about their desirability. 2. I’m trying to make a narrow point about the value of diversity of ideas for innovation processes, and not otherwise comparing women and Japanese chefs.]


April 10, 2012

CFPB.gov goes open source

The Consumer Financial Protection Bureau — AKA “The Agency Elizabeth Warren Was Born to Lead” — has announced that its software will be open source, with rare exceptions for security, although “… we believe that, in general, hiding source code does not make the software safer”.

The CFPB’s explanation of why it’s going the open source route hits all the right notes: It’s easy to acquire, it keeps its data open, and it lets the agency tap into the enormous libraries of available code. Plus:

Open-source software works because it enables people from around the world to share their contributions with each other. The CFPB has benefited tremendously from other people’s efforts, so it’s only right that we give back to the community by sharing our work with others.

I like it when government talks — and acts! — this way.


August 4, 2011

Knowledge is the network

I forked yesterday for the first time. I’m pretty thrilled. Not about the few lines of code that I posted. If anyone notices and thinks the feature is a good idea, they’ll re-write my bit from the ground up.* What’s thrilling is seeing this ecology in operation, for the software development ecology is now where the most rapid learning happens on the planet, outside the brains of infants.

Compare how ideas and know-how used to propagate in the software world. It used to be that you worked in a highly collaborative environment, so it was already a site of rapid learning. But the barriers to sharing your work beyond your cube-space were high. You could post to a mailing list or UseNet if you had permission to share your company’s work, you could publish an article, you could give a talk at a conference. Worse, think about how you would learn if you were not working at a software company or attending college: Getting answers to particular questions — the niggling points that hang you up for days — was incredibly frustrating. I remember spending much of a week trying to figure out how to write to a file in Structured BASIC [SBASIC], my first programming language, eventually cold-calling a computer science professor at Boston University who politely could not help me. I spent a lot of time that summer learning how to spell “Aaaaarrrrrggggghhhhh.”

On the other hand, this morning Antonio, who is doing some work for the Library Innovation Lab this summer, poked his head in and pointed us to a jQuery-like data visualization library. D3 makes it easy for developers to display data interactively on Web pages (the examples are eye-popping), and the author, mbostock, made it available for free to everyone. So, global software productivity just notched up. A bunch of programs just got easier to use, or more capable, or both. But more than that, if you want to know how mbostock did it, you can read the code. If you want to modify it, you will learn deeply from the code. And if you’re stuck on a problem — whether n00bish or ultra-geeky — Google will very likely find you an answer. If not, you’ll post at StackOverflow or some other site and get an answer that others will also learn from.

The general principles of this rapid-learning ecology are pretty clear.

First, we probably have about the same number of smart people as we did twenty years ago, so what’s making us all smarter is that we’re on a network together.

Second, the network has evolved a culture in which there’s nothing wrong with not knowing. So we ask. In public.

Third, we learn in public.

Fourth, learning need not be a private act that occurs between a book and a person, or between a teacher and a student in a classroom. Learning that is done in public also adds to that public.

Fifth, show your work. Without the “show source” button on browsers, the ability to create HTML pages would have been left in the hands of HTML Professionals.

Sixth, sharing is learning is sharing. Holy crap but the increased particularity of our ownership demands about our ideas gets in the way of learning!

Knowledge once was developed among small networks of people. Now knowledge is the network.

 


*I added a couple of features I needed to an excellent open source program that lets you create popups that guide users through an app. The program is called Guiders-JS by Jeff Pickhardt at Optimizely. Thanks, Jeff!


June 23, 2011

Interview with Dan Cohen of Zotero

My interview with Dan Cohen about what libraries can learn from Zotero has gone up at the Library Innovation Lab blog. Dan’s a really interesting guy, and Zotero is a great app that models openness.

Here’s the complete list of podcasts on the site.


July 1, 2010

Twitter research tool

The Web Ecology Project has released 140Kit, a research tool for tracing tweet paths. From the site: “It’s the final product of the various provisional tools we’ve used to produce our previous reports on the social phenomena of Twitter, and of lead researcher Devin Gaffney’s own work on high throughput humanities.”

  • It enables complete data pulls for a set of users or terms on Twitter, with searches running continuously.

  • The ability to download those data pulls in raw form to use for whatever you please.

  • The ability to stand on the shoulders of giants by mixing and matching existing data pulls to generate entirely new combinations of data and analysis.

  • And the ability to instantaneously generate basic visualizations around the data (term use, inequality of participation, etc).

It’s free to use and open source. So, Devin Gaffney, Ian Pearce, Max Darham, and Max Nanis:
Thanks!
