Joho the Blog: interop Archives

March 1, 2016

[berkman] Dries Buytaert

I’m at a Berkman [twitter: BerkmanCenter] lunchtime talk (I’m moderating, actually) where Dries Buytaert is giving a talk about some important changes in the Web.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellchecker. Mangling other people’s ideas and words. You are warned, people.

He begins by recounting his early days as the inventor of Drupal, in 2001. He’s also the founder of Acquia, one of the fastest growing tech companies in the US. It currently has 750 people working on products and services for Drupal. Drupal is used by about 3% of the billion web sites in the world.

When Drupal started, he felt he “could wrap his arms” around everything going on on the Web. Now that’s impossible, he says. E.g., Google AdWords were just starting, but now AdWords is a $65B business. The mobile Web didn’t exist. Social media didn’t yet exist. Drupal was (and is) Open Source, a concept that most people didn’t understand. “Drupal survived all of these changes in the market because we thought ahead” and then worked with the community.

“The Internet has changed dramatically” in the past decade. Big platforms have emerged. They’re starting to squeeze smaller sites out of the picture. There’s research that shows that many people think that Facebook is the Internet. “How can we save the open Web?” Dries asks.

What do we mean by the open or closed Web? The closed Web consists of walled gardens. But these walled gardens also do some important good things: bringing millions of people online, helping human rights and liberties, and democratizing the sharing of information. But their scale is scary. FB has 1.6B active users every month; Apple has over a billion iOS devices. Such behemoths can shape the news. They record data about our behavior, and they won’t stop until they know everything about us.

Dries shows a table of what the different big platforms know about us. “Google probably knows the most about us” because of Gmail.

The closed web is winning “because it’s easier to use.” E.g., after Dries moved from Belgium to the US, Facebook and the like made it much easier to stay in touch with his friends and family.

The open web is characterized by:

  1. Creative freedom — you could create any site you wanted and style it any way you pleased

  2. Serendipity. That’s still there, but it’s less used. “We just scroll our FB feed and that’s it.”

  3. Control — you owned your own data.

  4. Decentralized — open standards connected the pieces

Closed Web:

  1. Templates dictate your creative license

  2. Algorithms determine what you see

  3. Privacy is in question

  4. Information is siloed

The big platforms are exerting control. E.g., Twitter closed down its open API so it could control the clients that access it. FB launched “Free Basics” that controls which sites you can access. Google lets people purchase results.

There are three major trends we can’t ignore, he says.

First, there’s the “Big Reverse of the Web,” about which Dries has been blogging. “We’re in a transformational stage of the Web,” flipping it on its head. We used to go to sites and get the information we want. Now information is coming to us. “Info, products, and services will come to us at the right time on the right device.”

Second, “Data is eating the world.”

Third, “Rise of the machines.”

For example, “content will find us,” AKA “mobile or contextual information.” If your flight is cancelled, the information you get at the airport will be the relevant rebooking details, not offers of car rentals for when you arrive. This creates a better user experience, and “user experience always wins.”

Will the Web be open or closed? “It could go either way.” So we should be thinking about how we can build data-driven, user-centric algorithms. “How can we take back control over our data?” “How can we break the silos” and decentralize them while still offering the best user experience? “How do we compete with Google in a decentralized way? Not exactly easy.”

For this, we need more transparency about how data is captured and used, but also how the algorithms work. “We need an FDA for data and algorithms.” (He says he’s not sure about this.) “It would be good if someone could audit these algorithms,” because, for example, Google’s can affect an election. But how to do this? Maybe we need algorithms to audit the algorithms?

Second, we need to protect our data. Perhaps we should “build personal information brokers.” You unbundle FB and Google, put the data in one place, and through APIs give apps access to them. “Some organizations are experimenting with this.”

Third, decentralization and a better user experience. “For the open web to win, it needs to be much easier to use.” This is where Open Source and open standards come in, for they allow us to build a “layer of tech that enables different apps to communicate, and that makes them very easy to use.” This is very tricky. E.g., how do you make it easy to leave a comment on many different sites without requiring people to log in to each?

It may look almost impossible, but global projects like Drupal can have an impact, Dries says. “We have to try. Today the Web is used by billions of people. Tomorrow by more people.” The Internet of Things will accelerate the Net’s effect. “The Net will change everything, every country, every business, every life.” So, “we have a huge responsibility to build the web that is a great foundation for all these people for decades to come.”

[Because I was moderating the discussion, I couldn’t capture it here. Sorry.]


March 12, 2015

Corrections metadata

It’s certain that this has already been suggested many times, and it’s highly likely it’s been implemented at least several times. But here goes:

Currently the convention for correcting an online mistake is to strike through the errant text and then put in the correct text. Showing one’s errors is a wonderful norm, for it honors the links others have made to the piece; it’s at best confusing when you post criticism of someone else’s work but, when your readers go there, the errant remarks have been silently excised. It’s also a visible display of humility.

But strikethrough text is a visual cue of a structural meaning. And it conveys only the fact that the text is wrong, not why it’s wrong.

So, why isn’t there markup for corrections? Schema.org is the set of simple markup for adding semantics to plain old Web pages. The reader can’t see the markup, but computers can. The major search engines are behind Schema.org, which means that if you mark up your page with the metadata they’ve specified, the search engines will understand your page better and are likely to up its ranking. (Here’s another post of mine about Schema.org.)

So, imagine there were simple markup you could put into your HTML that would let you note that some bit of text is errant, and let you express (in hidden text):

  • When the correction was made

  • Who made it

  • Who suggested the correction, if anyone.

  • What was wrong with the text

  • A bit of further explanation

The corrected text might include the same sort of information. Plus, you’d want a way to indicate that these two pieces of text refer to one another; you wouldn’t want a computer getting confused about which correction corrects which errant text.
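In fact, HTML already has the beginnings of this: the `<del>` and `<ins>` elements take `datetime` and `cite` attributes. A hypothetical sketch of what richer, machine-readable correction markup might look like, using microdata syntax with invented property names (nothing below is a published vocabulary), pairing the errant text with its correction by `id`:

```html
<!-- Hypothetical vocabulary; none of these itemprop names are standardized. -->
<del id="err-1" datetime="2015-03-12" itemscope
     itemtype="https://example.org/Correction">
  <meta itemprop="reason" content="factual error">
  <meta itemprop="suggestedBy" content="A. Reader">
  Drupal runs 30% of the world's web sites.
</del>
<ins id="corr-1" datetime="2015-03-12" cite="#err-1">
  Drupal runs about 3% of the world's web sites.
</ins>
```

The `cite="#err-1"` pairing is the glue: it tells a machine which correction replaces which errant text, so nothing gets confused about which correction corrects what.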

If this became standard, browsers could choose to display errant texts and their corrections however they’d like. Add-ons could be written to let users interact with corrections in different ways. For example, maybe you like seeing strikethroughs but I’d prefer to be able to hover to see the errant text. Maybe we can sign up to be notified of any corrections to an article, but not corrections that are just grammatical. Maybe we want to be able to do research about the frequency and type of corrections across sources, areas, languages, genders…. Schema.org could drive this through. Unless, of course, it already has.
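As a sketch of the add-on idea: given correction metadata in whatever form it ends up being expressed, a few lines of code could build the hover text. The field names here are invented for illustration, not part of any standard:

```javascript
// Build a human-readable caption for a correction from hypothetical
// metadata fields; a browser add-on might show this on hover.
function correctionCaption(meta) {
  const parts = [];
  if (meta.date) parts.push(`corrected ${meta.date}`);
  if (meta.by) parts.push(`by ${meta.by}`);
  if (meta.reason) parts.push(`(${meta.reason})`);
  return parts.join(" ");
}

console.log(correctionCaption({ date: "2015-03-12", by: "dw", reason: "typo" }));
// → corrected 2015-03-12 by dw (typo)
```

The point is only that once the metadata is machine-readable, the display becomes the reader's choice rather than the author's.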


Be sure to read the comment from Dan Brickley. Dan is deeply involved in Schema.org. (The prior comment is from my former college roommate.)


January 20, 2015

Fargo: an open outliner

Dave Winer loves outlines. I do, too, but Dave loves them More. We know this because Dave’s created the Fargo outliner, and, in the way of software that makes us freer, he’s made it available to us to use for free, without ads or spyware, and supporting the standards and protocols that make our ideas interoperable.

Fargo is simple and straightforward. You enter text. You indent lines to create structure. You can reorganize and rearrange as you would like. Type CMD-? or CTL-? for help.

Fargo is a deep product. It is backed by a CMS so you can use it as your primary tool for composing and publishing blog posts. (Dave knows a bit about blogging, after all.) It has workgroup tools. You can execute JavaScript code from it. It understands Markdown. You can use it to do presentations. You can create and edit attributes. You can include other files, so your outlines scale. You can include feeds, so your outlines remain fresh.

Fargo is generative. It supports open standards, and it’s designed to make it easy to let what you’ve written become part of the open Web. It’s written in HTML5 and runs in all modern browsers. Your outlines have URLs so other pages can link to them. Fargo files are saved in the OPML standard so other apps can open them. The files are stored in your Dropbox folder, which puts them in the Cloud but also on your personal device; look in Dropbox/Apps/smallpicture/. You can choose to encrypt your files to protect them from spies. The Concord engine that powers Fargo is Open Source.
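For reference, OPML files are just XML outlines, which is why any OPML-aware app can open what Fargo saves. A minimal example of the format (the titles and text here are mine, not Fargo's):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head>
    <title>Example outline</title>
  </head>
  <body>
    <outline text="Top-level idea">
      <outline text="A supporting point">
        <outline text="A detail under that point"/>
      </outline>
    </outline>
  </body>
</opml>
```

Each `<outline>` node can nest to any depth, which maps directly onto the indenting you do in the editor.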

Out of the box, Fargo is a heads-down outliner for people who think about what they write in terms of its structure. (I do.) It thus is light on the presentation side: You can’t easily muck about with the styles it uses to present various levels, and there isn’t an embedded way to display graphics, although you can include files that are displayed when the outline is rendered. But because it is a simple product with great depth, you can always go further with it.

And no matter how far you go, you’ll never be locked in.


February 1, 2014

Linked Data for Libraries: And we’re off!

I’m just out of the first meeting of the three universities participating in a Mellon grant — Cornell, Harvard, and Stanford, with Cornell as the grant instigator and leader — to build, demonstrate, and model using library resources expressed as Linked Data as a tool for researchers, students, teachers, and librarians. (Note that I’m putting all this in my own language, and I was certainly the least knowledgeable person in the room. Don’t get angry at anyone else for my mistakes.)

This first meeting, two days long, was very encouraging indeed: it’s a superb set of people, we are starting out on the same page in terms of values and principles, and we enjoyed working with one another.

The project is named Linked Data for Libraries (LD4L) (minimal home page), although that doesn’t entirely capture it, for the actual beneficiaries of it will not be libraries but scholarly communities taken in their broadest sense. The idea is to help libraries make progress with expressing what they know in Linked Data form so that their communities can find more of it, see more relationships, and contribute more of what the communities learn back into the library. Linked Data is not only good at expressing rich relations, it makes it far easier to update the dataset with relationships that had not been anticipated. This project aims at helping libraries continuously enrich the data they provide, and making it easier for people outside of libraries — including application developers and managers of other Web sites — to connect to that data.
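To make “expressing rich relations” concrete, here is a toy Linked Data description in Turtle. The vocabularies (Dublin Core, FOAF) are real; all the example.edu URIs are invented for illustration:

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .

# All example.edu URIs are invented for illustration.
<http://example.edu/item/origin-of-species>
    dcterms:title   "On the Origin of Species" ;
    dcterms:creator <http://example.edu/person/darwin> .

<http://example.edu/person/darwin>
    foaf:name "Charles Darwin" .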

As the grant proposal promised, we will use existing ontologies, adapting them only when necessary. We do expect to be working on an ontology for library usage data of various sorts, an area in which the Harvard Library Innovation Lab has done some work, so that’s very exciting. But overall this is the opposite of an attempt to come up with new ontologies. Thank God. Instead, the focus is on coming up with implementations at all three universities that can serve as learning models, and that demonstrate the value of having interoperable sets of Linked Data across three institutions. We are particularly focused on showing the value of the high-quality resources that libraries provide.

There was a great deal of emphasis in the past two days on partnerships and collaboration. And there was none of the “We’ll show ‘em where they got it wrong, by gum!” attitude that in my experience all too often infects discussions on the pioneering edge of standards. So, I just got to spend two days with brilliant library technologists who are eager to show how a new generation of tech, architecture, and thought can amplify the already immense value of libraries.

There will be more coming about this effort soon. I am obviously not a source for tech info; that will come soon and from elsewhere.


June 21, 2013

[lodlam] Kevin Ford on the state of BIBFRAME

Kevin Ford, who is a principal member of the team behind the Library of Congress’ BIBFRAME effort — a modern replacement for the aging MARC standard — gives an update on its status, and addresses a controversy about whether it’s “webby” enough. (I liveblogged a session about this at LODLAM.)


[lodlam] Kitio Fofack on why Linked Data

Kitio Fofack turned to Linked Data when creating a prototype app that aggregated researcher events. He explains why.


March 28, 2013

[annotation][2b2k] Critique^it

Ashley Bradford of Critique-It describes his company’s way of keeping review and feedback engaging.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellchecker. Mangling other people’s ideas and words. You are warned, people.

To what extent can and should we allow classroom feedback to be available in the public sphere? The classroom is a type of Habermasian civic society. Owning one’s discourse in that environment is critical. It has to feel human if students are to learn.

So, you can embed text, audio, and video feedback in documents, video, and images. It translates docs into HTML. To make the feedback feel human, it uses stamps. You can also type in comments, marking them as neutral, positive, or critique. A “critique panel” follows you through the doc as you read it, so you don’t have to scroll around. It rolls up comments and stats for the student or the faculty.

It works the same in different doc types, including Powerpoint, images, and video.

Critiques can be shared among groups. Groups can be arbitrarily defined.

It uses HTML5. It’s written in JavaScript and PHP, and uses MySQL.

“We’re starting with an environment. We’re building out tools.” Ashley aims for Critique^It to feel very human.


[annotation][2b2k] Mediathread

Jonah Bossewich and Mark Philipson from Columbia University talk about Mediathread, an open source project that makes it easy to annotate various digital sources. It’s used in many courses at Columbia, as well as around the world.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellchecker. Mangling other people’s ideas and words. You are warned, people.

It comes from Columbia’s Center for New Media Teaching and Learning. It began with Vital, a video library tool. It let students clip and save portions of videos, and comment on them. Mediathread connects annotations to sources by bookmarking, via a bookmarklet that interoperates with a variety of collections. The bookmarklet scrapes the metadata because “We couldn’t wait for the standards to be developed.” Once an item is in Mediathread, it embeds the metadata as well.

It has always been conceived of as a “small-group sharing and collaboration space.” It’s designed for classes. You can only see the annotations by people in your class. It does item-level annotation, as well as regions.

Mediathread connects assignments and responses, as well as other workflows. [He’s talking quickly :)]

Mediathread’s bookmarklet approach requires it to accommodate the particularities of sites. They are aiming at making the annotations interoperable in standard forms.


[annotation][2b2k] Phil Desenne on Harvard annotation tools

Phil Desenne begins with a brief history of annotation tools at Harvard. There are a lot, for annotating everything from texts to scrolls to music scores to video. Most of them are collaborative tools. The collaborative tool has gone from Adobe AIR to Harvard iSites, to open source HTML5. “It’s been a wonderful experience.” It’s been picked up by groups in Mexico, South America, and Europe.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellchecker. Mangling other people’s ideas and words. You are warned, people.

Phil works on edX. “We’re beginning to introduce annotation into edX.” It’s being used to encourage close reading. “It’s the beginning of a new way of thinking about teaching and assessing students.” Students tag the text, which “is the beginning of a semantic tagging system…Eventually we want to create a semantic ontology.”

What are the implications for the “MOOC Generation”? MOOC students are out finding information anywhere they can. They stick within a single learning management system (LMS). LMS’s usually have commentary tools, “but none of them talk with one another. Even within the same LMS you don’t have cross-referencing of the content.” We should have an interoperable layer that rides on top of the LMS’s.

Within edX, there are discussions within classes, courses, tutorials, etc. These should be aggregated so that the conversations can reach across the entire space, and, of course, outside of it. edX is now working on annotation systems that will do this. E.g., imagine being able to discuss a particular image or fragments of videos, and being able to insert images into streams of commentary. Plus analytics of these interactions. Heatmaps of activity. And a student should be able to aggregate all her notes, journal-like, so they can be exported, saved, and commented on. “We’re talking about a persistent annotation layer with API access.” “We want to go there.”

For this we need stable repositories. They’ll use URNs.
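A sketch of what one such annotation record might look like as JSON, loosely in the spirit of the Open Annotation data model of the time; every identifier below is an invented URN, not anything edX actually uses:

```json
{
  "id": "urn:example:annotation:42",
  "target": "urn:example:edx:course-x:video-7#t=85,92",
  "body": "This segment contradicts the claim made in lecture 3.",
  "creator": "urn:example:student:4711",
  "created": "2013-03-28T14:00:00Z"
}
```

With stable URNs for both the annotation and its target, a student's notes can be aggregated, exported, and fetched through an API without breaking when content moves.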


[annotation][2b2k] Paolo Ciccarese on the Domeo annotation platform

Paolo Ciccarese begins by reminding us just how vast the scientific literature is. We can’t possibly read everything we should. But “science is social” so we rely on each other, and build on each other’s work. “Everything we do now is connected.”

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellchecker. Mangling other people’s ideas and words. You are warned, people.

Today’s media do provide links, but not enough. Things are so deeply linked. “How do we keep track of it?” How do we communicate with others so that when they read the same paper they get a little bit of our mental model, and see why we found the article interesting?

Paolo’s project — Domeo [twitter:DomeoTool] — is a web app for “producing, browsing, and sharing manual and semi-automatic (structured and unstructured) annotations, using open standards.” Domeo shows you an article and lets you annotate fragments. You can attach a tag or an unstructured comment. The tag can be defined by the user or by a defined ontology. Domeo doesn’t care which ontologies you use, which means you could use it for annotating recipes as well as science articles.

Domeo also enables discussions; it has a threaded messaging facility. You can also run text mining and entity recognition systems (Calais, etc.) that automatically annotate the work with those words, which helps with search, understanding, and curation. This too can be a social process. Domeo lets you keep the annotation private or share it with colleagues, groups, communities, or the Web. Also, Domeo can be extended. In one example, it produces information about experiments that can be put into a database where it can be searched and linked up with other experiments and articles. Another example: “hypothesis management” lets readers add metadata to pick out the assertions and the evidence. (It uses RDF.) You can visualize the network of knowledge.

It supports open APIs for integrating with other systems, including the Neuroscience Information Framework and Drupal. “Domeo is a platform.” It aims at supporting rich sources, and will add the ability to follow authors and topics, etc., and enable mashups.



Joho the Blog by David Weinberger is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Creative Commons license: Share it freely, but attribute it to me, and don't use it commercially without my permission.

Joho the Blog gratefully uses WordPress blogging software.
Thank you, WordPress!