Joho the Blog - Let's just see what happens

April 29, 2019

Forbes on 4 lessons from Everyday Chaos

Joe McKendrick at Forbes has posted a concise and thoughtful column about
Everyday Chaos, including four rules to guide your expectations about machine learning.

It’s great to see a pre-publication post so on track about what the book says and how it applies to business.

1 Comment »

April 16, 2019

First chapter of Everyday Chaos on Medium…and more!

Well, actually less. And more. Allow me to explain:

The first half of the first chapter of Everyday Chaos is now available at Medium. (An Editor’s Choice, no less!)

You can also read the first half of the chapter on how our model of models is changing at the Everyday Chaos site (Direct link: pdf).

At that site you’ll also find a fifteen-minute video (Direct link: video) in which I attempt to explain why I wrote the book and what it’s about.

Or, you can just skip right to the pre-order button (Direct link: Amazon or IndieBound) :)

Be the first to comment »

April 15, 2019

Fountain Pens: The Tool, the Minor Fetish

In response to a tweet asking writers what they write out longhand, I replied that if I’m particularly at sea, I’ll write out an outline, usually with lots of looping arrows, on a pad. But only with a fountain pen. Ballpoints don’t work.

My old bloggy friend AKMA wondered how he’d known me so long without knowing that I’m a fountain pen guy. The truth is that I’ve only recently become one. I’ve liked them at various times over the course of my life, but only about four years ago did I integrate fountain pens into my personality.

It happened because I bought a $20 Lamy Safari on impulse in a stationery store. From there I got some single-digit-dollar Chinese fountain pens. Then, when I made some money on a writing contract, I treated myself to a $120 Lamy 2000, a lifetime pen. It’s pretty much perfect, from the classic 1960s design to the way the ink flows onto paper just wet enough and with enough scratchiness to feel like you’re on a small creek splashing over stones as it carves out words.

I have recently purchased a TWSBI ECO for $30. It has replaced my Safari as my daily pen. It’s lovely to write with, holds a lot of ink, and feels slightly sturdier than the Safari. Recommended.

Even though my handwriting is horrendous, I look forward to opportunities to write with these pens. But I avoid writing anything I’ll then have to transcribe because transcribing is so tedious. I do harbor a romantic notion of writing fiction longhand with a fountain pen on pads of Ampad “golden fibre.” Given that my fiction is worse than my handwriting, we can only hope that this notion itself remains a fiction.

So much of my writing is undoing, Penelope-like, the words I wove the day before that I am not tempted even a little to switch from word processors when the words and their order are the object. But when the words are mere vehicles, my thinking is helped — I believe — by a pen that drags its feet in the dirt.


March 24, 2019

Automating our hardest things: Machine Learning writes

In 1948, when Claude Shannon was inventing information theory [pdf] (and, I’d say, information itself), he took as an explanatory example a simple algorithm for predicting the next element of a sentence. For example, treating each letter as equiprobable, he came up with strings such as:

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.
If you instead weight each letter by its average frequency in English, you come up with strings that seem more language-like:

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.
At least that one has a reasonable number of vowels.

If you then consider the frequency of letters following other letters—U follows a Q far more frequently than an X does—you are practically writing nonsense Latin:

ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.
Looking not at pairs of letters but at triplets, Shannon got:

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
Then Shannon changes his units from triplets of letters to pairs of words, and gets:

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
Pretty good! But still gibberish.
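Shannon’s frequency-table method is simple enough to sketch in a few lines. Here’s a toy version of my own (an illustration, not Shannon’s code; the sample text is made up) that records which character tends to follow each pair of characters and then generates new text from those observed frequencies:

```javascript
// Toy version of Shannon's third-order (trigram) approximation:
// record which characters follow each pair of characters in a sample
// text, then generate text by sampling from those observed frequencies.
function buildTrigrams(text) {
  const table = {};
  for (let i = 0; i < text.length - 2; i++) {
    const pair = text.slice(i, i + 2);
    (table[pair] = table[pair] || []).push(text[i + 2]);
  }
  return table;
}

function generate(table, seed, length) {
  let out = seed;
  for (let i = 0; i < length; i++) {
    const followers = table[out.slice(-2)];
    if (!followers) break; // dead end: this pair never appeared in the sample
    out += followers[Math.floor(Math.random() * followers.length)];
  }
  return out;
}

const sample = "the theory of communication is the theory of information";
const table = buildTrigrams(sample);
console.log(generate(table, "th", 40));
```

Every three-character run it emits occurred somewhere in the sample, which is exactly why the output looks locally plausible and globally gibberish.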

Now jump ahead seventy years and try to figure out which pieces of the following story were written by humans and which were generated by a computer:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.

Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.

Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.

Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them – they were so close they could touch their horns.

While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, “We can see, for example, that they have a common ‘language,’ something like a dialect or dialectic.”

Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America.

While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”

However, Pérez also pointed out that it is likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA. “But they seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization,” said the scientist.

The answer: The first paragraph was written by a human being. The rest was generated by a machine learning system trained on a huge body of text. You can read about it in a fascinating article (pdf of the research paper) by its creators at OpenAI. (Those creators are: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.)

There are two key differences between this approach and Shannon’s.

First, the new approach analyzed a very large body of documents from the Web. It ingested 45 million pages linked in Reddit comments that got more than three upvotes. After removing duplicates and some other cleanup, the data set was reduced to 8 million Web pages. That is a lot of pages. Of course the use of Reddit, or any one site, can bias the dataset. But one of the aims was to compare this new, huge, dataset to the results from existing sets of text-based data. For that reason, the developers also removed Wikipedia pages from the mix since so many existing datasets rely on those pages, which would smudge the comparisons.

(By the way, a quick Google search for any page from before December 2018 mentioning both “Jorge Pérez” and “University of La Paz” turned up nothing. The AI is constructing, not copy-pasting.)

The second distinction from Shannon’s method: the developers used machine learning (ML) to create a neural network, rather than relying on a table of frequencies of words in triplet sequences. ML creates a far, far more complex model that can assess the probability of the next word based on the entire context of its prior uses.

The results can be astounding. While the developers freely acknowledge that the examples they feature are somewhat cherry-picked, they say:

When prompted with topics that are highly represented in the data (Brexit, Miley Cyrus, Lord of the Rings, and so on), it seems to be capable of generating reasonable samples about 50% of the time. The opposite is also true: on highly technical or esoteric types of content, the model can perform poorly.

There are obviously things to worry about as this technology advances. For example, fake news could become the Earth’s most abundant resource. For fear of its abuse, its developers are not releasing the full dataset or model weights. Good!

Nevertheless, the possibilities for research are amazing. And, perhaps most important in the long term, one by one the human capabilities that we take as unique and distinctive are being shown to be replicable without an engine powered by a miracle.

That may be a false conclusion. Human speech does not consist simply of the utterances we make but the complex intentional and social systems in which those utterances are more than just flavored wind. But ML intends nothing and appreciates nothing. Nothing matters to ML. Nevertheless, knowing that sufficient silicon can duplicate the human miracle should shake our confidence in our species’ special place in the order of things.

(FWIW, my personal theology says that when human specialness is taken as conferring special privilege, any blow to it is a good thing. When that specialness is taken as placing special obligations on us, then at its very worst it’s a helpful illusion.)


March 8, 2019

Keep JavaScript dumb

I’ve been a hobbyist programmer since I got my first computer in 1984 or so. I greatly enjoy it and I’m terrible at it.

I mainly use JS to create utilities for myself that take me 1,000 hours to write and save me a lifetime of 45 seconds. I like JavaScript for tawdry reasons: it’s straightforward, there’s a huge collection of libraries, any question I might ever have has already been asked and answered at StackOverflow, and I get to see the results immediately on screen. It’s of course also useful for the various bush-league Web sites I occasionally have to put up (e.g., Everyday Chaos). Also, jQuery makes dumb HTML (DOM) work easy.

But here’s the but…

But, ECMA is taking JS in a terrible direction: it’s turning it professional, what with the arrow functions and the promises, etc. If you’re a hobbyist who enjoys programming for the logic of it, the new stuff in JS hides that logic on behalf of things I happen not to care about like elegance, consistency, and concision.
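To make the complaint concrete, here’s a toy example of my own (not from any real StackOverflow answer): the same trivial operation written the way I learned it and the way newer answers tend to write it:

```javascript
// The way I learned it: a named function that spells out each step.
function doubleAll(numbers) {
  const result = [];
  for (let i = 0; i < numbers.length; i++) {
    result.push(numbers[i] * 2);
  }
  return result;
}

// The way answers increasingly come: an arrow function, all on one line.
const doubleAllNew = (numbers) => numbers.map((n) => n * 2);

console.log(doubleAll([1, 2, 3]));    // [ 2, 4, 6 ]
console.log(doubleAllNew([1, 2, 3])); // [ 2, 4, 6 ]
```

Both do exactly the same thing. The second is more concise and, to professional eyes, more elegant; to my hobbyist eyes it hides the loop I want to see.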

Now, I know that I don’t have to use the new stuff. But in fact I do, because the community I rely on to answer my questions — basically StackOverflow — increasingly is answering with the new stuff.

There’s a reason JS became the most used language on the planet: not only can you do webby stuff with it, it has a pretty linear learning curve. Now I literally feel like I’m in danger of losing “View Source” from my browser … literally because while I can view the source, increasingly I can’t understand it.

I’m going to lose this argument. I already have lost it. I should lose it. My position is wrong. I know that. Nevertheless, I stand firmly on the wrong side of history as I declare in my lonely, quavering voice: Keep. JavaScript. Dumb.


January 22, 2019

When your phone won’t walk a tree

If your Android phone no longer generates tones that work when you’re asked to “Press 1 to … do something” Pixel support says go to Settings > Apps > Phone. Touch the three dots in the upper right, select “Uninstall updates” and restart your phone. Worked for me.

My list of recent calls was preserved. Yay. But presumably Google is working on a more elegant solution.


December 21, 2018

“I know tech better than anyone” isn’t a lie

The Democrats are trying to belittle the concept of a Wall, calling it old fashioned. The fact is there is nothing else’s that will work, and that has been true for thousands of years. It’s like the wheel, there is nothing better. I know tech better than anyone, & technology…..

— Donald J. Trump (@realDonaldTrump) December 21, 2018

This comes from a man who does not know how to close an umbrella.

Does Trump really believe that he knows more about tech than anyone? Even if we take away the hyperbole, does he think he’s an expert at technology? What could he mean by that? That he knows how to build a computer? What an Internet router does? That he can explain what an adversarial neural network is, or just the difference between machine learning and deep learning? That he can provide IT support when Jared can’t find the song he just downloaded to his iPhone? That he can program his VCR?

But I don’t think he means any of those things by his ridiculous claim.

I think it’s worse than that. The phrase is clearly intended to have an effect, not to mean anything. “Listen to me. Believe me.” is an assertion of authority intended to forestall questioning. A genuine expert might say something like that, and at least sometimes it’d be reasonable and acceptable; it’s also sometimes obnoxious. Either way, “I know more about x than anyone” is a conversational tool.

So, Trump has picked up a hammer. His hand is clasped around its handle. He swings his arm and brings the hammer squarely down on the nail. He hears the bang. He has wielded this hammer successfully.

Except the rest of us can see there is nothing — nothing — in his hand. We all know that. Only he does not.

Trump is not lying. He is insane.


December 12, 2018

Posts from inside Google

For the past six months I’ve been a writer in residence embedded in a machine learning research group — PAIR (People + AI Research) — at the Google site in Cambridge, MA. I was recently renewed for another six months.

No, it’s not clear what a “writer in residence” does. So, I’ve been writing occasional posts that try to explain and contextualize some basic concepts in machine learning from the point of view of a humanities major who is deeply lacking the skills and knowledge of a computer scientist. Fortunately the developers at PAIR are very, very patient.

Here are three of the posts:

Machine Learning’s Triangle of Error: “…machine learning systems ‘think’ about fairness in terms of three interrelated factors: two ways the machine learning (ML) can go wrong, and the most basic way of adjusting the balance between these potential errors.”

Confidence Everywhere!: “… these systems are actually quite humble. It may seem counterintuitive, but we could learn from their humility.”

Hashtags and Confidence: “…in my fever dream of the future, we routinely say things like, “That celebrity relationship is going to last, 0.7 for sure!” …Expressions of confidence probably (0.8) won’t take exactly that form. But, then, a decade ago, many were dubious about the longevity of tagging…”

I also wrote about five types of fairness, which I posted about earlier: “…You appoint five respected ethicists, fairness activists, and customer advocates to figure out what gender mix of approved and denied applications would be fair. By the end of the first meeting, the five members have discovered that each of them has a different idea of what’s fair…”

I’ve also started writing an account of my attempt to write my very own machine learning program using TensorFlow.js, which lets you train a machine learning system in your browser; TensorFlow.js is a PAIR project. This project is bringing me face to face with the details of implementing even a “Hello, world”-ish ML program. (My project aims at suggesting tags for photos, based on a set of Creative Commons-licensed tagged images from Flickr. It’s a toy, of course.)

I have a bunch of other posts in the pipeline, as well as a couple of larger pieces on larger topics. Meanwhile, I’m trying to learn as much as I possibly can without becoming the most annoying person in Cambridge. But it might be too late to avoid that title…

1 Comment »

November 25, 2018

Using the API to check links

My new book (Everyday Chaos, HBR Press, May 2019) has a few hundred footnotes with links to online sources. Because Web sites change and links rot, I decided to link to Perma.cc’s pages instead. Perma.cc is a product of the Harvard Library Innovation Lab, which I used to co-direct with Kim Dulin, but Perma is a Jonathan Zittrain project from after I left.

When you give Perma.cc a link to a page on the Web, it comes back with a link to a page on the perma.cc site. That page has an archive copy of the original page exactly as it was when you supplied the link. It also makes a screen capture of that original page. And of course it includes a link to the original. It also promises to maintain the copy and screen capture in perpetuity — a promise backed by the Harvard Law Library and dozens of other libraries. So, when you give a reader a Perma link, they are taken to the page where they’ll always find the archived copy and the screen capture, no matter what happens to the original site. Also, the service is free for everyone, for real. Plus, the site doesn’t require users to supply any information about themselves. Also, there are no ads.

So that’s why my book’s references are to Perma.cc links.

But, over the course of the six years I spent writing this book, my references suffered some link rot on my side. Before I got around to creating the Perma links, I managed to make all the obvious errors and some not so obvious. As a result, now that I’m at the copyediting stage, I wanted to check all the Perma links.

I had already compiled a bibliography as a spreadsheet. (The book will point to the page for that spreadsheet.) So, I selected the Title and Perma Link columns, copied the content, and stuck it into a text document. Each line contains the page’s headline and then the Perma link. Perma.cc has an API that made it simple to write a script that looks up each Perma link and prints the title Perma has recorded next to the title of the page I intended to link. If there’s a problem with a Perma link, such as a doubled “https://https://” (a mistake I managed to introduce about a dozen times), or if the Perma link is private and not accessible to the public, it notes the problem. The human brain is good at scanning this sort of info, looking for inconsistencies.

Here’s the script. I used PHP because I happen to know it better than a less embarrassing choice such as Python and because I have no shame.





// This is a basic program for checking a list of page titles and perma links.
// It's done badly because I am a terrible hobbyist programmer.
// I offer it under whatever open source license is most permissive. I'm really not
// going to care about anything you do with it. Except please note I'm a
// terrible hobbyist programmer who makes no claims about how well this works.

// David Weinberger
// [email protected]
// Nov. 23, 2018

// API documentation is here:

// This program assumes there's a file with the page title and one perma link per line.
// E.g. The Rand Corporation: The Think Tank That Controls America

// Read that text file into an array
$lines = file('links-and-titles.txt');

for ($i = 0; $i < count($lines); $i++){
    $line = $lines[$i];

    // divide into title and perma link
    $p1 = strpos($line, "https"); // find the beginning of the perma link
    $fullperma = substr($line, $p1); // get the full perma link
    $origtitle = substr($line, 0, $p1); // get the title
    $origtitle = rtrim($origtitle); // trim the spaces from the end of the title

    // get the distinctive part of the perma link: the stuff after perma.cc/
    $permacode = strrchr($fullperma, "/"); // find the last forward slash
    $permacode = substr($permacode, 1); // get what's after that slash
    $permacode = rtrim($permacode); // trim any spaces from the end

    // create the url that will fetch data about this perma link
    $apiurl = "https://api.perma.cc/v1/public/archives/" . $permacode . "/";

    // fetch the data about this perma link
    $onelink = file_get_contents($apiurl);
    // echo $onelink; // this would print the full json

    // decode the json
    $j = json_decode($onelink, true);

    // Did you get any json, or just null?
    if ($j == null){
        // hmm. This might be a private perma link. Or some other error.
        echo "<p>– $permacode failed. Private?</p>";
    }
    // otherwise, you got something, so write some of the data into the page
    else {
        echo "<b>" . $j["guid"] . "</b><blockquote>" . $j["title"] . "<br>" . $origtitle . "<br>" . $j["url"] . "</blockquote>";
    }
}

// finish by noting how many lines have been read
echo "<h2>Read " . count($lines) . "</h2>";




Run this script in a browser and it will create a page with the results. (The script is available at GitHub.)


By the way, and mainly because I keep losing track of this info, the table of code was created by a little service cleverly called Convert JS to Table.

1 Comment »

October 12, 2018

How browsers learned to support your system's favorite font

Operating systems play favorites when it comes to fonts: each picks its own default. And the OSes don’t agree with one another: macOS uses San Francisco, Windows uses Segoe UI, and Android uses Roboto.

But now when you’re designing a page you can tell CSS to use the system font of whatever operating system the browser is running on. This is thanks to Craig Hockenberry, who proposed the idea in an article three years ago. Apple picked up on it, and now it has worked its way into the standard CSS font module (as the generic family name system-ui) and is supported by Chrome and Safari; Microsoft and Mozilla are lagging. Here’s Craig’s write-up of the process.

Here’s a quick test of whether it’s working in the browser you’re reading this post with:

This sentence should be in this blog’s designated font: Georgia. Or maybe one of its serif-y fall-backs.

This one should be in your operating system’s standard font, at least if you’re using Chrome or Safari at the moment.

We now return you to this blog’s regular font, already in progress.
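If you want to try it on your own pages, the usual approach is a font-family stack that leads with the standardized generic name system-ui and falls back to the older vendor-prefixed names. Here’s a small helper of my own devising (the function name is made up, but the font names are the real ones browsers have used) that builds such a stack:

```javascript
// Build a font-family value that prefers the operating system's own UI font.
// "system-ui" is the standardized generic name; "-apple-system" (Safari) and
// "BlinkMacSystemFont" (older Chrome) date from before it was standardized.
function systemFontStack(fallbacks) {
  const names = ["system-ui", "-apple-system", "BlinkMacSystemFont", "Segoe UI", "Roboto"];
  return names
    .concat(fallbacks)
    .map((name) => (name.includes(" ") ? `"${name}"` : name)) // quote multi-word names
    .join(", ");
}

console.log(systemFontStack(["sans-serif"]));
// system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif
```

Drop the resulting string into a font-family rule and browsers that know system-ui use the OS’s favorite; the rest fall through the list.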

Comments Off on How browsers learned to support your system's favorite font
