Joho the Blog - Let's just see what happens

April 16, 2019

First chapter of Everyday Chaos on Medium…and more!

Well, actually less. And more. Allow me to explain:

The first half of the first chapter of Everyday Chaos is now available at Medium. (An Editor’s Choice, no less!)

You can also read the first half of the chapter on how our model of models is changing at the Everyday Chaos site (Direct link: pdf).

At that site you’ll also find a fifteen minute video (Direct link: video) in which I attempt to explain why I wrote the book and what it’s about.

Or, you can just skip right to the pre-order button (Direct link: Amazon or IndieBound) :)

Be the first to comment »

April 15, 2019

Fountain Pens: The Tool, the Minor Fetish

In response to a tweet asking writers what they write out longhand, I replied that if I’m particularly at sea, I’ll write out an outline, usually with lots of looping arrows, on a pad. But only with a fountain pen. Ballpoints don’t work.

My old bloggy friend AKMA wondered how he’d known me so long without knowing that I’m a fountain pen guy. The truth is that I’ve only recently become one. I’ve liked them at various times over the course of my life, but only about four years ago did I integrate fountain pens into my personality.

It happened because I bought a $20 Lamy Safari on impulse in a stationery store. From there I got some single-digit Chinese fountain pens. Then, when I made some money on a writing contract, I treated myself to a $120 Lamy 2000, a lifetime pen. It’s pretty much perfect, from the classic 1960s design to the way the ink flows onto paper just wet enough and with enough scratchiness to feel like you’re on a small creek splashing over stones as it carves out words.

I have recently purchased a TWSBI ECO for $30. It has replaced my Safari as my daily pen. It’s lovely to write with, holds a lot of ink, and feels slightly sturdier than the Safari. Recommended.

Even though my handwriting is horrendous, I look forward to opportunities to write with these pens. But I avoid writing anything I’ll then have to transcribe because transcribing is so tedious. I do harbor a romantic notion of writing fiction longhand with a fountain pen on pads of Ampad “golden fibre.” Given that my fiction is worse than my handwriting, we can only hope that this notion itself remains a fiction.

So much of my writing is undoing, Penelope-like, the words I wove the day before that I am not tempted even a little to switch from word processors when the words and their order are the object. But when the words are mere vehicles, my thinking is helped — I believe — by a pen that drags its feet in the dirt.

3 Comments »

March 24, 2019

Automating our hardest things: Machine Learning writes

In 1948, when Claude Shannon was inventing information science [pdf] (and, I’d say, information itself), he took as an explanatory example a simple algorithm for predicting the next element of a sentence. For example, treating each letter as equiprobable, he came up with sentences such as:

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

If you instead use the average frequency of each letter, you come up with sentences that seem more language-like:

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.

At least that one has a reasonable number of vowels.

If you then consider the frequency of letters following other letters—U follows a Q far more frequently than X does—you are practically writing nonsense Latin:

ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.

Looking not at pairs of letters but at triplets, Shannon got:

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

Then Shannon changed his units from triplets of letters to pairs of words, and got:

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.

Pretty good! But still gibberish.
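Shannon’s word-level trick is easy to sketch in code. Here’s a toy reconstruction of my own — not Shannon’s procedure, and the sample text and function names are mine: tally which words follow which in some text, then generate by repeatedly picking a follower at random, so frequent pairs get picked more often.

```javascript
// Build a table mapping each word to the words that follow it.
function buildModel(text) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const model = {};
  for (let i = 0; i < words.length - 1; i++) {
    const w = words[i];
    if (!model[w]) model[w] = [];
    model[w].push(words[i + 1]); // duplicates encode frequency
  }
  return model;
}

// Generate by walking the table: sampling uniformly from the follower
// list is sampling by frequency, since frequent pairs appear more often.
function generate(model, start, length) {
  let current = start;
  const out = [current];
  for (let i = 1; i < length; i++) {
    const followers = model[current];
    if (!followers || followers.length === 0) break; // dead end
    current = followers[Math.floor(Math.random() * followers.length)];
    out.push(current);
  }
  return out.join(" ");
}

const text = "the head and in frontal attack on an english writer that the character of this point";
const model = buildModel(text);
console.log(generate(model, "the", 8));
```

With enough training text, this is all it takes to get from XFOML-style noise to almost-English.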

Now jump ahead seventy years and try to figure out which pieces of the following story were written by humans and which were generated by a computer:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.

Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.

Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.

Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them – they were so close they could touch their horns.

While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, “We can see, for example, that they have a common ‘language,’ something like a dialect or dialectic.”

Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America.

While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”

However, Pérez also pointed out that it is likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA. “But they seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization,” said the scientist.

The answer: The first paragraph was written by a human being. The rest was generated by a machine learning system trained on a huge body of text. You can read about it in a fascinating article (pdf of the research paper) by its creators at OpenAI. (Those creators are: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.)

There are two key differences between this approach and Shannon’s.

First, the new approach analyzed a very large body of documents from the Web. It ingested 45 million pages linked in Reddit comments that got more than three upvotes. After removing duplicates and some other cleanup, the data set was reduced to 8 million Web pages. That is a lot of pages. Of course the use of Reddit, or any one site, can bias the dataset. But one of the aims was to compare this new, huge, dataset to the results from existing sets of text-based data. For that reason, the developers also removed Wikipedia pages from the mix since so many existing datasets rely on those pages, which would smudge the comparisons.

(By the way, a quick google search for any page from before December 2018 mentioning both “Jorge Pérez” and “University of La Paz” turned up nothing. The AI is constructing, not copy-pasting.)

The second distinction from Shannon’s method: the developers used machine learning (ML) to create a neural network, rather than relying on a table of frequencies of words in short sequences. ML creates a far, far more complex model that can assess the probability of the next word based on the entire context of its prior uses.

The results can be astounding. While the developers freely acknowledge that the examples they feature are somewhat cherry-picked, they say:

When prompted with topics that are highly represented in the data (Brexit, Miley Cyrus, Lord of the Rings, and so on), it seems to be capable of generating reasonable samples about 50% of the time. The opposite is also true: on highly technical or esoteric types of content, the model can perform poorly.

There are obviously things to worry about as this technology advances. For example, fake news could become the Earth’s most abundant resource. For fear of its abuse, its developers are not releasing the full dataset or model weights. Good!

Nevertheless, the possibilities for research are amazing. And, perhaps most important in the long term, one by one the human capabilities that we take as unique and distinctive are being shown to be replicable without an engine powered by a miracle.

That may be a false conclusion. Human speech does not consist simply of the utterances we make but of the complex intentional and social systems in which those utterances are more than just flavored wind. But ML intends nothing and appreciates nothing. Nothing matters to ML. Nevertheless, knowing that sufficient silicon can duplicate the human miracle should shake our confidence in our species’ special place in the order of things.

(FWIW, my personal theology says that when human specialness is taken as conferring special privilege, any blow to it is a good thing. When that specialness is taken as placing special obligations on us, then at its very worst it’s a helpful illusion.)

5 Comments »

March 8, 2019

Keep JavaScript dumb

I’ve been a hobbyist programmer since I got my first computer in 1984 or so. I greatly enjoy it and I’m terrible at it.

I mainly use JS to create utilities for myself that take me 1,000 hours to write and save me a lifetime of 45 seconds. I like JavaScript for tawdry reasons: it’s straightforward, there’s a huge collection of libraries, any question I might ever have has already been asked and answered at StackOverflow, and I get to see the results immediately on screen. It’s of course also useful for the various bush league Web sites I occasionally have to put up (e.g., Everyday Chaos). Also, jQuery makes dumb HTML (DOM) work easy.

But here’s the but…

But, ECMA is taking JS in a terrible direction: it’s turning it professional, what with the arrow functions and the promises, etc. If you’re a hobbyist who enjoys programming for the logic of it, the new stuff in JS hides that logic on behalf of things I happen not to care about like elegance, consistency, and concision.
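For what it’s worth, here’s the kind of difference I mean — a made-up example, not anything in particular from StackOverflow. The first version is the dumb JavaScript I can read; the second does the same thing the new way:

```javascript
// Dumb, legible JavaScript: every step is spelled out.
function doubleAll(nums) {
  var result = [];
  for (var i = 0; i < nums.length; i++) {
    result.push(nums[i] * 2);
  }
  return result;
}

// The new style: an arrow function does the same thing in one line,
// at the price of hiding the machinery.
const doubleAllTerse = (nums) => nums.map((n) => n * 2);

console.log(doubleAll([1, 2, 3]));      // [ 2, 4, 6 ]
console.log(doubleAllTerse([1, 2, 3])); // [ 2, 4, 6 ]
```

The terse version is shorter and, yes, more elegant. It is also the one a hobbyist can’t puzzle out by reading it slowly.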

Now, I know that I don’t have to use the new stuff. But in fact I do, because the community I rely on to answer my questions — basically StackOverflow — increasingly is answering with the new stuff.

There’s a reason JS became the most used language on the planet: not only can you do webby stuff with it, it has a pretty linear learning curve. Now I literally feel like I’m in danger of losing “View Source” from my browser … literally because while I can view the source, increasingly I can’t understand it.

I’m going to lose this argument. I already have lost it. I should lose it. My position is wrong. I know that. Nevertheless, I stand firmly on the wrong side of history as I declare in my lonely, quavering voice: Keep. JavaScript. Dumb.

9 Comments »

January 22, 2019

When your phone won’t walk a tree

If your Android phone no longer generates tones that work when you’re asked to “Press 1 to … do something” Pixel support says go to Settings > Apps > Phone. Touch the three dots in the upper right, select “Uninstall updates” and restart your phone. Worked for me.

My list of recent calls was preserved. Yay. But presumably Google is working on a more elegant solution.

2 Comments »

December 21, 2018

“I know tech better than anyone” isn’t a lie

The Democrats are trying to belittle the concept of a Wall, calling it old fashioned. The fact is there is nothing else’s that will work, and that has been true for thousands of years. It’s like the wheel, there is nothing better. I know tech better than anyone, & technology…..

— Donald J. Trump (@realDonaldTrump) December 21, 2018

This comes from a man who does not know how to close an umbrella.

Does Trump really believe that he knows more about tech than anyone? Even if we take away the hyperbole, does he think he’s an expert at technology? What could he mean by that? That he knows how to build a computer? What an Internet router does? That he can explain what an adversarial neural network is, or just the difference between machine learning and deep learning? That he can provide IT support when Jared can’t find the song he just downloaded to his iPhone? That he can program his VCR?

But I don’t think he means any of those things by his ridiculous claim.

I think it’s worse than that. The phrase is clearly intended to have an effect, not to mean anything. “Listen to me. Believe me.” is an assertion of authority intended to forestall questioning. A genuine expert might say something like that, and at least sometimes it’d be reasonable and acceptable; it’s also sometimes obnoxious. Either way, “I know more about x than anyone” is a conversational tool.

So, Trump has picked up a hammer. His hand is clasped around its handle. He swings his arm and brings the hammer squarely down on the nail. He hears the bang. He has wielded this hammer successfully.

Except the rest of us can see there is nothing — nothing — in his hand. We all know that. Only he does not.

Trump is not lying. He is insane.

8 Comments »

December 12, 2018

Posts from inside Google

For the past six months I’ve been a writer in residence embedded in a machine learning research group — PAIR (People + AI Research) — at the Google site in Cambridge, MA. I was recently renewed for another six months.

No, it’s not clear what a “writer in residence” does. So, I’ve been writing occasional posts that try to explain and contextualize some basic concepts in machine learning from the point of view of a humanities major who is deeply lacking the skills and knowledge of a computer scientist. Fortunately the developers at PAIR are very, very patient.

Here are three of the posts:

Machine Learning’s Triangle of Error: “…machine learning systems ‘think’ about fairness in terms of three interrelated factors: two ways the machine learning (ML) can go wrong, and the most basic way of adjusting the balance between these potential errors.”

Confidence Everywhere!: “… these systems are actually quite humble. It may seem counterintuitive, but we could learn from their humility.”

Hashtags and Confidence: “…in my fever dream of the future, we routinely say things like, “That celebrity relationship is going to last, 0.7 for sure!” …Expressions of confidence probably (0.8) won’t take exactly that form. But, then, a decade ago, many were dubious about the longevity of tagging…”

I also wrote about five types of fairness, which I posted about earlier: “…You appoint five respected ethicists, fairness activists, and customer advocates to figure out what gender mix of approved and denied applications would be fair. By the end of the first meeting, the five members have discovered that each of them has a different idea of what’s fair…”

I’ve also started writing an account of my attempt to write my very own machine learning program using TensorFlow.js, which lets you train a machine learning system in your browser. (TensorFlow.js is a PAIR project.) This project is bringing me face to face with the details of implementing even a “Hello, world”-ish ML program. (My project aims at suggesting tags for photos, based on a set of tagged images (Creative Commons-ed) from Flickr. It’s a toy, of course.)

I have a bunch of other posts in the pipeline, as well as a couple of larger pieces on larger topics. Meanwhile, I’m trying to learn as much as I possibly can without becoming the most annoying person in Cambridge. But it might be too late to avoid that title…

1 Comment »

November 25, 2018

Using the Perma.cc API to check links

My new book (Everyday Chaos, HBR Press, May 2019) has a few hundred footnotes with links to online sources. Because Web sites change and links rot, I decided to link to Perma.cc’s pages instead. Perma.cc is a product of the Harvard Library Innovation Lab, which I used to co-direct with Kim Dulin, but Perma is a Jonathan Zittrain project from after I left.

When you give Perma.cc a link to a page on the Web, it comes back with a link to a page on the Perma.cc site. That page has an archive copy of the original page exactly as it was when you supplied the link. It also makes a screen capture of that original page. And of course it includes a link to the original. It also promises to maintain the Perma.cc copy and screen capture in perpetuity — a promise backed by the Harvard Law Library and dozens of other libraries. So, when you give a reader a Perma link, they are taken to the Perma.cc page where they’ll always find the archived copy and the screen capture, no matter what happens to the original site. Also, the service is free for everyone, for real. Plus, the site doesn’t require users to supply any information about themselves. Also, there are no ads.

So that’s why my book’s references are to Perma.cc.

But, over the course of the six years I spent writing this book, my references suffered some link rot on my side. Before I got around to creating the Perma links, I managed to make all the obvious errors and some not so obvious. As a result, now that I’m at the copyediting stage, I wanted to check all the Perma links.

I had already compiled a bibliography as a spreadsheet. (The book will point to the Perma.cc page for that spreadsheet.) So, I selected the Title and Perma Link columns, copied the content, and stuck it into a text document. Each line contains the page’s headline and then the Perma link.

Perma.cc has an API that made it simple to write a script that looks up each Perma link and prints the title Perma has recorded next to the title of the page I intended to link. If there’s a problem with a Perma link, such as a doubled “https://https://” (a mistake I managed to introduce about a dozen times), or if the Perma link is private and not accessible to the public, it notes the problem. The human brain is good at scanning this sort of info, looking for inconsistencies.

Here’s the script. I used PHP because I happen to know it better than a less embarrassing choice such as Python and because I have no shame.

<?php
// This is a basic program for checking a list of page titles and perma.cc links.
// It's done badly because I am a terrible hobbyist programmer.
// I offer it under whatever open source license is most permissive. I'm really not
// going to care about anything you do with it. Except please note I'm a
// terrible hobbyist programmer who makes no claims about how well this works.
//
// David Weinberger
// [email protected]
// Nov. 23, 2018

// Perma.cc API documentation is here: https://perma.cc/docs/developer

// This program assumes there's a file with the page title and one perma link per line.
// E.g. The Rand Corporation: The Think Tank That Controls America https://perma.cc/B5LR-88CF

// Read that text file into an array
$lines = file('links-and-titles.txt');

for ($i = 0; $i < count($lines); $i++){
    $line = $lines[$i];
    // divide into title and permalink
    $p1 = strpos($line, "https"); // find the beginning of the perma link
    $fullperma = substr($line, $p1); // get the full perma link
    $origtitle = substr($line, 0, $p1); // get the title
    $origtitle = rtrim($origtitle); // trim the spaces from the end of the title

    // get the distinctive part of the perma link: the stuff after https://perma.cc/
    $permacode = strrchr($fullperma, "/"); // find the last forward slash
    $permacode = substr($permacode, 1); // get what's after that slash
    $permacode = rtrim($permacode); // trim any spaces from the end

    // create the url that will fetch this perma link
    $apiurl = "https://api.perma.cc/v1/public/archives/" . $permacode . "/";

    // fetch the data about this perma link
    $onelink = file_get_contents($apiurl);
    // echo $onelink; // this would print the full json

    // decode the json
    $j = json_decode($onelink, true);
    // Did you get any json, or just null?
    if ($j == null){
        // hmm. This might be a private perma link. Or some other error.
        echo "<p>-- $permacode failed. Private?</p>";
    }
    // otherwise, you got something, so write some of the data into the page
    else {
        echo "<b>" . $j["guid"] . "</b><blockquote>" . $j["title"] . "<br>" . $origtitle . "<br>" . $j["url"] . "</blockquote>";
    }
}

// finish by noting how many files have been read
echo "<h2>Read " . count($lines) . "</h2>";
?>

Run this script in a browser and it will create a page with the results. (The script is available at GitHub.)

Thanks, Perma.cc!



By the way, and mainly because I keep losing track of this info, the table of code was created by a little service cleverly called Convert JS to Table.

1 Comment »

October 12, 2018

How browsers learned to support your system's favorite font

Operating systems play favorites when it comes to fonts: they pick one as their default. And the OS’s don’t agree with one another.

But now when you’re designing a page you can tell CSS to use the system font of whatever operating system the browser is running on. This is thanks to Craig Hockenberry, who proposed the idea in an article three years ago. Apple picked up on it, and now it’s worked its way into the standard CSS font module and is supported by Chrome and Safari; Windows and Mozilla are lagging. Here’s Craig’s write-up of the process.

Here’s a quick test of whether it’s working in the browser you’re reading this post with:

This sentence should be in this blog’s designated font: Georgia. Or maybe one of its serif-y fall-backs.

This one should be in your operating system’s standard font, at least if you’re using Chrome or Safari at the moment.

We now return you to this blog’s regular font, already in progress.

Comments Off on How browsers learned to support your system's favorite font

September 20, 2018

Coming to belief

I’ve written before about the need to teach The Kids (also: all of us) not only how to think critically so we can see what we should not believe, but also how to come to belief. That piece, which I now cannot locate, was prompted by danah boyd’s excellent post on the problem with media literacy. Robert Berkman, Outreach Business Librarian at the University of Rochester and Editor of The Information Advisor’s Guide to Internet Research, asked me how one can go about teaching people how to come to belief. Here’s an edited version of my reply:

I’m afraid I don’t have a good answer. I actually haven’t thought much about how to teach people how to come to belief, beyond arguing for doing this as a social process (the ol’ “knowledge is a network” argument :) I have a pretty good sense of how *not* to do it: the way philosophy teachers relentlessly show how every proposed position can be torn down.

I wonder what we’d learn by taking a literature course as a model — not one that is concerned primarily with critical method, but one that is trying to teach students how to appreciate literature. Or art. The teacher tries to get the students to engage with one another to find what’s worthwhile in a work. Formally, you implicitly teach the value of consistency, elegance of explanation, internal coherence, how well a work clarifies one’s own experience, etc. Those are useful touchstones for coming to belief.

I wouldn’t want to leave students feeling that it’s up to them to come up with an understanding on their own. I’d want them to value the history of interpretation, bringing their critical skills to it. The last thing we need is to make people feel yet more unmoored.

I’m also fond of the orthodox Jewish way of coming to belief, as I, as a non-observant Jew, understand it. You have an unchanging and inerrant text that means nothing until humans interpret it. To interpret it means to be conversant with the scholarly opinions of the great Rabbis, who disagree with one another, often diametrically. Formulating a belief in this context means bringing contemporary intelligence to a question while finding support in the old Rabbis…and always always talking respectfully about those other old Rabbis who disagree with your interpretation. No interpretations are final. Learned contradiction is embraced.

That process has the elements I personally like (being moored to a tradition, respecting those with whom one disagrees, acceptance of the finitude of beliefs, acceptance that they result from a social process), but it’s not going to be very practical outside of Jewish communities if only because it rests on the acceptance of a sacred document, even though it’s one that literally cannot be taken literally; it always requires interpretation.

My point: We do have traditions that aim at enabling us to come to belief. Science is one of them. But there are others. We should learn from them.

TL;DR: I dunno.

2 Comments »

Next Page »