December 12, 2018

Posts from inside Google

For the past six months I’ve been a writer in residence embedded in a machine learning research group — PAIR (People + AI Research) — at the Google site in Cambridge, MA. I was recently renewed for another six months.

No, it’s not clear what a “writer in residence” does. So, I’ve been writing occasional posts that try to explain and contextualize some basic concepts in machine learning from the point of view of a humanities major who is deeply lacking the skills and knowledge of a computer scientist. Fortunately the developers at PAIR are very, very patient.

Here are three of the posts:

Machine Learning’s Triangle of Error: “…machine learning systems ‘think’ about fairness in terms of three interrelated factors: two ways the machine learning (ML) can go wrong, and the most basic way of adjusting the balance between these potential errors.”

Confidence Everywhere!: “… these systems are actually quite humble. It may seem counterintuitive, but we could learn from their humility.”

Hashtags and Confidence: “…in my fever dream of the future, we routinely say things like, “That celebrity relationship is going to last, 0.7 for sure!” …Expressions of confidence probably (0.8) won’t take exactly that form. But, then, a decade ago, many were dubious about the longevity of tagging…”

I also wrote about five types of fairness, which I posted about earlier: “…You appoint five respected ethicists, fairness activists, and customer advocates to figure out what gender mix of approved and denied applications would be fair. By the end of the first meeting, the five members have discovered that each of them has a different idea of what’s fair…”

I’ve also started writing an account of my attempt to write my very own machine learning program using TensorFlow.js, a PAIR project that lets you train a machine learning system in your browser. This project is bringing me face to face with the details of implementing even a “Hello, world”-ish ML program. (My project aims at suggesting tags for photos, based on a set of tagged images (Creative Commons-ed) from Flickr. It’s a toy, of course.)

I have a bunch of other posts in the pipeline, as well as a couple of larger pieces on larger topics. Meanwhile, I’m trying to learn as much as I possibly can without becoming the most annoying person in Cambridge. But it might be too late to avoid that title…

November 22, 2015

Bing can’t find Windows 10 Ten Cents sale…but Google can

I heard that Microsoft has some excellent $0.10 deals for Windows 10 owners like me. So I checked Bing:

bing listing

The top hit (an ad by Microsoft) takes you to a page for corporate sales of Windows phones.

The second hit (an ad by Microsoft) takes you to the generic Microsoft Store front page from which it is virtually impossible to find the $0.10 sales.

None of the rest of the results on the first page of the Bing search gets you anywhere close.

Same search at Google:

google listing

The top hit (a Microsoft ad) takes you to the same generic front page of the Microsoft Store as the second hit on Bing, which makes no mention of the $0.10 sales.

The following Google results take you to pages about the $0.10 sales from which you can actually get to the goddamn sale.

Yes, these sales are real. For example, this is from the site this afternoon:

google listing

I got there by going to the post listed in the Google results… although right now the Windows site is telling me that something is wrong and I should come back later.

PS: To get to the Hitman Go sale, my best advice is to go to the Windows Store on your Windows 10 machine. The $0.10 sales are featured there. Or search there for Hitman Go.

November 19, 2015

Google stepping forward to defend Fair Use

Google has just posted that it’s going to start defending some YouTube videos from DMCA takedown notices when it believes that those videos are protected by the Fair Use exemption from copyright law.

This is great news and long overdue.

The Digital Millennium Copyright Act of 1998 lets a copyright holder send a notice to a site like YouTube claiming that a video violates its copyright. YouTube passes that notice on to the video poster and takes down the video. The poster can enter into a legal battle with the copyright holder, which is rarely worth the time and money even if the poster is totally within her rights.

As a result, Big Content sends YouTube thousands of takedown notices that are generated algorithmically, without a human ever looking at the video to see if it is actually a violation. Since there’s no practical penalty for sending in a groundless takedown notice, Big Content has a “When in doubt, take it out” attitude.

But you usually can’t tell if a video falls under the Fair Use exemption without looking at it. Fair Use exempts material from claims of copyright infringement if the material is satire, if it’s citing the original in a review, for some educational purposes, etc. Fair Use is just plain common sense. Without it, you’d have to get Donald Trump’s permission to mem-ify one of his quotes.

Google to its credit recently used Fair Use to defend Google Books’ scanning and indexing of in-copyright works. It won. This was a big victory for Fair Use.

Now Google seems ready to step forward and champion Fair Use in other realms. It’s hard to see how this benefits Google directly — they’ll be spending legal fees to keep some person’s video up, even as 400 hours of video is uploaded to YouTube every minute. But creating a Fair Use speed bump in the automatic and robotic cleansing of the Net is great for the ecosystem, which is great for us and ultimately for companies like Google that rely on the Internet remaining a robust domain of discourse and creativity.

October 16, 2015

A victory for fair use

The Second Circuit Court of Appeals today upheld the decision that permits Google Books to scan and index books to make them searchable and for data mining. The court agreed that this is fair use. It also generalized the prior court’s finding so now libraries can also scan their own collection, so long as they provide access as limited as Google Books does. Woohoo!

Here’s the surprisingly readable decision [pdf].

The Authors Guild has now vowed it’s going to appeal to the Supreme Court. But I don’t get it.

Not that this necessarily matters to the legal case, but has the Authors Guild been able to attribute any actual damage to Google Books? Their site today says:

America owes its thriving literary culture to copyright protection. It is because of that success that today we take copyright incentives for granted, and that courts as respected as the Second Circuit are unable to see the damaging effect that uses such as Google’s will have on authors’ potential income.

If Google Books hasn’t produced any visible damage so far, shouldn’t that count as evidence that “uses such as Google’s” are unlikely to damage the interests of AG’s constituency?

In a longer piece on its site, the AG says:

Google Books will indeed harm the market for books,


Further, if Google’s doing so is fair use, then it sets a precedent allowing anyone to digitize books for similar purposes, which inevitably will lead to widespread, free, and unrestricted availability of books online.

But at this point, eleven years after the beginning of the suit, shouldn’t they be able to demonstrate some of that inevitable harm? Did the prior ruling lead to any increase in the unrestricted availability of free books online?

Haven’t we tested The Authors Guild’s hypothesis?

September 13, 2015

My worst caption so far

Here’s this week’s New Yorker caption contest cartoon:

Cars piled up

My hilarious caption? And I’m only telling you this because obviously there’s no chance it’s going to be one of the chosen three:

Hey, could someone tell Google Highways that the buffer is full?


December 27, 2014

Oculus Thrift

I just received Google’s Oculus Rift emulator. Given that it’s made of cardboard, it’s all kinds of awesome.

Google Cardboard is a poke in Facebook’s eyes. FB bought Oculus Rift, the virtual reality headset, for $2B. Oculus hasn’t yet shipped a product, but its prototypes are mind-melting. My wife and I tried one last year at an Israeli educational tech lab, and we literally had to have people’s hands on our shoulders so we wouldn’t get so disoriented that we’d swoon. The Lab had us on a virtual roller coaster, with the ability to turn our heads to look around. It didn’t matter that it was an early, low-resolution prototype. Swoon.

Oculus is rumored to be priced at around $350 when it ships, and they will sell tons at that price. Basically, anyone who tries one will be a customer or will wish s/he had the money to be a customer. Will it be confined to game players? Not a chance on earth.

So, in the midst of all this justifiable hype about the Oculus Rift, Google announced Cardboard: detailed plans for how to cut out and assemble a holder for your mobile phone that positions it in front of your eyes. The Cardboard software divides the screen in two and creates a parallaxed view so you think you’re seeing in 3D. It uses your mobile phone’s kinetic senses to track the movement of your head as you purview your synthetic domain.

I took a look at the plans for building the holder and gave up. For $15 I instead ordered one from Unofficial Cardboard.

When it arrived this morning, I took it out of its shipping container (made out of cardboard, of course), slipped in my HTC mobile phone, clicked on the Google Cardboard software, chose a demo, and was literally — in the virtual sense — flying over the earth in any direction I looked, watching a cartoon set in a forest that I was in, or choosing YouTube music videos by turning to look at them on a circular wall.

Obviously I’m sold on the concept. But I’m also sold on the pure cheekiness of Google’s replicating the core functionality of the Oculus Rift by using existing technology, including one made of cardboard.

(And, yeah, I’m a little proud of the headline.)


June 8, 2014

Will a Google car sacrifice you for the sake of the many? (And Networked Road Neutrality)

Google self-driving cars are presumably programmed to protect their passengers. So, when a traffic situation gets nasty, the car you’re in will take all the defensive actions it can to keep you safe.

But what will robot cars be programmed to do when there’s lots of them on the roads, and they’re networked with one another?

We know what we as individuals would like. My car should take as its Prime Directive: “Prevent my passengers from coming to harm.” But when the cars are networked, their Prime Directive well might be: “Minimize the amount of harm to humans overall.” And such a directive can lead a particular car to sacrifice its humans in order to keep the total carnage down. Asimov’s Three Laws of Robotics don’t provide enough guidance when the robots are in constant and instantaneous contact and have fragile human beings inside of them.

It’s easy to imagine cases. For example, a human unexpectedly darts into a busy street. The self-driving cars around it rapidly communicate and algorithmically devise a plan that saves the pedestrian at the price of causing two cars to engage in a Force 1 fender-bender and three cars to endure Force 2 minor collisions…but only if the car I happen to be in intentionally drives itself into a concrete piling, with a 95% chance of killing me. All other plans result in worse outcomes, where “worse” refers to some scale that weighs monetary damages, human injuries, and human deaths.
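To make that “some scale” concrete, here is a minimal sketch in Python of how a swarm might score competing plans and pick the least bad one. Everything in it is an assumption invented for illustration: the weights, the plan names, and the expected-harm numbers. It is not how any real self-driving system works, so far as I know.

```python
from dataclasses import dataclass

# Hypothetical weights: how much each kind of harm counts on the scale.
# These numbers are made up; choosing them is the hard moral question.
WEIGHTS = {"dollars": 1.0, "injuries": 50_000.0, "deaths": 5_000_000.0}


@dataclass(frozen=True)
class Plan:
    name: str
    dollars: float   # expected monetary damage
    injuries: float  # expected number of injuries
    deaths: float    # expected number of deaths (0.95 = a 95% chance of one)


def harm(plan, weights=WEIGHTS):
    """Collapse a plan's expected outcomes into one weighted score."""
    return (weights["dollars"] * plan.dollars
            + weights["injuries"] * plan.injuries
            + weights["deaths"] * plan.deaths)


def least_harmful(plans, weights=WEIGHTS):
    """Pick the plan with the lowest weighted harm."""
    return min(plans, key=lambda p: harm(p, weights))


# Invented candidate plans for the darting-pedestrian case.
PLANS = [
    Plan("my_car_hits_piling", dollars=30_000, injuries=0, deaths=0.95),
    Plan("everyone_brakes", dollars=10_000, injuries=2, deaths=1.0),
    Plan("swerve_into_traffic", dollars=250_000, injuries=6, deaths=1.5),
]

print(least_harmful(PLANS).name)
```

With these invented weights the swarm sends my car into the piling. Weigh dollars a hundred times more heavily and the same code picks a different plan, so the arithmetic is the easy part and the weights are where the argument happens.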

Or, a broken run-off pipe creates a dangerous pool of water on the highway during a flash storm. The self-driving cars agree that unless my car accelerates and rams into a concrete piling, all other joint action results in a tractor trailer jack-knifing, causing lots of death and destruction. Not to mention The Angelic Children’s Choir school bus that would be in harm’s way. So, the swarm of robotic cars makes the right decision and intentionally kills me.

In short, the networking of robotic cars will change the basic moral principles that guide their behavior. Non-networked cars are presumably programmed to be morally-blind individualists trying to save their passengers without thinking about others, but networked cars will probably be programmed to support some form of utilitarianism that tries to minimize the collective damage. And that’s probably what we’d want. Isn’t it?

But one of the problems with utilitarianism is that there turns out to be little agreement about what counts as a value and how much it counts. Is saving a pedestrian more important than saving a passenger? Is it always right to try to preserve human life, no matter how unlikely it is that the action will succeed and no matter how many other injuries it is likely to result in? Should the car act as if its passenger has seat-belted him/herself in because passengers should do so? Should the cars be more willing to sacrifice the geriatric than the young, on the grounds that the young have more of a lifespan to lose? And won’t someone please think about the kids — those cute choir kids?

We’re not good at making these decisions, or even at having rational conversations about them. Usually we don’t have to, or so we tell ourselves. For example, many of the rules that apply to us in public spaces, including roads, optimize for fairness: everyone waits at the same stop lights, and you don’t get to speed unless something is relevantly different about your trip: you are chasing a bad guy or are driving someone who urgently needs medical care.

But when we are better able to control the circumstances, fairness isn’t always the best rule, especially in times of distress. Unfortunately, we don’t have a lot of consensus around the values that would enable us to make joint decisions. We fall back to fairness, or pretend that we can have it all. Or we leave it to experts, as with the rules that determine who gets organ transplants. It turns out we don’t even agree about whether it’s morally right to risk soldiers’ lives to rescue a captured comrade.

Fortunately, we don’t have to make these hard moral decisions. The people programming our robot cars will do it for us.


Imagine a time when the roadways are full of self-driving cars and trucks. There are some good reasons to think that that time is coming, and coming way sooner than we’d imagined.

Imagine that Google remains in the lead, and the bulk of the cars carry their brand. And assume that these cars are in networked communication with one another.

Can we assume that Google will support Networked Road Neutrality, so that all cars are subject to the same rules, and there is no discrimination based on contents, origin, destination, or purpose of the trip?

Or would Google let you pay a premium to take the “fast lane”? (For reasons of network optimization the fast lane probably wouldn’t actually be a designated lane but well might look much more like how frequencies are dynamically assigned in an age of “smart radios.”) We presumably would be ok with letting emergency vehicles go faster than the rest of the swarm, but how about letting the rich go faster by programming the robot cars to give way when a car with its “Move aside!” bit is on?

Let’s say Google supports a strict version of Networked Road Neutrality. But let’s assume that Google won’t be the only player in this field. Suppose Comcast starts to make cars, and programs them to get ahead of the cars that choose to play by the rules. Would Google cars take action to block the Comcast cars from switching lanes to gain a speed advantage — perhaps forming a cordon around them? Would that be legal? Would selling a virtual fast lane on a public roadway be legal in the first place? And who gets to decide? The FCC?

One thing is sure: It’ll be a golden age for lobbyists.


May 13, 2014

Full-text searching Harvard Library: a hacky mashup

Harvard Library has 13M items in its collection. Harvard is digitizing many of them, but as of now you cannot do a full text search of them.

Google Books had 30M books digitized as of a year ago. You can do full-text searches of them.

So, I wrote a little app [Note: I’ve corrected this url.] that lets you search Google Books for text, and then matches up the results with books in Harvard Library. It’s a proof of concept, and I’m counting the concept as proved, or at least as promising. On the other hand, my API key for Google Books only allows 2,000 queries a day, so it’s not practical on the licensing front.

This project runs on top of LibraryCloud, an open source library metadata server created by the Harvard Library Innovation Lab that I co-direct (until Sept.). LibraryCloud provides an API to Harvard’s open library metadata and more. (We’re building a new, more scalable version now. It is, well, super-cool.)

But please note that this HOLLIS full-text search thingy is NOT a project done by our highly innovative and highly skilled developers. I did it, which means if you look at the code (github) you will have a good laugh. Also, this service will fail in dull and interesting ways. I am a horrible programmer. (But I enjoy it.)

Some details below the clickable screenshot…

Click on the image to expand it.
googleHollis screen capture

Click here to go to the app.

The Google Books results are on the left (only ten for now), and HOLLIS on the right.

If a Google result is yellow, there’s a match with a book in HOLLIS. Gray means no match. HOLLIS book titles are prefaced by a number that refers to the Google results number. Clicking on the Google results number (in the circle) hides or shows those works in the stack on the right; this is because some Google books match lots of items in HOLLIS. (Harvard has a lot of copies of King Lear, for example.)

There are two types of matches. If an item matched on a firm identifier (ISBN, OCLC, LCCN), then there’s a checkmark before the title in the HOLLIS stack, and there’s a “Stacklife” button in the Google list. Clicking on the Stacklife button displays the book in Harvard StackLife, a very cool — and prize-winning! — library browser created by our Lab. The StackLife stack colorizes items based on how much they’re used by the Harvard community. The thickness of the book indicates its page count and its length indicates its actual physical height.

If there’s no match on the identifiers, then the page looks for a keyword match on the title and an exact match on the author’s last name. This can result in multiple results, not all of which may be right. So, on the Google result there’s a “Feeling lucky” button that will take you to the first match’s entry in StackLife.
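For the curious, the two-tier matching described above can be sketched roughly like this in Python. This is not the code from the app. The field names (`isbn`, `oclc`, `lccn`, `title`, `author`) are hypothetical and do not reflect the actual Google Books or LibraryCloud record formats, and “keyword match” is assumed here to mean “shares at least one title word of four or more letters.”

```python
import re

# Hypothetical field names for the firm identifiers.
ID_FIELDS = ("isbn", "oclc", "lccn")


def title_words(title):
    """Lowercased title words of 4+ letters (assumed meaning of 'keyword')."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if len(w) >= 4}


def last_name(author):
    """Handle both 'Last, First' and 'First Last' author strings."""
    author = author.strip()
    if "," in author:
        return author.split(",")[0].strip().lower()
    parts = author.split()
    return parts[-1].lower() if parts else ""


def match(google_book, library_items):
    """Return (kind, items), where kind is 'firm', 'fuzzy', or 'none'."""
    # Tier 1: any shared firm identifier counts as a definite match.
    firm = [item for item in library_items
            if any(google_book.get(f) and google_book.get(f) == item.get(f)
                   for f in ID_FIELDS)]
    if firm:
        return "firm", firm

    # Tier 2: a shared title keyword plus an exact last-name match.
    # This can return several items, not all of which may be right.
    words = title_words(google_book.get("title", ""))
    surname = last_name(google_book.get("author", ""))
    fuzzy = [item for item in library_items
             if surname
             and last_name(item.get("author", "")) == surname
             and words & title_words(item.get("title", ""))]
    return ("fuzzy", fuzzy) if fuzzy else ("none", [])


# Invented sample records.
LIBRARY = [
    {"title": "King Lear", "author": "Shakespeare, William", "isbn": "111"},
    {"title": "The Tragedy of King Lear", "author": "William Shakespeare"},
    {"title": "Moby-Dick", "author": "Herman Melville", "oclc": "222"},
]

kind, items = match({"title": "King Lear", "author": "William Shakespeare"}, LIBRARY)
print(kind, len(items))
```

In this sketch a “firm” result corresponds to the checkmark and Stacklife button, a “fuzzy” result to the “Feeling lucky” button (which would go to the first item in the list), and “none” to a gray Google result.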

The “Google” button takes you to that item’s page at Google Books, filtered by your search terms for your full-texting convenience.

The “View” button pops up the Google Books viewer for that book, if it’s available.

The “Clear stack” button deselects all the items in the Google results, hiding all the items in the HOLLIS stack.

Let me know how this breaks or sucks, but don’t expect it ever to be a robust piece of software. Remember its source.

November 7, 2013

What I want those Google barges to be

We now know that the Google barges are “interactive learning spaces.” That narrows the field. They’re not off-shore data centers or Google Glass stores. They’re also not where Google keeps the porn (as Seth Meyers reported) and they’re not filled with bubblewrap for people to step on, although that would be awesome.

So here’s my hope for what “interactive learning spaces” means: In your face, Apple Store!

Apple Stores manifest Apple’s leave-no-fingerprints consumerist ideal. Pure white, squeaky clean, and please do come try out the tools we’ve decided are appropriate for you inferior Earth creatures.

Google from the beginning has manifested itself as comfortable with the messy bustle of the Net, especially when the bustlers are hyper-geeky middle class Americans.

So, I’m hoping that the “interactive learning spaces” are places where you can not only get your email on a Chromebook keyboard, play a game on an Android tablet, and take a class in how to use Google Glass, but also actually build stuff, learn from other “customers,” and hang out because the environment itself — not just the scheduled courses — is so stimulating and educational. Have hackathons there, let the community schedule classes and talks, make sure that Google engineers hang out there and maybe even do some work there. Open bench everything!

That’s what I hope. I look forward to being disappointed.

September 2, 2013

Merging my Google Plus accounts

I made the mistake many years ago of creating a Google Accounts email address in addition to my existing Gmail account. Thus I have been plagued (granted, it’s an excellent example of a First World Problem plague) with two out of sync accounts.

Gmail works fine because “[email protected]” is my public-facing email address and has been since about 1994 when first I took the domain. (Yes, children, there was a time when you could register an existing word with all its vowels just by being the first to claim it.) When you send mail to that address, it shows up in my [email protected] Google Account. It also shows up at my [email protected] account, which now has 12,722 unread messages in it. Nevertheless, the “system” works for me.

But it does not work for me at, where I have two accounts that cannot be merged. I’ve tried.

And I thought it didn’t work at Google Plus. But recently I’ve been getting friend requests (or Circle requests, I guess) at G+ for [email protected], whereas my social network (such as it is) is at [email protected]. Since I do very little with G+ anyway, it only bothers me because I hate rejecting friends’ requests, even though they’re trying to join a G+ that I don’t ever check and that currently has a total of 7 people in it. So, I googled for info, and found that Google Takeout promises to move my dweinberger Circles over to evident. Google seems quite serious about it: access is limited during the first 48 hours, the transfer takes up to 7 days, and you can only request one transfer every six months.

We’ll see how it works. In any case, I do appreciate the Google Data Liberation Front commitment.

And perhaps now my Circles will be unbroken.


