Joho the Blog » 2009 » April

April 30, 2009

New Zealand starts copyright from scratch

New Zealand has decided that trying to amend copyright for the digital age is like trying to adjust a horse’s carburetor. So, it’s going to start all over again.

That’s what we ought to do. Fresh piece of paper, a very big table, and an open bar. I don’t see any other way forward, really.



The syntax of retweeting

Joi Ito posts about whether we’ve agreed upon the syntax of retweeting: If I want to twitter one of your tweets and add my own comment, do I do it as “RT @you: your comment Me: My comment” or as “RT @you:your comment [Me: my comment]” or what? Of course, there was a bunch of twittering about this, which Joi captures.

It’s fun to watch syntax emerge. As Ethanz tweets: “Microformat development in 140 chars or less…”
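For the curious, here is a minimal sketch in Python of what parsing the bracketed style might look like. The regular expression and the function name `parse_retweet` are my own inventions for illustration, not anything Twitter or Joi has specified. Note that the unbracketed style ("RT @you: your comment Me: My comment") cannot be split reliably this way, because the original tweet might itself contain "Me:".

```python
import re

# Hypothetical pattern for the bracketed style: "RT @user: original [Me: comment]".
# The trailing bracketed comment is optional.
BRACKETED = re.compile(
    r"^RT @(?P<user>\w+):\s*(?P<original>.*?)"
    r"(?:\s*\[(?P<commenter>\w+):\s*(?P<comment>.*)\])?$"
)

def parse_retweet(text):
    """Return the parts of a bracketed-style retweet as a dict, or None if no match."""
    match = BRACKETED.match(text)
    return match.groupdict() if match else None

print(parse_retweet("RT @you: your comment [Me: my comment]"))
# The unbracketed style falls through with the comment stuck inside "original".
print(parse_retweet("RT @you: your comment Me: My comment"))
```

The brackets are what make the added comment machine-readable, which may be one reason that variant appeals to people who want tools to follow the thread.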


Isenberg on what makes the Internet the Internet

David Isenberg has posted a transcript of his keynote at the Broadband Properties Summit that reminds us that the value of the Net comes not merely from its technical architecture but from the fact that the protocols that define that architecture are public. From the middle of the talk:

The Internet derives its disruptive quality from a very special property: IT IS PUBLIC. The core of the Internet is a body of simple, public agreements, called RFCs, that specify the structure of the Internet Protocol packet. These public agreements don’t need to be ratified or officially approved – they just need to be widely adopted and used.

The Internet’s component technologies – routing, storage, transmission, etc. – can be improved in private. But the Internet Protocol itself is hurt by private changes, because its very strength is its public-ness.

Because it is public, device makers, application makers, content providers and network providers can make stuff that works together. The result is completely unprecedented; instead of a special-purpose network – with telephone wires on telephone poles that connect telephones to telephone switches, or a cable network that connects TVs to content – we have the Internet, a network that connects any application – love letters, music lessons, credit card payments, doctor’s appointments, fantasy games – to any network – wired, wireless, twisted pair, coax, fiber, wi-fi, 3G, smoke signals, carrier pigeon, you name it. Automatically, no extra services needed. It just works.

This allows several emergent miracles…
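To make the "public agreements" point concrete: because the packet layout is published (the fixed 20-byte IPv4 header is laid out in RFC 791), anyone can write code that reads it. Below is a minimal sketch in Python under that assumption. The function name `parse_ipv4_header` and the sample header values are made up for illustration; it ignores header options and does not verify the checksum.

```python
import socket
import struct

# Fixed 20-byte IPv4 header layout from RFC 791 (options, if any, follow it).
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

def parse_ipv4_header(data):
    """Decode the fixed part of an IPv4 header into a dict."""
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack(data[:IPV4_HEADER.size])
    return {
        "version": ver_ihl >> 4,                # high 4 bits of the first byte
        "header_bytes": (ver_ihl & 0x0F) * 4,   # IHL counts 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,                   # 6 means TCP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# A made-up header for illustration: checksum left as 0, addresses taken
# from the ranges reserved for documentation.
sample = IPV4_HEADER.pack(0x45, 0, 40, 1, 0, 64, 6, 0,
                          socket.inet_aton("192.0.2.1"),
                          socket.inet_aton("198.51.100.7"))
print(parse_ipv4_header(sample))
```

Nothing in that sketch required anyone's permission, which is more or less Isenberg's point.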


Cass Sunstein: Fascist totalitarian or totalitarian fascist?

WorldNetDaily’s article about Cass Sunstein is laughably wrong and scarily partisan. Now that Obama has nominated Sunstein to head the White House Office of Information and Regulatory Affairs, the anonymous article in WND presents suggestions and ideas in Sunstein’s nuanced and clear-eyed works as if Sunstein were putting them forward as federal policy. The weirdest is Sunstein’s thought that it’d be useful if software could tell you that a message you’re about to send is a flame, and then keep you from pressing the send button hastily. In the hands of WND, this becomes Sunstein using the power of the federal government to mandate that it read all emails and block ones it doesn’t like.

I disagree with Sunstein on many points. In particular, Sunstein famously worries that the Internet is causing us to harden our hearts and minds, enabling us to hear only from people with whom we agree. As an overall characterization, I think that misses too much else of what the Net is doing…even as I agree that the Net undoubtedly has that effect, too. I disagree with him, but I’m thrilled to have him in the Internet conversation. At the very least, his concern should remind us that the good things that the Net does and can do won’t happen automatically; we need to be vigilant and imaginative. At the most, he’s right and we need to heed him.

So, given that a concern about polarization has become the best-known piece of Sunstein’s powerful writings, WorldNetDaily’s polarizing article can this morning reassure us that unintended irony remains the strongest force in the universe.

[Later that day: Julian Sanchez does a great job tracking down the quotes ‘n’ context (AKA The Truth).]



April 29, 2009

A 100 days hypothetical

Imagine if you will — and I know for some minority of you it will be a moment of pleasure, so please enjoy — that we are watching the pundits discuss and evaluate the first 100 days of the McCain-Palin administration…

[Minutes later] Jeremy Dibbell just pointed me to a Walter Shapiro article on just this hypothetical.


The census device

Ethan Zuckerman has a fun, informative post about the device being used by Census takers. It delves into the particulars and drives toward some broader conclusions.

So, beside the pleasure of the piece itself, consider it as a piece of journalism. No, Ethan doesn’t spend the time to track down every detail. But he tracks down the details that the weight of the topic deserves. And it is a piece that it is hard to imagine making it through the editorial filters of paper-based news media.

I understand that pointing to a good post by someone who is not a trained journalist (but who has expertise in a field) does not provide reassurance that the system of journalism will survive. That system covers fields systematically that the flood-the-field approach of the Web may not reach. But, still, we can allow ourselves some hope. Some cautious, vigilant hope.


Wolfram interview

The Berkman Center has posted the raw audio of my 55-minute interview with Stephen Wolfram, about his deeply cool WolframAlpha program (which he talked about here yesterday). On the other hand, if you wait a few days, you can skip some throat-clearing on my part, as well as my driving him down an alley based on my not seeing where WolframAlpha puts links to other pieces of information. As is so often the case, the edited version will be better.


April 28, 2009

[berkman] Stephen Wolfram

Stephen Wolfram is giving a talk at Harvard/Berkman about his WolframAlpha site, which will launch in May. Aim: “Find a way to make computable the systematic knowledge we’ve accumulated.” The two big projects he’s worked on have made this possible. Mathematica (he’s worked on it for 23 yrs) makes it possible to do complex math and symbolic language manipulation. A New Kind of Science (NKS) has shown that it’s possible to understand much about the world computationally, often with very simple rules. So, WA uses NKS principles and the Mathematica engine. He says he’s in this project for the long term.

NOTE: Live-blogging. Posted without re-reading.

Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

You type in a question and you get back answers. You can type in math and get back plots, etc. Type in “gdp france” and get back the answer, plus a graph of the history of GDP.

“GDP of france / italy”: The GDP of France divided by the GDP of Italy

“internet users in europe” shows histogram, list of highest and lowest, etc.

“Weather in Lexington, MA” “Weather lexington,ma 11/17/92” “Weather lexington, MA moscow” shows comparison of weather and location.

“5 miles/sec” returns useful conversions and comparisons.

“$17/hr” converts to per week, per month, etc., plus conversion to other currencies.

“4000 words” gives a list of typical typing speeds, the length in characters, etc.

“333 gm gold” gives the mass, the commodity price, the heat capacity, etc.

“H2SO4” gives an illustration of the molecule, as well as the expected info about mass, etc.

“Caffeine mol wt/ water” gives the ratio of the two molecular weights.

“decane 2 atm 50 C” shows what decane is like at two atmospheres and at 50 C, e.g., phase, density, boiling point, etc.

“LDL 180”: Where your cholesterol level is against the rest of the population.

“life expectancy male age 40 italy”: distribution of survival curve, history of that life expectancy over time. Adding “1933” makes it more specific.

“5’8″ 160 lbs”: Where that falls in the distribution of body mass index

“ATTGTATACTAA”: Where that sequence matches the human genome

“MSFT”: Real time Microsoft quote and other financial performance info. “MSFT sun” assumes that “sun” refers to stock info about Sun Microsystems.

“ARM 20 yr mortgage”: tables of monthly payments, etc. Lets you input the loan amount.

“D# minor”: Musical notation, plays the D# minor scale

“red + yellow”: Color swatch, html notation

“”: Info about Apple, history of page views

“lawyers”: Number employed, average wage

“France fish production”: How many metric tons produced, pounds per second, which is 1/5 the rate trash is produced in NYC

“france fish production vs. poland”: charts and diagrams

“2 c orange juice”: nutritional info

“2 c orange juice + 1 slice cheddar cheese”: nutritional label

“a__a__n”: English words that match

“alan turing kurt godel”: Table of info about them

“weather princeton, day when kurt godel died”: the answer

“uncle’s uncle’s grandson’s grandson”: family tree, probability of those two sharing genetic material

“5th largest country in europe”

“gdp vs. railway length in europe”:

“hurricane andrew”: Data, map

“andrew”: Popularity of the name, diagrammed.

“president of brazil in 1922”

“tide NYC 11/5/2015”

“ten flips 4 heads”: probability

“3,7,15,31,63…”: Figures out and plots next in the sequence and possible generating function

“4,1 knot”: diagram of knot

“next total solar eclipse chicago”: Next one visible in Chicago

“ISS”: International Space Station info and map

It lets you select alternatives in case of ambiguities.
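A couple of the demo queries are easy to check by hand. Here is a minimal sketch in Python; it says nothing about how WolframAlpha actually computes its answers. It assumes "ten flips 4 heads" means exactly four heads from a fair coin, and it guesses the rule 2^(n+1) - 1 for the sequence 3, 7, 15, 31, 63. The function names are mine.

```python
from math import comb

def prob_exact_heads(flips, heads):
    """Probability of exactly `heads` heads in `flips` tosses of a fair coin."""
    return comb(flips, heads) / 2 ** flips

def guessed_term(n):
    """n-th term (n = 1, 2, ...) under the assumed rule 2**(n + 1) - 1."""
    return 2 ** (n + 1) - 1

# "ten flips 4 heads": 210 favorable outcomes out of 1024
print(prob_exact_heads(10, 4))

# "3,7,15,31,63...": the first five terms plus the predicted sixth
print([guessed_term(n) for n in range(1, 7)])
```

Under those assumptions the coin answer comes to 210/1024, a bit over 20%, and the next number in the sequence would be 127.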

“We’re trying to compute things.” We have tools that let us find things. But when you have a particular question, it’s unlikely that you’ll find that specific answer written down. WA therefore tries to compute answers. “The objective is to reach expert level knowledge across a very wide range of domains.”

Four big pieces to WA:

1. Data curation. WA has trillions of pieces of curated data. It gets it from free data or licensed data. A partially human, partially automated system cleans it up and tries to correlate it. “A lot can be done automatically…At some point, you need a human domain expert in the middle of it.” There are people inside the company and a network of others who do the curation.

2. The algorithms. Take equations, etc., from all over. “There are finite numbers of methods that have been discovered in the history of science.” There are 5-6 million lines of Mathematica code at work.

3. Linguistic analysis to understand the inputs. “There’s no manual, no documentation. You get to interact with it just how you think about things.” They’re doing the opposite of natural language processing, which usually tries to understand millions of pages. WA’s problem is mapping a relatively small set of short human inputs to what the system knows about. NKS helps with this. It turns out that ambiguity is not nearly as big a problem as we thought.

4. Automated presentation. What do you show people so they can cognitively grasp it? “Algorithmic presentation technology … tries to pick out what is important.” Mathematica has worked on “computational aesthetics” for years.

He says that they have at least a reasonable start on about 90% of the shelves in a typical reference library.

Q: (andy orem) What do you do about the inconsistencies of data? We don’t know how inconsistent it was and what algorithms you used.
A: We give source info. “We’re trying to create an authoritative source for data.” We know about ranges of values; we’ll make that information available. “But by the time you have a lot of footnotes on a number, there’s not a lot you can do with that number.” “We do try to give footnotes.”

Q: How do you keep current?
A: Lots of people want to make their data available. We hope to make a streamlined, formalized way for people to contribute the data. We want to curate it so we can stand by it.

Q: [me] Openness? Of API, of metadata, of contributions of interesting comparisons, etc.
A: We’ll do a variety of levels of API. First: presentation level: put output on their pages. Second, XML-level so people can mash it up. Third level: individual results from the databases and from the computations. [He shows a first draft of the api] You can get it as the symbolic expressions that Mathematica is based on. We hope to have a personalizable version. Metadata: When we open up our data repository mechanisms so people can contribute, some of our ontology will be exposed.

Q: How about in areas where people disagree? If a new universe model comes out from Stanford, does someone at WolframAlpha have to say yes and put it in?
A: Yes
Q: How many people?
A: It’s been 150 for a long time. Now it’s 250. It’s probably going to be a thousand people.

Q: Who is this for?
A: It’s for expert knowledge for anyone who needs it.

Q: Business model?
A: The site will be free. Corporate sponsors will put ads on the side. We’re trying to figure out how to ingest vendor info when it’s relevant, and how to present it on the site. There will also be a professional version for people who are doing a lot of computation, want to put in their own data…

Q: Can you combine the medical and population databases to get the total mass of people in England?
A: We could integrate those databases, but we don’t have that now. We’re working on “splat pages” you get when it doesn’t work. It should tell you what it does know.

Q: What happens when there is no answer, e.g., 55th largest state in the US?
A: It says it doesn’t know.

Q: [eszter] For some data, there are agreed-upon sources. For some there aren’t. How do you choose sources?
A: That’s a key problem in doing data curation. “How do we do it? We try to do the best job we can.” Use experts. Assess. Compare. [This is a bigger issue than Wolfram apparently thinks where data models are political. E.g., Eszter Hargittai, who is sitting next to me, points out “How many Internet users are there?” is a highly controversial question.] We give info about what our sources are.

Q: Technologically, where do you want to focus in the future?
A: All 4 areas need to be pushed forward.

Q: How does this compare to the Semantic Web?
A: Had the Web already been semantically tagged, this product would have been far far easier, although keep in mind that much of the data in WA comes from private databases. We have a sophisticated ontology. We didn’t create the ontology top-down. It’s mostly bottom-up. We have domains. We have ontologies for them. We merge them together. “I hope as we expose some of our data repository methods, it will make it easier to do some Semantic Web kind of things. People will be able to line data up.”

Q: When can we look at the formal specifications of these ontologies? When can we inject our own?
A: It’s all represented in clean Mathematica code. Knitting new knowledge into the system is tricky because our UI is natural language, which is messy. E.g., “There’s a chap who goes by the name Fifty Cent.” You have to be careful.

Q: What reference source tells you if Palestine exists…?
A: In cases like this, we say “Assuming Case A or B.” There are holes in the data. I’m hoping people will be motivated to fill them in. Then there’s the question of the extent to which we can build expert communities. We don’t know the best way to do this. Lots of interesting ideas.

Q: How about pop culture?
A: Pop culture info is much shallower computationally. (“Britney Spears” just gets her name, birthdate, and birthplace. No music, no photos, nothing about her genre, etc.) (“Meaning of life” does answer “42”)

Q: Compare with CYC? (A common sense reasoning system)
A: CYC deals with human reasoning. That’s not the best method for figuring out physics, etc. “We can do the non-human parts of reasoning really well.”

Q: [couldn’t hear the question]
A: The best way to debug it is not necessarily to inspect the code but to inspect the results. People reading code is less efficient than automated systems.

Q: Will it be integrated into Mathematica?
A: A future version will let you type WA data into Mathematica.

Q: How much work do you have to do on the NLP side? Your searches used a special lexicon…
A: We don’t know. We have a daily splat call to see what types of queries have failed. We’re pretty good at removing linguistic fluff. People drop the fluff pretty quickly after they’ve been using WA for a while.

Q: (free software foundation) How does this change the landscape for open access? There’s info in commercial journals…
A: When there’s a proprietary database, the challenge is making the right deals. People will not be able to take out of our system all the data that we put into it. We have yet to learn all of the issues that will come up.

Q: Privacy?
A: We’re dealing with public data. We could do people search, but, personally, I don’t want to.

Q: What would you think of a more Wikipedia-like model? Do you worry about a competitor making a wiki data that is completely open and grows faster?
A: That’d be great. Making WA is hard. It’s not just a matter of shoveling data in. Wikipedia is fantastic and I use it all the time, but it’s gone in particular directions. When you’re looking for systematic data there, even if people put in systematic data (e.g., 300 pages about chemicals), over the course of time the data gets dirty. You can’t compute from it.

Q: How about if Google starts presenting your results in response to queries?
A: We’re looking for synergies. But we’re generating these on the fly; it won’t get indexed.

Q: I wonder how universities will find a place for this.
A: Very interesting question. Generating hard data is hard and useful, although universities often prefer higher levels of synthesis and opinion. [Loose paraphrase!] Leibniz had this nailed: Take any human argument and find a way to mechanically compute it.


[berkman] Russ Neuman

W. Russell Neuman is giving a Berkman talk called “Theories of Media Evolution.” He wants to think about the effect of the Internet in the context of the history of other media and the difference they’ve made. (The book “Theories of Media Evolution” will be out this fall.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

There are four classic dismissive arguments about the Internet: 1. It’s just another communication tech. 2. Human psychology doesn’t change. (Russ thinks we should take that seriously.) 3. “Just you wait.” “The iron laws of political economy just aren’t going to change.” 4. If you say the Net will make a difference, you are a naive techno-determinist.

Russ says his current research was inspired by his own mentor, Ithiel de Sola Pool. The volume of communication is growing exponentially. His conclusion: The volume of info changes the media from push to pull.

He shows a famous graph of volumes and cost in communication, 1960-1977. Volume goes up and cost goes down exponentially. Russ has done a new study, 1980-2005, studying the number of words per medium per day going into the average American home. How many newspapers, how many words in each, etc. Russ looked at 12 traditional media and the Internet. And he shifted from words to minutes as the unit of measurement. He includes CDs, video games, etc. He finds that the average newspaper consumption has gone from 16 mins/day to 6.5 mins/day. But most of the charts are climbing at a tremendous rate. E.g., the average American home spends 1.10 hr/day on the Net (which includes houses with no access). The total media supply to the home has gone up exponentially. The total media consumed in minutes per day is a much shallower curve, from about 600 mins per day in 1960 to about 1,000 mins per day, in part because homes now have more TV sets and portable media. The growth of media supply consumption grew very slowly 1960-1980, but has gone up from single digits (minutes per day) to over 20,000. The ratio was 98:1. If you read every book and watched every minute of TV available, the ratio of supply to consumption was 98:1. “That was a human metric. You can deal with that.” But in 2005, it’s 20943:1, which is not a human level of metric. “And this counts the Internet as one. You should count the number of pages available to you” which is somewhere north of 8.5 billion.

So, he says, we need the help of machines with their algorithms and socially-based recommendation systems. Search is incredibly important in this world of super-abundance, he says. It will help us to think about the new media in the context of the old, he says.

Q: You think volume is the fundamental factor. You’re saying the volume changes the nature of media from push to pull. Maybe it’s volume + technical affordance. If you look at satellite TV or cable TV, those didn’t fundamentally change the nature of the medium as the volume went up.
A: “Affordance” makes the “you’re a techno-determinist” criticism go away, because it says that technology isn’t determinative but it does have capabilities that people can take up. As far as the first part of the question that said “Isn’t it more complicated than…,” the answer is always yes. About regulation: The argument for regulation was spectrum scarcity, which is why we don’t regulate the print medium. Ironically, we got one newspaper in a market but dozens of broadcasters.
The shape of media could come from a number of places, but it’s going to come from Google.

Q: How about the number of words going out from households?
A: 900,000 bloggers (US only)… One of the questions is: What are the topics?
Q: The Berkman MediaCloud project should help address that in a rigorous way.

Q: Pool left out data like the phone book and the home encyclopedia.
A: There’s a psychological analytic (cf. Todd Gitlin) of info overload: people panic and withdraw when faced with this much info. But people who entered a library weren’t intimidated by it. That seems to be the case with the Web.

Q: [yochai benkler] I’m surprised that you predict it’d take me longer to view all the movies than read all of the Net. You’re masking the actual size of the increase in media access…
A: Yes. I had to mask in order to make the other sources visible, so I counted it as one channel. But it makes my point even more strongly…
Q: When you construe the Net as a flow of info that a human has to parse, you get your way of approaching the problem: We have to rely on Google or a friend. But that masks what’s going on. We’re producing. We have to construct our own social environments. It’s not just push to pull, but also read to write. (Push to pull are read categories.) And the question of power depends on whether the machine is impervious to workarounds. The only tv broadcasters could not be worked around. But on the Net I can find others with related interests.
A: Important questions. Let me bring out some points I didn’t make in my talk. It costs $3M/hr for TV. $16M/hr for a motion picture. We’ve developed historically a metric that people are willing to pay, say, $10 to see a movie, and that’s split 50:50 between the distributor and the motion picture company. They make $5/hr, whereas on TV the revenue is about $0.60/hr (commercials). Google News is repurposing independent professional journalism; if a competing search engine started doing independent investigative journalism, Google would do the same. [Sorry for the choppiness]

Q: The Internet is all about entertainment. People are reading fewer books.
A: You revealed your presumption when you said that books are hard to read and are good for you, while Internet is easy and not good for you. Where is the evidence that reading Shakespearean sonnets makes you a better person?

Q: You could argue, marxistly, that mass production changes how people interact with their environments. What’s the parallel of this and the mass production of consumer goods?
A: Alienation theory? When Marx got paid, rarely, he got paid as an independent investigative journalist. The Net makes it easier to find unalienated work (made by a craftsperson who is not alienated from the product of her labor).

A: There are so many research questions that these technologies afford that we should have our research budgets doubled.


Plagiarism and Fair Use

Afroditi Theodoridou at IP Osgoode does an excellent job explaining the application of Fair Use in the Turnitin Plagiarism Detection Service suit. This post not only makes the decision clearer, it also lays out the legal nature of Fair Use.

TurnItIn lets a teacher submit a paper to see if it’s original or plagiarized. The service keeps a database of the papers submitted for checking. Some students sued, claiming that was an infringement of their copyright. The court decided that TurnItIn was covered by Fair Use. Afroditi concludes with the interesting claim that fighting plagiarism advances the Framers’ original motivation for creating copyright.


Creative Commons License
Joho the Blog by David Weinberger is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Creative Commons license: Share it freely, but attribute it to me, and don't use it commercially without my permission.
