May 7, 2009

Wolfram podcast

My interview with Stephen Wolfram about WolframAlpha is now available. Some other me-based resources:

The unedited version weighs in at a full 55 minutes. The edited version will spare you some of my throat-clearing, and some dumb questions.

A post about what I think the significance of WolframAlpha will be.

Live blog of Wolfram’s presentation at Harvard.

Wolfram’s presentation at Harvard.

[Tags: wolfram wolframalpha search google metadata science ]

May 6, 2009

Evidence-based journalism

Richard Sambrook, director of the BBC‘s World Service and Global News, has posted an excellent engagement with Jay Rosen’s piece on He Said/She Said journalism. He agrees that that type of journalism is a problem, but says the problem isn’t the He Said/She Said format itself; it’s lazy journalism. He points to some cases where we do want a juxtaposition of views, which I’m sure Jay agrees with. Richard says his real concern is that some may take Jay’s piece as license to simply spout off. He writes:

Evidence-based reporting, the basis of objectivity (as distinct from impartiality) is in retreat and needs to be bolstered. He Said, She Said started life a hundred years ago as a journalistic discipline to counter yellow journalism as Pulitzer and others tried to establish a degree of civic responsibility in the press. It may have run its course but there are many who simply favour journalism of opinion – under the cloak of “calling the story”. I maintain we need evidence, fact-based reporting more than ever in a world awash with information, rumour and opinion. That sometimes calls for a journalism of restraint – in which the New York Times (and the BBC) has an honourable tradition.

Evidence-based is a nice way of cutting through the argument about objectivity’s corrupt philosophical underpinnings. Of course, people are going to argue about what counts as evidence and what the evidence means; that is, I disagree with the implication of the tagline on Richard’s blog (“Everyone is entitled to his own opinion but not to his own facts,” Daniel Patrick Moynihan). But evidence is an important term not used often enough in these discussions. Evidence provides a way to disagree that can progress towards truth, or at least towards agreement, or at a minimum towards an understanding of where the actual disagreement lies.

Of course, I offer this opinion without any evidence :).

[Tags: journalism objectivity evidence media ]

May 4, 2009

How important is WolframAlpha?

The Independent calls WolframAlpha “An invention that could change the Internet forever.” It concludes: “Wolfram Alpha has the potential to become one of the biggest names on the planet.”

Nova Spivack, a smart Semantic Web guy, says it could be as important as Google.

Ton Zijlstra, on the other hand, who knows a thing or two about knowledge and knowledge management, feels like it’s been overhyped. After seeing the video of Wolfram talking at Harvard, Ton writes:

No crawling? Centralized database, adding data from partners? Manual updating? Adding is tricky? Manually adding metadata (curating)? For all its coolness on the front of WolframAlpha, on the back end this sounds like it’s the mechanical turk of the semantic web.

(“The mechanical turk of the semantic web.” Great phrase. And while I’m in parentheses, ReadWriteWeb has useful screenshots of WolframAlpha, and here’s my unedited 55-minute interview with Wolfram.)

I am somewhere in between, though definitely over in the Enthusiastic half of the field. I think WolframAlpha [WA] will become a standard part of the Internet’s tool set, but it will not be transformative.

WA works because it’s curated. Real human beings decide what topics to include (geography but not 6 Degrees of Courtney Love), which data to ingest, what metadata is worth capturing, how that metadata is interrelated (= an ontology), which correlations to present to the user when she queries it (daily tonnage of fish captured by the French compared to daily production of garbage in NYC), and how that information should be presented. Wolfram insists that an expert be present in each data stream to ensure the quality of the data. Given all that human intervention, WA then performs its algorithmic computations … which are themselves curated. WA is as curated as an almanac.

Curation is a source of its strength. It increases the reliability of the information, it enables the computations, and it lets the results pages present interesting and relevant information far beyond the simple factual answer to the question. The richness of those pages will be a big factor in the site’s success.

Curation is also WA’s limitation. If it stays purely curated, without areas in which the Big Anyone can contribute, it won’t be able to grow at Internet speeds. Someone with a good idea — provide info on meds and interactions, or add recipes so ingredients can be mashed up with nutritional and ecological info — will have to suggest it to WolframAlpha, Inc. and hope they take it up. (You could do this, sorta kinda, through the API, but you wouldn’t get the scaling effects of actually adding data to the system.) And WA will suffer from the perspectival problems inevitable in all curated systems: WA reflects Stephen Wolfram’s interests and perspective. It covers what he thinks is interesting. It covers it from his point of view. It will have to make decisions on topics for which there are no good answers: Is Pluto a planet? Does Scientology go on the list of religions? Does the page on rabbits include nutritional information about rabbit meat? (That, by the way, was Wolfram’s example in my interview of him. If you look at the site from Europe, a “rabbit” query does include the nutritional info, but not if you log in from a US IP address.) But WA doesn’t have to scale up to Internet Supersize to be supersized useful.

So, given those strengths and limitations, how important is WA?

Once people figure out what types of questions it’s good at, I think it will become a standard part of our tools, and for some areas of inquiry, it may be indispensable. I don’t know those areas well enough to give an example that will hold up, but I can imagine WA becoming the first place geneticists go when they have a question about a gene sequence or chemists who want to know about a molecule. I think it is likely to be so useful within particular fields that it becomes the standard place to look first…Like IMDB.com for movies, except for broad, multiple fields, with the ability to cross-compute.

But more broadly, is WA the next Google? Does it transform the Internet?

I don’t think so. Its computational abilities mean it does something not currently done (or not done well enough for a crowd of users), and the aesthetics of its responses make it quite accessible. But how many computational questions do you have a day? If you want to know how many tons of fish France catches, WA will work as an almanac. But that’s not transformational. If you want to know how many tons divided by the average weight of a French person, WA is for you. But the computational uses that are distinctive of WA and for which WA will frequently be an astounding tool are not frequent enough for WA to be transformational on the order of a Google or Wikipedia.
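To make the arithmetic concrete, here’s the kind of trivial cross-computation I mean, sketched in a few lines of Python. Both figures are placeholders I made up, not WA’s data:

```python
# The kind of cross-domain computation WA is pitched at, in toy form.
# Both figures below are invented placeholders, not WolframAlpha's data.

annual_fish_catch_tonnes = 500_000    # hypothetical French annual catch
avg_weight_kg = 70                    # hypothetical average body weight

people_equivalent = (annual_fish_catch_tonnes * 1_000) / avg_weight_kg
print(f"The catch weighs as much as about {people_equivalent:,.0f} people")
```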

There are at least two other ways it could be transformational, however.

First, its biggest effect may be on metadata. If WA takes off, as I suspect it will, people and organizations will want to get their data into it. But to contribute their data, they will have to put it into WA’s metadata schema. Those schema then become a standard way we organize data. WA could be the killer app of the Semantic Web … the app that gives people both a motive for putting their data into ontologies and a standardized set of ontologies that makes it easy to do so.
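To see why the schema matters, here’s a toy before-and-after of my own invention. WA’s actual ontology isn’t public, so every field name below is hypothetical:

```python
# A toy before-and-after, entirely invented: what normalizing a record into
# a shared schema might look like. WA's actual ontology is not public, so
# every field name here is hypothetical.

raw_row = {"Country": "France", "Fish catch (t), 2007": "499,000"}  # made up

normalized = {
    "entity":   {"type": "Country", "id": "France"},
    "property": "AnnualFishCatch",          # hypothetical property name
    "value":    499_000,
    "unit":     "tonne",
    "year":     2007,
    "source":   "example.org/fisheries",    # placeholder provenance
}

# Once many contributors emit records in one schema, an engine can join
# them: filter by property and unit, then compute across entities.
print(normalized["property"], "=", normalized["value"], normalized["unit"])
```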

Second, a robust computational engine with access to a very wide array of data is a new idea on the Internet. (Ok, nothing is new. But WA is going to bring this idea to mainstream awareness.) That transforms our expectations, just as Wikipedia is important not just because it’s a great encyclopedia but because it proved the power of collaborative crowds. But, WA’s lesson — there’s more that can be computed than we ever imagined — isn’t as counter-intuitive as Wikipedia’s, so it is not as apple-cart-upsetting, so it’s not as transformational. Our cultural reaction to Wikipedia is to be amazed by what we’ve done. With WA, we are likely to be amazed by what Wolfram has done.

That is the final reason why I think WA is not likely to be as big a deal as Google or Wikipedia, and I say this while being enthusiastic — wowed, even — about WA. WA’s big benefit is that it answers questions authoritatively. WA nails facts down. (Please take the discussion about facts in a postmodern age into the comments section. Thank you.) It thus ends conversation. Google and Wikipedia aim at continuing and even provoking conversation. They are rich with links and pointers. Even as Wikipedia provides a narrative that it hopes is reliable, it takes every opportunity to get you to go to a new page. WA does have links — including links to Wikipedia — but most are hidden one click below the surface. So, the distinction I’m drawing is far from absolute. Nevertheless, it seems right to me: WA is designed to get you out of a state of doubt by showing you a simple, accurate, reliable, true answer to your question. That’s an important service, but answers can be dead-ends on the Web: you get your answer and get off. WA as question-answerer bookends WA’s curated creation process: A relatively (not totally) closed process that has a great deal of value, but keeps it from the participatory model that generally has had the biggest effects on the Net.

Providing solid, reliable answers to difficult questions is hugely valuable. WolframAlpha’s approach is ambitious and brilliant. Stephen Wolfram is a genius. But that’s not enough to fundamentally alter the Net.

Nevertheless, I am wowed.

[Tags: wolfram wolframalpha wikipedia google search metadata semantic_web ]

April 29, 2009

Wolfram interview

The Berkman Center has posted the raw audio of my 55-minute interview with Stephen Wolfram about his deeply cool WolframAlpha program (which he talked about here yesterday). On the other hand, if you wait a few days, you can skip some throat-clearing on my part, as well as my driving him down an alley based on my not seeing where WolframAlpha puts links to other pieces of information. As is so often the case, the edited version will be better.

[Tags: wolfram wolframalpha metadata search google semantic_web ontologies taxonomy everything_is_miscellaneous ]

April 28, 2009

[berkman] Stephen Wolfram – WolframAlpha.com

Stephen Wolfram is giving a talk at Harvard/Berkman about his WolframAlpha site, which will launch in May. Aim: “Find a way to make computable the systematic knowledge we’ve accumulated.” The two big projects he’s worked on have made this possible. Mathematica (he’s worked on it for 23 yrs) makes it possible to do complex math and symbolic language manipulation. A New Kind of Science (NKS) has shown that it’s possible to understand much about the world computationally, often with very simple rules. So, WA uses NKS principles and the Mathematica engine. He says he’s in this project for the long term.

NOTE: Live-blogging. Posted without re-reading.

Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

You type in a question and you get back answers. You can type in math and get back plots, etc. Type in “gdp france” and get back the answer, along with a histogram of France’s GDP over time.

“GDP of france / italy”: The GDP of France divided by the GDP of Italy

“internet users in europe” shows a histogram, a list of the highest and lowest, etc.

“Weather in Lexington, MA” “Weather lexington,ma 11/17/92” “Weather lexington, MA moscow” shows comparison of weather and location.

“5 miles/sec” returns useful conversions and comparisons.

“$17/hr” converts to per week, per month, etc., plus conversion to other currencies.

“4000 words” gives a list of typical typing speeds, the length in characters, etc.

“333 gm gold” gives the mass, the commodity price, the heat capacity, etc.

“H2SO4” gives an illustration of the molecule, as well as the expected info about mass, etc.

“Caffeine mol wt / water” gives a result of the molecular weights divided.

“decane 2 atm 50 C” shows what decane is like at two atmospheres and at 50 C, e.g., phase, density, boiling point, etc.

“LDL 180”: Where your cholesterol level is against the rest of the population.

“life expectancy male age 40 italy”: distribution of survival curve, history of that life expectancy over time. Add “1933” and it gets more specific.

“5’8″ 160 lbs”: Where in the distribution of body mass index

“ATTGTATACTAA”: Where that sequence matches the human genome

“MSFT”: Real time Microsoft quote and other financial performance info. “MSFT sun” assumes that “sun” refers to stock info about Sun Microsystems.

“ARM 20 yr mortgage”: tables of monthly payments, etc. Lets you input the loan amount.

“D# minor”: Musical notation, plays the D# minor scale

“red + yellow”: Color swatch, html notation

“www.apple.com”: Info about Apple, history of page views

“lawyers”: Number employed, average wage

“France fish production”: How many metric tons produced, pounds per second, which is 1/5 the rate trash is produced in NYC

“france fish production vs. poland”: charts and diagrams

“2 c orange juice”: nutritional info

“2 c orange juice + 1 slice cheddar cheese”: nutritional label

“a__a__n”: English words that match

“alan turing kurt godel”: Table of info about them

“weather princeton, day when kurt godel died”: the answer

“uncle’s uncle’s grandson’s grandson”: family tree, probability of those two sharing genetic material

“5th largest country in europe”

“gdp vs. railway length in europe”:

“hurricane andrew”: Data, map

“andrew”: Popularity of the name, diagrammed.

“president of brazil in 1922”

“tide NYC 11/5/2015”

“ten flips 4 heads”: probability

“3,7,15,31,63…”: Figures out and plots the next term in the sequence and a possible generating function (a worked guess at this one appears just after this list)

“4,1 knot”: diagram of knot

“next total solar eclipse chicago”: Next one visible in Chicago

“ISS”: International Space Station info and map

It lets you select alternatives in case of ambiguities.
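About the “3,7,15,31,63…” example above: I have no idea how WA actually infers generating rules, but the arithmetic behind one reasonable guess is simple enough to sketch:

```python
# One way to guess the rule behind 3, 7, 15, 31, 63 (not WA's method,
# which isn't public): test a simple recurrence against adjacent pairs.

seq = [3, 7, 15, 31, 63]

# Hypothesis: a(k+1) = 2*a(k) + 1, equivalently a(k) = 2^(k+1) - 1.
if all(b == 2 * a + 1 for a, b in zip(seq, seq[1:])):
    print("rule holds for every pair; next term:", 2 * seq[-1] + 1)  # 127
```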

“We’re trying to compute things.” We have tools that let us find things. But when you have a particular question, it’s unlikely that you’ll find that specific answer written down. WA therefore tries to compute answers. “The objective is to reach expert level knowledge across a very wide range of domains.”

Four big pieces to WA:

1. Data curation. WA has trillions of pieces of curated data. It gets them from free or licensed data sources. A partially human, partially automated system cleans the data up and tries to correlate it. “A lot can be done automatically…At some point, you need a human domain expert in the middle of it.” There are people inside the company and a network of others who do the curation.

2. The algorithms. Take equations, etc., from all over. “There are finite numbers of methods that have been discovered in the history of science.” There are 5-6 million lines of Mathematica code at work.

3. Linguistic analysis to understand the inputs. “There’s no manual, no documentation. You get to interact with it just how you think about things.” They’re doing the opposite of natural language processing, which usually tries to understand millions of pages. WA’s problem is mapping a relatively small set of short human inputs to what the system knows about. NKS helps with this. It turns out that ambiguity is not nearly as big a problem as we thought. (A toy sketch of this kind of mapping appears just after this list.)

4. Automated presentation. What do you show people so they can cognitively grasp it? “Algorithmic presentation technology … tries to pick out what is important.” Mathematica has worked on “computational aesthetics” for years.
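Here’s the toy sketch of the linguistic-analysis step promised in point 3. To be clear, it’s my own deliberately naive illustration of mapping short inputs onto things a system knows about; it bears no relation to WA’s actual code:

```python
# A deliberately naive sketch (my invention, not WA's code) of mapping a
# short free-form query onto a structured interpretation the engine knows.

KNOWN_ENTITIES = {"france": "Country", "italy": "Country", "msft": "Stock"}
KNOWN_PROPERTIES = {"gdp": "GrossDomesticProduct", "weather": "Weather"}

def interpret(query: str) -> dict:
    """Map tokens to known entities/properties; ignore linguistic fluff."""
    tokens = query.lower().replace("/", " / ").split()
    out = {"entities": [], "properties": [], "operators": []}
    for t in tokens:
        if t in KNOWN_ENTITIES:
            out["entities"].append((t, KNOWN_ENTITIES[t]))
        elif t in KNOWN_PROPERTIES:
            out["properties"].append(KNOWN_PROPERTIES[t])
        elif t == "/":
            out["operators"].append("divide")
        # anything else ("of", "the", ...) is treated as fluff and dropped
    return out

print(interpret("gdp of france / italy"))
# -> {'entities': [('france', 'Country'), ('italy', 'Country')],
#     'properties': ['GrossDomesticProduct'], 'operators': ['divide']}
```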

He says they have at least a reasonable start on about 90% of the shelves in a typical reference library.

Q: (Andy Oram) What do you do about the inconsistencies of data? We don’t know how inconsistent it was and what algorithms you used.
A: We give source info. “We’re trying to create an authoritative source for data.” We know about ranges of values; we’ll make that information available. “But by the time you have a lot of footnotes on a number, there’s not a lot you can do with that number.” “We do try to give footnotes.”

Q: How do you keep current?
A: Lots of people want to make their data available. We hope to make a streamlined, formalized way for people to contribute the data. We want to curate it so we can stand by it.

Q: [me] Openness? Of API, of metadata, of contributions of interesting comparisons, etc.
A: We’ll do a variety of levels of API. First, a presentation level: put our output on your pages. Second, an XML level, so people can mash it up. Third level: individual results from the databases and from the computations. [He shows a first draft of the API.] You can get results as the symbolic expressions that Mathematica is based on. We hope to have a personalizable version. Metadata: When we open up our data repository mechanisms so people can contribute, some of our ontology will be exposed.

Q: How about in areas where people disagree? If a new universe model comes out of Stanford, does someone at WolframAlpha have to say yes and put it in?
A: Yes
Q: How many people?
A: It’s been 150 for a long time. Now it’s 250. It’s probably going to be a thousand people.

Q: Who is this for?
A: It’s for expert knowledge for anyone who needs it.

Q: Business model?
A: The site will be free. Corporate sponsors will put ads on the side. We’re trying to figure out how to ingest vendor info when it’s relevant, and how to present it on the site. There will also be a professional version for people who are doing a lot of computation, want to put in their own data…

Q: Can you combine the medical and population databases to get the total mass of people in England?
A: We could integrate those databases, but we don’t have that now. We’re working on “splat pages” you get when it doesn’t work. It should tell you what it does know.

Q: What happens when there is no answer, e.g., 55th largest state in the US?
A: It says it doesn’t know.

Q: [eszter] For some data, there are agreed-upon sources. For some there aren’t. How do you choose sources?
A: That’s a key problem in doing data curation. “How do we do it? We try to do the best job we can.” Use experts. Assess. Compare. [This is a bigger issue than Wolfram apparently thinks, especially where data models are political. E.g., Eszter Hargittai, who is sitting next to me, points out that “How many Internet users are there?” is a highly controversial question.] We give info about what our sources are.

Q: Technologically, where do you want to focus in the future?
A: All 4 areas need to be pushed forward.

Q: How does this compare to the Semantic Web?
A: Had the Web already been semantically tagged, this product would have been far, far easier, although keep in mind that much of the data in WA comes from private databases. We have a sophisticated ontology. We didn’t create the ontology top-down. It’s mostly bottom-up. We have domains. We have ontologies for them. We merge them together. “I hope as we expose some of our data repository methods, it will make it easier to do some Semantic Web kind of things. People will be able to line data up.”

Q: When can we look at the formal specifications of these ontologies? When can we inject our own?
A: It’s all represented in clean Mathematica code. Knitting new knowledge into the system is tricky because our UI is natural language, which is messy. E.g., “There’s a chap who goes by the name Fifty Cent.” You have to be careful.

Q: What reference source tells you if Palestine exists…?
A: In cases like this, we say “Assuming Case A or B.” There are holes in the data. I’m hoping people will be motivated to fill them in. Then there’s the question of the extent to which we can build expert communities. We don’t know the best way to do this. Lots of interesting ideas.

Q: How about pop culture?
A: Pop culture info is much shallower computationally. (“Britney Spears” just gets her name, birthdate, and birthplace. No music, no photos, nothing about her genre, etc.) (“Meaning of life” does answer “42”)

Q: Compare with CYC? (A common sense reasoning system)
A: CYC deals with human reasoning. That’s not the best method for figuring out physics, etc. “We can do the non-human parts of reasoning really well.”

Q: [couldn’t hear the question]
A: The best way to debug it is not necessarily to inspect the code but to inspect the results. People reading code is less efficient than automated systems.

Q: Will it be integrated into Mathematica?
A: A future version will let you type WA data into Mathematica.

Q: How much work do you have to do on the NLP side? Your searches used a special lexicon…
A: We don’t know. We have a daily splat call to see what types of queries have failed. We’re pretty good at removing linguistic fluff. People drop the fluff pretty quickly after they’ve been using WA for a while.

Q: (free software foundation) How does this change the landscape for open access? There’s info in commercial journals…
A: When there’s a proprietary database, the challenge is making the right deals. People will not be able to take out of our system all the data that we put into it. We have yet to learn all of the issues that will come up.

Q: Privacy?
A: We’re dealing with public data. We could do people search, but, personally, I don’t want to.

Q: What would you think of a more Wikipedia-like model? Do you worry about a competitor making a wiki data that is completely open and grows faster?
A: That’d be great. Making WA is hard. It’s not just a matter of shoveling data in. Wikipedia is fantastic and I use it all the time, but it’s gone in particular directions. When you’re looking for systematic data there, even if people put in systematic data — e.g., 300 pages about chemicals — over the course of time, the data gets dirty. You can’t compute from it.

Q: How about if Google starts presenting your results in response to queries?
A: We’re looking for synergies. But we’re generating these on the fly; it won’t get indexed.

Q: I wonder how universities will find a place for this.
A: Very interesting question. Generating hard data is hard and useful, although universities often prefer higher levels of synthesis and opinion. [Loose paraphrase!] Leibniz had this nailed: Take any human argument and find a way to mechanically compute it.

[Tags: wolfram wolframalpha search expertise google metadata libraries ]

April 22, 2009

Transparency of zombies

I really like the fact that when Left 4 Dead (the great cooperative zombie killing game) introduced a new type of gameplay, one of the developers explained the math behind the balancing of the waves of incoming undead.

[Tags: games left_4_dead zombies transparency blogging expertise ]

April 17, 2009

[ugc3] Sustainable business models and long tails

Andres Hervas-Drane begins by noting the long tail in the market share of products. There’s empirical evidence that this is happening online. Why there? The standard answer: the supply side. But he wants to look at factors on the demand side that can affect this distribution.

He sets up a case where consumers have different preferences and come to the market uninformed. In the offline world, search happens through word of mouth. Consumers can search with evaluations or with recommendations. Recommendations come from consumers who searched with evaluations. Word of mouth results in a high concentration of sales.

Almost a third of Amazon’s sales are generated by recommendations. These are generated by users as meta-content, finding consumers who have similar preferences. This is taste-matching and it reduces sales concentration.
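Taste-matching of this sort is easy to sketch. What follows is a generic user-user similarity toy with invented ratings, not Amazon’s actual recommender:

```python
# Generic user-user taste matching (a toy, not Amazon's actual system):
# recommend items liked by the consumer whose past ratings most resemble yours.
from math import sqrt

ratings = {                      # user -> {item: rating}; invented data
    "ana":  {"book_a": 5, "book_b": 1, "book_c": 4},
    "ben":  {"book_a": 4, "book_b": 2, "book_c": 5, "book_d": 5},
    "cara": {"book_a": 1, "book_b": 5},
}

def similarity(u, v):
    """Cosine similarity over the items both users rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
    nu = sqrt(sum(ratings[u][i] ** 2 for i in shared))
    nv = sqrt(sum(ratings[v][i] ** 2 for i in shared))
    return dot / (nu * nv)

def recommend(user):
    """Suggest items the most similar other user liked that you haven't rated."""
    best = max((v for v in ratings if v != user),
               key=lambda v: similarity(user, v))
    return [i for i in ratings[best] if i not in ratings[user]]

print(recommend("ana"))   # ben's tastes match ana's, so -> ['book_d']
```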

Then there are “artistic markets” that increase the demand for niche producers and result in long-term cultural variety.


Peyman Faratin talks about a case study of prediction markets. His main point: Scarcity is at play even in the UGC system. The new scarcity is of attention.

An incentive engineering problem is afoot in prediction markets. When you can’t bet real money, the incentives go down. The reward streams are delayed. You have to search for the market. There are significant transaction costs [which he goes over in some detail, but too hard to capture briefly…sorry]. That’s why prediction markets aren’t going very well; they’re lonely.

Solution: Reward the big hitters. Let them transfer their reputations. Give them content management rights. Rank markets and reputations. “Invisible hand of the algorithm: Recommendations.” Use widgets to let the market come to the user. [I missed the end of this. Sorry!]


Chris Dellarocas talks about “Your Operations Have Become Your New Marketing.” “Every customer is a potential brand ambassador or a lethal brand assassin.” E.g., in 2006, Comcast spent $100M on advertising, wiped out by the YouTube video of a sleeping technician. UGC can make or break your business.

Most influential UGC occurs spontaneously and represents non-representative experiences. Companies need to take preventive measures. Consumers use UGC to decide if they should consume a product. Once they have, they decide what to report. Companies need to “Strategically re-engineer the consumption experience to spontaneously provoke the right mix of consumer content.”

Rules: Pay attention to extreme events. Move towards a culture that pays attention to outliers, positive and negative. “Redesign your monitoring practices and career incentives to accentuate the positive and eliminate the negative.” Also, “reassess yesterday’s yield management practices.” That is, make sure you do not systematically produce a small number of unhappy customers (e.g., by routinely overbooking, or by routinely selling undesirable hotel rooms at very low rates). Also, get to know your power customers, i.e., the ones more likely to be vocal. They should receive “the special treatment that loyal big spenders used to receive ten years ago.” Also, no sock puppetry. Also, maybe have a Chief Perception Officer.

Q: You’re proposing an operational hit since we won’t be selling all the seats or rooms.
A: Yes. That’s the decision to be made. We need to make these decisions holistically. We don’t have the complete answer. There’s room for innovation.

Q: [me] This morning we heard that the population is not nearly as adept at using these tools as some of us (= me) would like to believe. This afternoon we hear about markets that are adept. How do you square that with this morning’s research?
A: It varies by market. And consumers aren’t necessarily savvy. The UGC has effect even when they’re not savvy. You need to tier your efforts, taking account of the consumers’ Web savviness.

Q: How’s it work in other countries?
A: We haven’t done that research. Happy collaborate…

Q: How does this apply to B2B?
A: More limited.

Anindya Ghose talks about combining text mining with econometrics. Firms want to know if there’s any economic value to social networks and UGC. How can they monetize UGC?

There’s economic value embedded in the content. E.g., product reviews, geo locations, online purchase behavior. His software mines the text and assesses the economic value of, say, a positive review and even more particular comments. E.g., “good packaging” lowers the value by $0.56 because customers expect superlatives. Particular keywords have particular monetary effects.

Hypothesis: The increasing availability of UGC is reflected in sponsored search metrics. And, yes, he found a correlation between the frequency with which key words are used in blogs and their cost-per-click on search sites. He’s researching whether there’s some sort of causal effect, but it’s not an easy problem. Hence, UGC can be monetized through sponsored search.
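For what it’s worth, the kind of correlation he describes can be checked with standard tools. The numbers below are invented purely for illustration:

```python
# Sketch of the correlation test Ghose describes: does a keyword's frequency
# in blog posts track its cost-per-click? All data below are invented.
from statistics import correlation   # available in Python 3.10+

blog_mentions_per_day = [2, 15, 40, 80, 120]          # hypothetical keywords
cost_per_click_usd    = [0.10, 0.35, 0.80, 1.40, 2.10]

r = correlation(blog_mentions_per_day, cost_per_click_usd)
print(f"Pearson r = {r:.2f}")   # a high r is consistent with, but does not
                                # prove, a causal link -- as he notes
```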

[Posted without re-reading. I have to prepare for my unprepared comments. I’m on a panel that’s supposed to be reflecting on the day.]

[Tags: ugc ugc3 marketing cluetrain advertising textmining prediction_markets everything_is_miscellaneous ]

[ugc3] Understanding evolving online behavior

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Over-emphasizing small points. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are hereby warned.

John Horrigan of Pew Internet and American Life gives a “non-Koolaid” presentation. He says that about 12% of Internet users have a blog. The percentage of people doing some form of content sharing is not increasing much at all. The demographics say that 18-24s do the most sharing, and then it goes down in pretty much a straight line. The change over time is not distributed evenly across age groups: younger adults are turning away from the 6 core UGC behaviors, the 24-35s are increasing, and the rest show not much change.

But people are increasingly going to social networking sites. If UGC is migrating to rules-based environments, is it a good bargain? On the one hand, good governance can build sustainable mechanisms. OTOH, bad governance is a risk, so you want an open Internet.

Q: A decrease in activity among younger folk? Because they were so heavily involved initially?
John: They’re going to social networking sites instead of maintaining their own sites. But UGC is still an important activity to them.

Q: How will people’s changing behaviors as they age affect UGC?
John: Impossible to answer because we don’t know how the tech will change.


Mainak Mazumdar of The Nielsen Company begins by looking at blogging topics. They’re quite diverse, he says. Next: size. Wikipedia has many more topics than Britannica. Also, social networking is very big: member communities are #4 on most-visited lists, after search, portals, and software manufacturers. #5 is email. Social media is big everywhere. (Biggest: 80% of Brazil. 67% in the US.) The US is showing comparatively slower growth in “active reach of member communities.” Time spent in CGM has been increasing. So is the time spent on social networking. 35-49 year olds are the fastest growing audience for social networking sites. Teen consumption of SNSs is going down, because they’re going more and more mobile. Mobile will be huge. TV will be big. People are watching more TV. Big media companies are doing well. “Becoming a mother is a dramatic inflection point and drives women to the Web in search of advice and a desire to connect with others in her shoes” (from the slide).

Is the Net a game-changer for research companies? He compares it to scanner data in the 90s and online surveys in the 1990s. In the 2000s, perhaps [perhaps??] social networking will once again change the game. Reasons to think the Net is a game-changer overall [i.e., exceptionalism]: pervasive, sticky, generational.

Q: Is TV watching growing on all screens or just on the living room screen?
Mainak: Time spent watching TV content on a TV.

Q: Maybe SNSs have surpassed email because email was being used for listservs that served the social function.
Mainak: We’re talking about how long you spend in Outlook + Web mail. We install monitors that report on how long you spend in each application.

Russ Neuman: Be careful of projecting out from the current tech. It can be disrupted easily.

Q: Older people are entering SNSs. I call them “parents.” To what extent will that change what started out as a youth movement? Is the move to mobile a move out of the SNS as they become mom and dad’s spots? [Oprah is on twitter.]
A: Yes. Some younger teens are going straight to mobile and circumventing the Internet.


Eszter Hargittai talks about the role of skill in Internet use. Yes, young people use digital media and spend a lot of time online, but it’s not true that they all engage in lots of online activities or that they’re particularly savvy about the Net and Web tools. So, the idea of “digital natives” is often misguided.

She’s particularly interested in the skills people have and need. Her methodology: paper-and-pencil surveys, to avoid biasing towards those comfortable with using Web tools; 1,060 first-year students at the U of Illinois. Most of the data comes from 2007, although she has some pre-pub data from 2009. The question is: What explains variation in skill? Gender, education and income predict skill. “The Web offers lots of opportunities but those who can take advantage of them are those who are already privileged.”

This has an effect on how we intervene to equalize matters. You can’t change socio-economic status. And it turns out that motivation doesn’t seem to make much of an effect. You can only be motivated to do something that you already know is a possibility. She shows new data, not ready for blogging, that show that very small percentages of users have actually created content, voted on reviews, edited Wikipedia pages, etc. The number of teenagers who have heard of Twitter is quite low. [Sorry for the lack of numbers. I’m not sure I’m supposed to be reporting even these trends.]

Mainstream media remain strong. Eszter points to the media story about Facebook users having lower grades. Eszter looked at the study and finds it to be of poor quality. Yet it got huge mainstream play. Eszter tweeted about it. She blogged about it. The tweet led to a co-authored paper. Even so, the mainstream probably won’t care, and most of the tweets are still simply retweeting the bad data. The Net is a huge opportunity, but it’s not evenly distributed.

Q: A study found that people online are lonely. It was picked up by the media. The researcher later revised it to say that it’s the other way around. That wasn’t picked up. The media pick up on the dystopic.

Q: Your data reflects my experience with my students. They don’t blog, they don’t tweet. There’s a class component to this.
Eszter: We measure socio-economic status. Why does it correlate? We’re exploring this. We now ask about parental support of technology use, rules at home about tech use, etc. So far we’re finding (tentatively!) that lower-educated parents tend to have more rules for their kids.

Q: What happens when there’s universal wireline connection?
Eszter: As the tech changes, the skill sets change. The privileged stay ahead, according to my 8 years of studies.

Q: What skills should we be teaching?
A: Complicated. Crucial issue: The evaluation of the credibility of sources. There’s an extreme amount of trust in search engines. That’s one place we need to do more work. And librarians are highly relevant here.

Q: How do people use the Net to learn informally, e.g., WebMD?
Eszter: There are lots of ways and types to do this. But, first you need to know what’s on the Web. You need good search skills, good credibility-evaluation skills.


Cliff Lampe talks about how Michigan State U students use Facebook. He presents a study completed just yesterday, so the data isn’t yet perfect. 97% of his sample are FB users (although Cliff expresses some discomfort with this number). Mean of 441 friends; median of 381. Ninety percent of these they consider to be “actual” friends. 73% only accept friend requests from people they know in real life. Most spend just a little time (under 30 mins) on FB per day. About half let their friends (but not everyone in their network) see everything in their profile. Almost everyone puts up a photo of themselves. The vast majority have a photo album. About a third think their parents are looking at their page. Overall they think they’re posting for their college and high school friends.

He talks about Everything2.com, a user-generated encyclopedia/compendium that is 11 years old. Why have people exited? Research shows they left because other sites came along that do the same thing better. Also, changes in life circumstances. Also, conflict with the administration of the site. There’s a corporatization of some of the UGC sites. He has also looked into why new users don’t stick: they don’t glom onto the norms of the site.

Q: Are reasons for exiting a negative network effect? More than 150 and the network deteriorates?
Cliff: We see that in Usenet. But not so much at Facebook where you’re just dealing with your friends.

Q: Any sites that have tried to drive away new users?
Cliff: Metafilter has a bit of that. Slashdot has an “earn your bullshit” tagline.

Q: Are your students alone or with others when they are online? Are they aware of the technology?
Cliff: The rise of the netbook has had an effect. Most of my students experience social media as a group activity. But a lot of them are not that savvy. They generally don’t know how Wikipedia operates.

[Tags: sns social_networking facebook exceptionalism tv wikipedia environment ]

[ugc3] The Evolution of UGC

Dan Hunter of NY Law School begins with an informal talk called “UGC: From Threat …”. He disagrees with Eli Noam that the end game will be commercialization. [Ah, the exceptionalist battle is joined!] He thinks about UGC as amateur media, focusing on the motivation of the users. His question: Is there a role for commercial providers, outside of providing the infrastructure? The content will increasingly be provided by people whose motivations are non-commercial. (He shows Wolf Loves Pork at YouTube.com. Very cool.)

It’s important not to think this is about traditional media forms, he says. It includes virtual worlds and collaborative games. People are living out their lives in these environments. UGC is not something separate from our lives. It is our environment.

Amateur work is crowding out the commercial, he says. E.g., YouTube, music, user reviews at Amazon etc. Most of the money is in the infrastructure, not the content: Blizzard providing World of Warcraft, Google, etc.

Q: Google lost $500M this year on YouTube.
Dan: If you’re suggesting there’s no money in infrastructure…We can’t yet know if that’s a blip, a market indicator, etc.

Q: Two examples that support your case: 1. Orpheus Orchestra has no conductor. 2. YouTube orchestra is collaborative.
Dan: Sites like Wikipedia can be quite bureaucratic. There’s a range of examples, some totally spontaneous.

Q: Wolf Eats Pig actually ends the other way around, which is a bad moral and is very worrisome for Japanese society.


Next, David Card of Forrester Research presents research. [I’m not going to try to capture the numbers.]

Social networking is becoming ubiquitous, but the “creative stuff” is still a minority behavior and is not growing at the same pace as social networking, watching videos, or writing reviews. Budgets for social marketing are still pretty low because the value of it is unproven. [His data actually show that few people can prove profitability from social marketing but a majority think it is valuable]

Social network business models: It will be like air (cf. Charlene Li). Or it’s a walled garden. Or it’s a media model. The portal model faces threats from Google and social networking sites. At SNSs, people view photos and videos, keep up with friends, etc. They’re not consuming much professional content there. Marketers should “tap entertainment media, then build out social marketing promise.” Facebook’s “Beacon” idea was powerful but ineptly handled. [Beacon: When you buy something, it asks if you want to share that news with your FB friends.] Money is more likely to come from the audience than from authors; the real social marketing potential is untapped.

Q: Opportunity: Harvesting social networking data for customer relationship management. [Doc Searls: This one’s for you! :)]
David: Lots of people do this. P&G. Fox. They bring in the audience to get feedback. “If you get them into real product development, that’s a nirvana.” Although you have to be careful that you’re not handing design to a niche market of your most enthusiastic customers.
Q: Keeping track of the metadata about the types of info makes this huge market of info usable.
David: Do you mean Amazon ought to make its customer available to others?
Q: No.

Q: The virtual is piercing the physical, ending up in offline retail.
A: Interesting.

Q: What guidance do you give employees active in these spaces, so they feel free to express their ideas without running into censorship?
David: Forrester analysts have personal blogs as well as company blogs. Neither is reviewed. We have policies that say you should think about what you’re saying. But if it’s too heavy-handed, so that employees look like shills, they won’t get a very big audience. You have to play by the rules of the medium — uncensored, rapid response (e.g., WholeFoods responds instantly, even if it’s an intern in a closet somewhere) — authenticity, etc. It’s a delicate line.


Robert Cohen talks about business adoption of virtual worlds. He points to the broad use of interactive sites by children 7-12, suggesting that we’re seeing a deep change. There are over 100M subscribers to the Barbie site and hundreds of millions of Habbo users. This may portend a generational change.

He points to three waves: content-centric, surface [he’s using a Microsoft chart], and immersive. He’s interviewed 50 vendors about how virtual worlds will be used. It has the potential to affect the way business operates (he says). First, it enhances training and teamwork. Then, more interactive corporations. Over the next ten years we’ll see collaborative corporations (among suppliers and product developers) and “modern guild system firms” (“highly technologically competent firms that come together to collaborate on projects”). He points to oil companies using virtual worlds to model environments for training and exploration.

Q: The press is reporting that SecondLife has stumbled in growth and development. And how can we get from the Barbie-style product focus to a platform approach?
Bob: There’s controversy about this. BTW, Mitch Kapor is working on putting your photo on your avatar and making the movement more realistic. SecondLife has also bought a company that does business operations. And IBM has shown a way to connect virtual worlds through a firewall. SecondLife is trying. There’s a lot going on in Europe.

[Posting without rereading so I can go to the break. Sorry.]

[Tags: ugc ugc3.0 user_generated_content exceptionalism virtual_worlds second_life social_networks everything_is_miscellaneous ]

April 15, 2009

The future of the book

I just came from a discussion of the future of the book at Harvard, although it was actually more like the propaedeutic for that discussion. Quite fascinating, though.

First spoke Ann Blair, a history professor at Harvard, who has a book on the history of information overload (particularly in the early modern period) coming out in the fall of 2010. She talked about how printed books were first received: Positively, the printing press was appreciated for the labor it saved (an early estimate said that four men in one day could create as many books with a printing press as ten could do with a quill in a year) and for driving down the cost of books. (You had to count on selling 300 copies before you’d break even, whereas hand-copied manuscripts were done on commission.) But people also complained because printed books were often shoddily done, and there were too many of them. Ann cited Pliny saying that there is no book so bad that some good cannot be made of it, balanced by Seneca, who urged people to read a few books well. [Classic fox vs. hedgehog matchup.]

She then pointed to 16th-17th century reference books, including dictionaries, collections of beautiful and elevating sentences (“florilegia”) [twitter, anyone?], and commonplace books. Printing made it possible to have very big books. One of the commonplaces started with 1.5M words, was revised to include 4.5M, and had a sequel with 15M words. (Wikipedia had 511M words the last time Ann checked.) Conventions therefore arose for finding your way through all those words. Some techniques were typographical (Aristotle’s text in bigger letters, followed by commentaries, etc.), but indexes became more sophisticated. The 15M-word commonplace book had over 100 pages of entries on a single word, for example (“bellum,” “war”). The indices sometimes had multiple levels of indentation.

Ann finished by showing an illustration of a 1689 piece of furniture the size of a closet, designed to organize knowledge. There were 3,000 hooks for headings, with multiple hooks for slips of paper under each heading.

I asked whether the availability of slips of paper encouraged the de- and re-structuring of knowledge. Ann answered first by talking about the history of slips of paper — the printing press drove up the demand for paper and thus drove down its cost — and then said that it’s hard to gauge the effect on thought because writers were already collecting miscellanies, such as commonplace books.

Ann also explained that large alphabetical concordances had already been created before slips of paper by assigning a letter to each monk and having him go through the Bible looking for each word that begins with that letter.

Then John Palfrey gave a talk about how the world and books look to those born into the digital age. To these digital natives, said John, the world doesn’t divide into online and offline; it’s all converged. They assume digital access. (YouTube is the #2 search engine, JP said.) They expect to be co-creators. They also give away too much information and need to learn to do for themselves the gatekeeping that used to be done for them. The opportunities are huge, JP said, for creativity, reuse, and making knowledge together. JP expects libraries will continue to become social spaces where we learn and explore together, and he expects physical books to persevere because they are so well engineered for knowledge and extended argument. [Personally, I’m not convinced of that. I think books may turn out to be an accident of paper. Check back in 30 years to see who’s right.]

A fascinating afternoon. I wish it had gone on longer.

During the Q&A, Robert Darnton, Harvard’s head librarian, responded to a criticism of the new tools for navigating the university’s collection by saying that it was still “in beta.” It’s open to all to suggest improvements, many of which have already been incorporated. For me, hearing Harvard’s chief librarian talk about a catalog being “in beta” says it all. (Darnton also talked about Harvard’s position on the Google Books settlement, about which he has been a prominent and eloquent critic.)

[Tags: books history_of_books ann_blair john_palfrey robert_darnton libraries knowledge everything_is_miscellaneous google_books ]
