I had both CNN and Twitter on yesterday all afternoon, looking for news about the Boston Marathon bombings. I have not done a rigorous analysis (nor will I, nor have I ever), but it felt to me that Twitter put forward more and more varied claims about the situation, and reacted faster to misstatements. CNN plodded along, but didn’t feel more reliable overall. This seems predictable given the unfiltered (or post-filtered) nature of Twitter.
But Twitter also ran into some scaling problems for me yesterday. I follow about 500 people on Twitter, which gives my stream a pace and variety that I find helpful on a normal day. But yesterday afternoon, the stream roared by, and approached filter failure. A couple of changes would help:
First, let us sort by most retweeted. When I’m in my “home stream,” let me choose a frequency of tweets so that the scrolling doesn’t become unwatchable; use the frequency to determine the threshold for the number of retweets required. (Alternatively: simply highlight highly re-tweeted tweets.)
Second, let us mute based on hashtag or by user. Some Twitter cascades I just don’t care about. For example, I don’t want to hear play-by-plays of the World Series, and I know that many of the people who follow me get seriously annoyed when I suddenly am tweeting twice a minute during a presidential debate. So let us temporarily suppress tweet streams we don’t care about.
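For what it’s worth, both filters are simple enough to sketch. Here’s a hypothetical version in Python; the field names are invented and bear no relation to Twitter’s actual API:

```python
# Hypothetical sketch of the two proposed filters.
# Field names ("retweets", "hashtags", "user") are invented for illustration.

def throttle_by_retweets(tweets, max_per_refresh):
    """Keep only the most-retweeted tweets, capped at a readable pace."""
    ranked = sorted(tweets, key=lambda t: t["retweets"], reverse=True)
    return ranked[:max_per_refresh]

def mute(tweets, muted_hashtags=(), muted_users=()):
    """Temporarily suppress cascades the reader doesn't care about."""
    return [
        t for t in tweets
        if t["user"] not in muted_users
        and not set(t["hashtags"]) & set(muted_hashtags)
    ]

stream = [
    {"user": "a", "retweets": 950, "hashtags": ["boston"]},
    {"user": "b", "retweets": 3, "hashtags": ["worldseries"]},
    {"user": "c", "retweets": 120, "hashtags": []},
]
visible = throttle_by_retweets(mute(stream, muted_hashtags=["worldseries"]), 2)
```

The point isn’t the code, which is trivial, but that the reader, not the service, sets the thresholds.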
It is a lesson of the Web that as services scale up, they need to provide more and more ways of filtering. Twitter had “follow” as an initial filter, and users then came up with hashtags as a second filter. It’s time for a new round as Twitter becomes an essential part of our news ecosystem.
Steve Coll has a good piece in the New Yorker about the importance of Al Qaeda as a brand:
…as long as there are bands of violent Islamic radicals anywhere in the world who find it attractive to call themselves Al Qaeda, a formal state of war may exist between Al Qaeda and America. The Hundred Years War could seem a brief skirmish in comparison.
This is a different category of issue than the oft-criticized “war on terror,” which is a war against a tactic, not against an enemy. The war against Al Qaeda implies that there is a structurally unified enemy organization. How do you declare victory against a group that refuses to enforce its trademark?
In this, the war against Al Qaeda (which is quite preferable to a war against terror — and I think Steve agrees) is similar to the war on cancer. Cancer is not a single disease and the various things we call cancer are unlikely to have a single cause and thus are unlikely to have a single cure (or so I have been told). While this line of thinking would seem to reinforce politicians’ referring to terrorism as a “cancer,” the same applies to dessert. Each of these terms probably does have a single identifying characteristic, which means they are not classic examples of Wittgensteinian family resemblances: all terrorism involves a non-state attack that aims at terrifying the civilian population, all cancers involve “unregulated cell growth” [thank you Wikipedia!], and all desserts are designed primarily for taste not nutrition and are intended to end a meal. In fact, the war on Al Qaeda is actually more like the war on dessert than like the war on cancer, because just as there will always be some terrorist group that takes up the Al Qaeda name, there will always be some boundary-pushing chef who declares that beef jerky or glazed ham cubes are the new dessert. You can’t defeat an enemy that can just rebrand itself.
I think that Steve Coll comes to the wrong conclusion, however. He ends his piece this way:
Yet the empirical case for a worldwide state of war against a corporeal thing called Al Qaeda looks increasingly threadbare. A war against a name is a war in name only.
I agree with the first sentence, but I draw two different conclusions. First, this has little bearing on how we actually respond to terrorism. The thinking that has us attacking terrorist groups (and at times their family gatherings) around the world is not made threadbare by the misnomer “war against Al Qaeda.” Second, isn’t it empirically obvious that a war against a name is not a war in name only?
A New Yorker article that profiles John Quijada, the inventor of a language (and a double-dotter!), mentions the first artificial language we know about, Lingua Ignota. The article’s author, Joshua Foer, tells us it was invented by Hildegard von Bingen (totally fun to say out loud) in the 12th century. “All that remains of her language is a short passage and a dictionary of a thousand and twelve words listed in hierarchical order, from the most important (Aigonz, God) to the least (Cauiz, cricket).” There’s more about Lingua Ignota over at our friend, Wikipedia. (And did you remember to kick in a few bucks to keep Wikipedia in booze and cigarettes?)
Ordering a list by cosmic importance (remember the Great Chain of Being?) makes sense if everyone agrees on what that order is. And it expresses respect for the order. That’s why some clergyfolk objected to the fact that Diderot’s Encyclopedia in the 18th century alphabetized its contents. Imagine Cows coming before God!
Before we sneer, we should keep in mind that we do the same thing when we make lists to be seen by others. For example, lists of donors put the Big Money folk first. For another example, we wouldn’t post a list of New Year’s resolutions in the following order:
My New Year’s Resolutions
Bring in an apple instead of snacking from the vending machine
Don’t let the ironing back up for more than a week
Refill the bird-feeder before it’s empty.
Get those birthday cards in the mail on time!
And there are rhetorical rules for the order in which we give reasons to support an argument. For example, we often give the easiest reason to accept first, and lead up to the most serious reason: “It’s easy, it’ll save money, people will feel good about it, and it’s the right thing to do.” The phrase “most important, …” is not permitted to appear in the middle of a sentence.
Order is content.
There’s a knowingly ridiculous thread at Reddit at the moment: Which world leader would win if pitted against other leaders in a fight to the death?
The title is a straight line begging for punchlines. And it is a funny thread. Yet I found it shockingly informative. The shock comes from realizing just how poorly informed I am.
My first reaction to the title was “Putin, duh!” That just shows you what I know. From the thread I learned that Joseph Kabila (Congo) and Boyko Borisov (Bulgaria) would kick Putin’s ass. Not to mention Jigme Khesar Namgyel Wangchuck (Bhutan), who would win on good looks.
Now, when I say that this thread is “shockingly informative,” I don’t mean that it gives sufficient or even relevant information about the leaders it discusses. After all, it focuses on their personal combat skills. Rather, it is an interesting example of the haphazard way information spreads when that spreading is participatory. We are unlikely to have sent around the Wikipedia article on Kabila or Borisov simply because we all should know about the people leading the nations of the world. Further, while there is more information about world leaders available than ever before in human history, it is distributed across a huge mass of content from which we are free to pick and choose. That’s disappointing at best and disastrous at worst.
On the other hand, information is now passed around if it is made interesting, sometimes in jokey, demeaning ways, like an article that steers us toward beefcake (although the president of Ireland does make it up quite high in the Reddit thread). The information that gets propagated through this system is thus spotty and incomplete. It only becomes an occasion for serendipity if it is interesting, not simply because it’s worthwhile. But even jokey, demeaning posts can and should have links for those whose interest is piqued.
So, two unspectacular conclusions.
First, in our despair over the diminishing of a shared knowledge-base of important information, we should not ignore the off-kilter ways in which some worthwhile information does actually propagate through the system. Indeed, it is a system designed to propagate that which is off-kilter enough to be interesting. Not all of that “news,” however, is about water-skiing cats. Just most.
Second, we need to continue to have the discussion about whether there is in fact a shared news/knowledge-base that can be gathered and disseminated, whether there ever was, whether our populations ever actually came close to living up to that ideal, the price we paid for having a canon of news and knowledge, and whether the networking of knowledge opens up any positive possibilities for dealing with news and knowledge at scale. For example, perhaps a network is well-informed if it has experts on hand who can explain events at depth (and in interesting ways) on demand, rather than assuming that everyone has to be a little bit expert at everything.
I’m not sure how I came into possession of a copy of The Indexer, a publication by the Society of Indexers, but I thoroughly enjoyed it despite not being a professional indexer. Or, more exactly, because I’m not a professional indexer. It brings me joy to watch experts operate at levels far above me.
The issue of The Indexer I happen to have — Vol. 30, No. 1, March 2012 — focuses on digital trends, with several articles on the Semantic Web and XML-based indexes as well as several on broad trends in digital reading and digital books, and on graphical visualizations of digital indexes. All good.
I also enjoyed a recurring feature: Indexes reviewed. This aggregates snippets of book reviews that mention the quality of the indexes. Among the positive reviews, the Sunday Telegraph thinks that for the book My Dear Hugh, “the indexer had a better understanding of the book than the editor himself.” That’s certainly going on someone’s résumé!
I’m not sure why I enjoy works of expertise in fields I know little about. It’s true that I know a bit about the field because I’ve written about the organization of digital information, and even a little about indexing itself. And I have a lot of interest in the questions about the future of digital books that happen to be discussed in this particular issue of The Indexer. That enables me to make more sense of the journal than might otherwise be the case. But even so, what I enjoy are the discussions of topics that exhibit the professionals’ deep involvement in their craft.
But I think what I enjoy most of all is the discovery that something as seemingly simple as generating an index turns out to be indefinitely deep. There are endless technical issues, but also fathomless questions of principle. There’s even indexer humor. For example, one of the index reviews notes that Craig Brown’s The Lost Diaries “gives references with deadpan precision (‘Greer, Germaine: condemns Queen, 13-14…condemns pineapple, 70…condemns fat, thin and medium sized women, 93…condemns kangaroos, 122’).”
As I’ve said before, everything is interesting if observed at the right level of detail.
Paul Deschner and I had a fascinating conversation yesterday with Jeffrey Wallman, head of the Tibetan Buddhist Resource Center, about perhaps getting his group’s metadata to interoperate with the library metadata we’ve been gathering. The TBRC has a fantastic collection of Tibetan books. So we were talking about the schemas we use — a schema being the set of slots you create for the data you capture. For example, if you’re gathering information about books, you’d have a schema that has slots for title, author, date, publisher, etc. Depending on your needs, you might also include slots for whether there are color illustrations, is the original cover still on it, and has anyone underlined any passages. It turns out that the Tibetan concept of a book is quite a bit different than the West’s, which raises interesting questions about how to capture and express that data in ways that can be usefully mashed up.
But it was when we moved on to talking about our author schemas that Jeffrey listed one type of metadata that I would never, ever have thought to include in a schema: reincarnation. It is important for Tibetans to know that Author A is a reincarnation of Author B. And I can see why that would be a crucial bit of information.
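In schema terms it’s just one more slot, but one no Western cataloguer would think to include. A toy sketch (the field names are my invention, not TBRC’s actual schema):

```python
# Toy author schema. The "reincarnation_of" slot is the one a
# Western schema designer would never anticipate. All field names
# are illustrative, not TBRC's actual metadata.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Author:
    name: str
    dates: Optional[str] = None
    works: List[str] = field(default_factory=list)
    reincarnation_of: Optional["Author"] = None  # links Author A to Author B

earlier = Author(name="Author B")
later = Author(name="Author A", reincarnation_of=earlier)
```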
So, let this be a lesson: attempts to anticipate all metadata needs are destined to be surprised, sometimes delightfully.
The American Psychiatric Association has approved its new manual of diagnoses — Diagnostic and Statistical Manual of Mental Disorders — after five years of controversy [nytimes].
For example, it has removed Asperger’s as a diagnosis, lumping it in with autism, but it has split out hoarding from the more general category of obsessive-compulsive disorder. Lumping and splitting are the two most basic activities of cataloguers and indexers. There are theoretical and practical reasons for sometimes lumping things together and sometimes splitting them, but they also characterize personalities. Some of us are lumpers, and some of us are splitters. And all of us are a bit of each at various times.
The DSM runs into the problems faced by all attempts to classify a field. Attempts to come up with a single classification for a complex domain try to impose an impossible order:
First, there is rarely (ever?) universal agreement about how to divvy up a domain. There are genuine disagreements about which principles of organization ought to be used, and how they apply. Then there are the Lumper and Splitter personalities.
Second, there are political and economic motivations for dividing up the world in particular ways.
Third, taxonomies are tools. There is no one right way to divide up the world, just as there is no one way to cut a piece of plywood and no one right thing to say about the world. It depends what you’re trying to do. DSM has conflicting purposes. For one thing, it affects treatment. For example, the NY Times article notes that the change in the classification of bipolar disease “could ‘medicalize’ frequent temper tantrums,” and during the many years in which the DSM classified homosexuality as a syndrome, therapists were encouraged to treat it as a disease. But that’s not all the DSM is for. It also guides insurance payments, and it affects research.
Given this, do we need the DSM? Maybe for insurance purposes. But not as a statement of where nature’s joints are. In fact, it’s not clear to me that we even need it as a single source to define terms for common reference. After all, biologists don’t agree about how to classify species, but that science seems to be doing just fine. The Encyclopedia of Life takes a really useful approach: each species gets a page, but the site provides multiple taxonomies so that biologists don’t have to agree on how to lump and split all the forms of life on the planet.
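The Encyclopedia of Life approach amounts to a simple data structure: one page per species, with several classifications attached rather than a single agreed-upon one. A toy sketch in Python (the lineages here are abbreviated and illustrative, not EOL’s actual data):

```python
# One page per species; multiple taxonomies attached, so no single
# classification has to win. The lineages below are abbreviated
# illustrations, not EOL's actual records.
species_page = {
    "name": "Panthera leo",
    "common_name": "lion",
    "classifications": {
        "ITIS": ["Animalia", "Chordata", "Mammalia", "Carnivora", "Felidae"],
        "NCBI": ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Carnivora"],
    },
}

# Readers pick whichever taxonomy suits their purpose.
for source, lineage in species_page["classifications"].items():
    print(source, "->", " / ".join(lineage))
```

A single-taxonomy DSM is the opposite design choice: one classification that every purpose (treatment, insurance, research) has to share.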
If we do need a single diagnostic taxonomy, DSM is making progress in its methodology. It has more publicly entered the fray of argument, it has tried to respond to current thinking, and it is now going to be updated continuously, rather than every 5 years. All to the good.
But the rest of its problems are intrinsic to its very existence. We may need it for some purposes, but it is never going to be fully right…because tools are useful, not true.
David Wood of 3RoundStones.com is talking about Callimachus, an open source project that is also available through his company. [NOTE: Liveblogging. All bets on accuracy are off.]
We’re moving from PCs to mobile, he says, and this is rapidly changing the Internet. 51% of Internet traffic is non-human, he says (as of Feb 2012). 35 hours of video are uploaded to YouTube every minute. Traditionally we dealt with this type of demand via data warehousing: put it all in one place for easy access. But that ideal was never realized: we never really got it all in one place, accessible through one interface. Jeffrey Pollock says we should be talking not about data integration but about interoperability, because the latter implies a looser coupling.
He gives some use cases:
BBC wanted to have a Web presence for all of its 1,500 broadcasts per day. They couldn’t do it manually. So they decided to grab data from the linked open data cloud and assemble the pages automatically. They hired fulltime editors to curate Wikipedia. RDF enabled them to assemble the pages.
O’Reilly Media switched to RDF reluctantly but for purely pragmatic reasons.
BestBuy, too. They used RDFa to embed metadata into their pages to improve their SEO.
Elsevier uses Linked Data to manage their assets, from acquisition to delivery.
This is not science fiction, he says. It’s happening now.
Then two negative examples:
David says that Microsoft adopted RDF in the late 90s. But Netscape came out with a portal technology based on RDF that scared Microsoft out of the standards effort. They still needed the tech, though, so they’ve reinvented it three times in proprietary ways.
Borders was too late in changing its tech.
Then he does a product pitch for Callimachus Enterprise: a content management system for enterprises.
I’m at the Semantic Technology & Business conference in NYC. Matthew Degel, Senior Vice President and Chief Architect at Viacom Media Networks is talking about “Modeling Media and the Content Supply Chain Using Semantic Technologies.” [NOTE: Liveblogging. Getting things wrong. Mangling words. Missing points. Over- and under-emphasizing the wrong things. Not running a spellpchecker. You are warned!]
Matthew says that the problem is that we’re “drowning in data but starved for information.” There is a “thirst for asset-centric views.” And of course, Viacom needs to “more deeply integrate how property rights attach to assets.” And everything has to be natively local, all around the world.
Viacom has to model the content supply chain in a holistic way. So, how to structure the data? To answer, they need to know what the questions are. Data always has some structure. The question is how volatile those structures are. [I missed about 5 minutes; had to duck out.]
He shows an asset tree, “relating things that are different yet the same,” with SpongeBob as his example: TV series, characters, the talent, the movie, consumer products, etc. Stations are not allowed to air a commercial with the voice actor behind SpongeBob, Tom Kenny, during the showing of the SpongeBob show, so they need to intersect those datasets. Likewise, the video clip you see on your set-top box’s guide is separate from, but related to, the original. For doing all this, Viacom is relying on inferences: A prime time version of a Jersey Shore episode, which has had the bad language censored out of it, is a version of the full episode, which is part of the series which has licensing contracts within various geographies, etc. From this Viacom can infer that the censored episode is shown in some geography under some licensing agreements, etc.
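That style of inference can be sketched with a toy triple store: walk the “version of” and “part of” links upward, collecting licensing facts along the way. (All names and relations below are my invention for illustration; this is nothing like Viacom’s actual system, which uses OWL and a real triplestore.)

```python
# Toy triple store illustrating inference over "versionOf"/"partOf" links.
# All asset names and relations are invented for illustration.
triples = {
    ("PrimeTimeEpisode", "versionOf", "FullEpisode"),
    ("FullEpisode", "partOf", "TheSeries"),
    ("TheSeries", "licensedIn", "Australia"),
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def licensed_geographies(asset):
    """Collect licensing facts from the asset and everything it derives from."""
    geos = set(objects(asset, "licensedIn"))
    for rel in ("versionOf", "partOf"):
        for parent in objects(asset, rel):
            geos |= licensed_geographies(parent)
    return geos
```

Nothing asserts that the censored prime-time episode is licensed in Australia; the fact is inferred by following the chain.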
“We’ve tried to take a realistic approach to this.” As excited as they are about the promise, “we haven’t dived in with a huge amount of resources.” They’re solving immediate problems. They began by making diagrams of all of the apps and technologies. It was a mess. So, they extracted and encoded into a triplestore all the info in the diagram. Then they overlaid the DR data. [I don't know what DR stands for. I'm guessing the D stands for Digital, and the R might be Resource.] Further mapping showed that some apps that they weren’t paying much attention to were actually critical to multiple systems. They did an ontology graph as a London Underground map. [By the way, Gombrich has a wonderful history and appreciation of those maps in Art and Representation, I believe.]
What’s worked? They’re focusing on where they’re going, not where they’ve been. This has let them “jettison a lot of intellectual baggage” so that they can model business processes “in a much cleaner and effective way.” Also, OWL has provided a rich modeling language for expressing their Enterprise Information Model.
What hasn’t worked?
“The toolsets really aren’t quite there yet.” He says that, based on the conversations he’s had today, he doesn’t think anyone disagrees with him.
Also, the modeling tools presume you already know the technology and the approach. And the query tools presume you have a user at a keyboard, rather than a backend Web service that has to handle high volumes. For example, he’d like a “Crystal Reports for SPARQL”: a usable reporting tool.
Visualization tools are focused on interactive use. You pick a class and see the relationships, etc. But if you want to see a traditional ERD diagram, you can’t.
Also, the modeling tools present a “forward bias.” E.g., there are tools for turning schemas into ontologies, but not for turning ontologies into a reference model for schemas.
Matthew makes some predictions:
The toolsets will develop into robust tools
Semantic tech will enable queries such as “Show me all Madonna interviews where she sings, where the footage has not been previously shown, and where we have the license to distribute it on the Web in Australia in Dec.”
I’m at the “Symposium on Digital Curation in the Era of Big Data” held by the Board on Research Data and Information of the National Research Council. These liveblog notes cover (in some sense — I missed some folks, and have done my usual spotty job on the rest) the morning session. (I’m keynoting in the middle of it.)
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Alan Blatecky [pdf] from the National Science Foundation says science is being transformed by Big Data. [I can't see his slides from the panel at front.] He points to the increase in the volume of data, but we haven’t paid enough attention to the longevity of the data. And, he says, some data is centralized (LHC) and some is distributed (genomics). And, our networks are unable to transport large amounts of data [see my post], making where the data is located quite significant. NSF is looking at creating data infrastructures. “Not one big cloud in the sky,” he says. Access, storage, services — how do we make that happen and keep it leading edge? We also need a “suite of policies” suitable for this new environment.
He closes by talking about the Data Web Forum, a new initiative to look at a “top-down governance approach.” He points positively to the IETF’s “rough consensus and running code.” “How do we start doing that in the data world?” How do we get a balanced representation of the community? This is not a regulatory group; everything will be open source, and progress will be through rough consensus. They’ve got some funding from gov’t groups around the world. (Check CNI.org for more info.)
Now Josh Greenberg from the Sloan Foundation. He points to the opportunities presented by aggregated Big Data: the effects on social science, on libraries, etc. But the tools aren’t keeping up with the computational power, so researchers are spending too much time mastering tools, plus it can make reproducibility and provenance trails difficult. Sloan is funding some technical approaches to increasing the trustworthiness of data, including in publishing. But Sloan knows that this is not purely a technical problem. Everyone is talking about data science. Data scientist defined: Someone who knows more about stats than most computer scientists, and can write better code than typical statisticians :) But data science needs to better understand stewardship and curation. What should the workforce look like so that the data-based research holds up over time? The same concerns apply to business decisions based on data analytics. The norms that have served librarians and archivists of physical collections now apply to the world of data. We should be looking at these issues across the boundaries of academics, science, and business. E.g., economics research now rests on data from Web businesses, US Census, etc.
[I couldn't liveblog the next two — Michael and Myron — because I had to leave my computer on the podium. The following are poor summaries.]
Michael Stebbins, Assistant Director for Biotechnology in the Office of Science and Technology Policy in the White House, talked about the Administration’s enthusiasm for Big Data and open access. It’s great to see this degree of enthusiasm coming directly from the White House, especially since Michael is a scientist and has worked for mainstream science publishers.
Myron Gutmann, Ass’t Dir of the National Science Foundation, likewise expressed commitment to open access, and said that there would be an announcement in Spring 2013 that in some ways will respond to the recent UK and EC policies requiring the open publishing of publicly funded research.
After the break, there’s a panel.
Anne Kenney, Dir. of Cornell U. Library, talks about the new emphasis on digital curation and preservation. She traces this back at Cornell to 2006, when an E-Science task force was established. She thinks we now need to focus on e-research, not just e-science. She points to Walters and Skinner’s “New Roles for New Times: Digital Curation for Preservation.” When it comes to e-research, Anne points to the need for metadata stabilization, harmonizing applications, and collaboration in virtual communities. Within the humanities, she sees more focus on curation, the effect of the teaching environment, and more of a focus on scholarly products (as opposed to the focus on scholarly process, as in the scientific environment).
She points to Youngseek Kim et al., “Education for eScience Professionals”: digital curators need not just subject domain expertise but also project management and data expertise. [There's lots of info on her slides, which I cannot begin to capture.] The report suggests an increasing focus on people-focused skills: project management, bringing communities together.
She very briefly talks about Mary Auckland’s “Re-Skilling for Research” and Williford and Henry, “One Culture: Computationally Intensive Research in the Humanities and Sciences.”
So, what are research libraries doing with this information? The Association of Research Libraries has a job announcements database. And Tito Sierra did a study last year analyzing 2011 job postings. He looked at 444 job descriptions. 7.4% of the jobs were “newly created or new to the organization.” New mgt level positions were significantly higher, while subject specialist jobs were under-represented.
Anne went through Tito’s data and found 13.5% have “digital” in the title. There were more digital humanities positions than e-science. She posts a list of the new titles jobs are being given, and they’re digilicious. 55% of those positions call for a library science degree.
Anne concludes: It’s a growth area, with responsibilities more clearly defined in the sciences. There’s growing interest in serving the digital humanists. “Digital curation” is not common in the qualifications nomenclature. MLS or MLIS is not the only path. There’s a lot of interest in post-doctoral positions.
Margarita Gregg of the National Oceanic and Atmospheric Administration begins by talking about challenges in the era of Big Data. They produce about 15 petabytes of data per year. It’s not just about Big Data, though. They are very concerned with data quality. They can’t preserve all versions of their datasets, and it’s important to keep track of the provenance of that data.
Margarita directs one of NOAA’s data centers that acquires, preserves, assembles, and provides access to marine data. They cannot preserve everything. They need multi-disciplinary people, and they need to figure out how to translate this data into products that people need. In terms of personnel, they need: Data miners, system architects, developers who can translate proprietary formats into open standards, and IP and Digital Rights Management experts so that credit can be given to the people generating the data. Over the next ten years, she sees computer science and information technology becoming the foundations of curation. There is no currently defined job called “digital curator” and that needs to be addressed.
Vicki Ferrini at the Lamont-Doherty Earth Observatory at Columbia University works on data management, metadata, discovery tools, educational materials, best practice guidelines for optimizing acquisition, and more. She points to the increased communication between data consumers and producers.
As data producers, the goal is scientific discovery: data acquisition, reduction, assembly, visualization, integration, and interpretation. And then you have to document the data (= metadata).
Data consumers: They want data discoverability and access. Increasingly they are concerned with the metadata.
The goal of data providers is to provide access, preservation, and reuse. They care about data formats, metadata standards, interoperability, and the diverse needs of users. [I've abbreviated all these lists because I can't type fast enough.]
At the intersection of these three domains is the data scientist. She refers to this as the “data stewardship continuum” since it spans all three. A data scientist needs to understand the entire life cycle, have domain experience, and have technical knowledge about data systems. “Metadata is key to all of this.” Skills: communication and organization, understanding the cultural aspects of the user communities, people and project management, and a balance between micro- and macro perspectives.
Challenges: Hard to find the right balance between technical skills and content knowledge. Also, data producers are slow to join the digital era. Also, it’s hard to keep up with the tech.
Andy Maltz, Dir. of the Science and Technology Council of the Academy of Motion Picture Arts and Sciences. AMPAS is about arts and sciences, he says, not about The Business.
The Science and Technology Council was formed in 2005. They have lots of data they preserve. They’re trying to build the pipeline for next-generation movie technologists, but they’re falling behind, so they have an internship program and a curriculum initiative. He recommends we read their study The Digital Dilemma. It says that there’s no digital solution that meets film’s requirement to be archived for 100 years at a low cost. It costs $400/yr to archive a film master vs $11,000 to archive a digital master (as of 2006) because of labor costs. [Did I get that right?] He says collaboration is key.
In January they released The Digital Dilemma 2. It found that independent filmmakers, documentarians, and nonprofit audiovisual archives are loosely coupled, widely dispersed communities. This makes collaboration more difficult. The efforts are also poorly funded, and people often lack technical skills. The report recommends the next gen of digital archivists be digital natives. But the real issue is technology obsolescence. “Technology providers must take archival lifetimes into account.” Also system engineers should be taught to consider this.
He highly recommends the Library of Congress’ “The State of Recorded Sound Preservation in the United States,” which rings an alarm bell. He hopes there will be more doctoral work on these issues.
Among his controversial proposals: Require higher math scores for MLS/MLIS students, since they tend to score lower than average on that. Also, he says that the new generation of content creators has no curatorial awareness. Executives and managers need to know that this is a core business function.
Demand side data points: 400 movies/year at 2PB/movie. CNN has 1.5M archived assets, and generates 2,500 new archive objects/wk. YouTube: 72 hours of video uploaded every minute.
Show business is a business.
Need does not necessarily create demand.
The nonprofit AV archive community is poorly organized.
Next gen needs to be digital natives with strong math and sci skills.
The next gen of executive leaders needs to understand the importance of this.
Digital curation and long-term archiving need a business case.
Q: How about linking the monetary value of the metadata to the metadata? That would encourage the generation of metadata.
Q: Weinberger paints a picture of flexible world of flowing data, and now we’re back in the academic, scientific world where you want good data that lasts. I’m torn.
A: Margarita: We need to look how that data are being used. Maybe in some circumstances the quality of the data doesn’t matter. But there are other instances where you’re looking for the highest quality data.
A: [audience] In my industry, one person’s outtakes are another person’s director’s cuts.
A: Anne: In the library world, we say that if a little metadata would be great, a lot of it would be even better. We need to step away from trying to capture the most to capturing the most useful (since we can’t capture the most). And how do you produce data in a way that’s opened up to future users, as well as being useful for its primary consumers? It’s a very interesting balance that needs to be struck. Maybe the short-term need ranks higher and the long-term lower.
A: Vicki: The scientists I work with use discrete data sets, spreadsheets, etc. As we get along we’ll have new ways to check the quality of datasets so we can use the messy data as well.
Q: Citizen curation? E.g., a lot of antiques are curated by being put into people’s attics…Not sure what that might imply as a model. Two parallel models?
A: Margarita: We’re going to need to engage anyone who’s interested. We need to incorporate citizen curation.
Anne: That’s already underway where people have particular interests. E.g., Cornell’s Lab of Ornithology where birders contribute heavily.
Q: What one term will bring people info about this topic?
A: Vicki: There isn’t one term, which speaks to the linked data concept.
Q: How will you recruit people from all walks of life to have the skills you want?
A: Andy: We need to convince people way earlier in the educational process that STEM is cool.
A: Anne: We’ll have to rely to some degree on post-hire education.
Q: My shop produces and integrates lots of data. We need people with domain and computer science skills. They’re more likely to come out of the domains.
A: Vicki: As long as you’re willing to take the step across the boundary, it doesn’t matter which side you start from.
Q: 7 yrs ago in library school, I was told that you need to learn a little programming so that you understand it. I didn’t feel like I had to add a whole other profession on to the one I was studying.