Joho the Blog

Meaning-mining Wikipedia

DBpedia extracts information from Wikipedia, building a database that you can query. This isn’t easy because much of the information in Wikipedia is unstructured. On the other hand, there’s an awful lot that’s structured enough so that an algorithm can reliably deduce the semantic content from the language and the layout. For example, the boxed info on bio pages is pretty standardized, so your algorithm can usually assume that the text that follows “Born: ” is a date and not a place name. As the DBpedia site says:

The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples). It features labels and short abstracts for these things in 30 different languages; 609,000 links to images and 3,150,000 links to external web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories.
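To make the infobox example concrete, here is a minimal sketch of the general idea, not DBpedia's actual extraction code. It assumes a simplified, hypothetical infobox written as plain "Label: value" lines (real Wikipedia infoboxes are wiki-markup templates and much messier), and the function name `infobox_to_triples` is made up for illustration. It leans on layout to guess that whatever follows "Born:" is a date, and emits subject/predicate/object triples of the kind the quote counts.

```python
import re
from datetime import date

# Hypothetical, simplified infobox text used only for illustration.
INFOBOX = """\
Name: Ada Lovelace
Born: 1815-12-10
Died: 1852-11-27
"""

# Labels whose values we assume (from layout alone) to be dates.
DATE_LABELS = ("born", "died")


def infobox_to_triples(subject, text):
    """Turn 'Label: value' lines into (subject, predicate, object) triples."""
    triples = []
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s*(.+)$", line)
        if not match:
            continue  # skip lines that do not look like 'Label: value'
        label = match.group(1).lower()
        value = match.group(2).strip()
        if label in DATE_LABELS:
            try:
                value = date.fromisoformat(value)
            except ValueError:
                pass  # the assumption failed; keep the raw string
        triples.append((subject, label, value))
    return triples


triples = infobox_to_triples("Ada_Lovelace", INFOBOX)
for triple in triples:
    print(triple)
```

The `try`/`except` is the "usually" in "can usually assume": when the text after "Born:" turns out not to be a date, the sketch keeps the raw string rather than guessing.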

Over time, the site will get better and better at extracting info from Wikipedia. And as it does so, it’s building a generalized corpus of query-able knowledge.

As of now, querying the knowledge requires some familiarity with building database queries. But the world has accumulated lots of facility with putting front-ends onto databases. DBpedia is working on something different: accumulating an encyclopedic database, open to all and expressed in the open language of the Semantic Web.
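For a sense of what "building database queries" means here, this is a rough sketch of a SPARQL query assembled in Python. The endpoint URL, the `dbo:` and `dbr:` prefixes, the `birthDate` property, and the helper names `build_query` and `build_url` are all my assumptions for illustration; check DBpedia's own documentation for what it actually exposes. The code only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode

# Assumed public SPARQL endpoint; verify before relying on it.
ENDPOINT = "https://dbpedia.org/sparql"


def build_query(resource, prop):
    """Build a SPARQL query asking for one property of one resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX dbr: <http://dbpedia.org/resource/>\n"
        f"SELECT ?value WHERE {{ dbr:{resource} dbo:{prop} ?value }}"
    )


def build_url(query):
    """Encode the query as a GET request URL for the endpoint."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = build_query("Ada_Lovelace", "birthDate")
url = build_url(query)
print(query)
print(url)
```

A front-end of the sort the post anticipates would hide exactly this step, turning a form field or a plain-language question into the query string.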

(Via Mirek Sopek.)

5 Responses to “Meaning-mining Wikipedia”

  1. Freebase has a much better query language, imho.

  2. Maybe. I was just never sure what exactly “harvesting” means in the Freebase method. DBpedia is, at this point, much more open and transparent with its RDFizing :-) approach.

    You may be right that Freebase’s MQL is easier to use than SPARQL. However, all the signs in the sky show that SPARQL will be THE query language of the forthcoming Web full of Meaning, as I call the Semantic Web, or of “Meaning Mining,” as David calls it.

    BTW, I have not seen any comprehensive comparison of Freebase, DBpedia, YAGO, Semantic Wiki, OntoWiki and, when it comes to querying, Powerset.

    Did you?

  3. There’s a Google Labs project (Google Squared) that’s trying to provide relatively similar functionality, although it pulls from a broader set of sources than just Wikipedia. Wikipedia, however, often seems to be the main source.

  4. […] Meaning-mining Wikipedia ( […]

  5. Lacquered furniture is wiped down with a damp cloth and polish to remove fingerprints. Furniture with gilded finishes can only be cleaned with a feather duster; it is very delicate.
