Joho the Blog » Beginner-to-beginner: Parsing OPML, including level (= indents)

Beginner-to-beginner: Parsing OPML, including level (= indents)

I am 99.9% confident that I’ve spent many hours trying to do in a complex way something that javascript (perhaps with jquery) lets us do with a single line of compressed code. But, I couldn’t find that line. Besides, I’m a hobbyist which means the puzzle is usually more fun than the solution. But this one drove me crazy.

I’m trying to read an OPML file. OPML is an instance of XML designed to record outlines. I found plenty of scripts online (javascript and php) that would read the contents, but none that would tell me the indentation level of each line. I’m sure there’s some simple way of figuring that out, but it beat the heck out of me. So, I finally found some javascript by Vivin Pallath to which I could add a couple of lines (mainly by trial and error) to record indentation info.

Vivin’s script is, appropriately recursive. That’s because outlines are arbitrarily nested, so you have to be able to call a “get the children” function on each line, each of which may have its own children. A recursive function calls itself, in this case until it reaches the end of a branch, at which point it pops up to the next sibling branch and carries on. You can easily record each branch in a flat list, but I also need to know how far down the limb we are. I find recursion to be extraordinarily difficult for my brain to follow, so more or less by trial and error I added a counter (a global variable, although I’m not sure that’s required), and then tried to figure out when it needs to be reset to zero.

The way I have it set up, the tree walking function looks for XML elements called “outline,” because that’s the tag OPML uses. It records the text content of the line in the outline as the attribute “text.” (OPML also can encode stuff other than text, but I happen not to care about that.) At each line, it records the text and the level number in a global array. This is a rather silly way of recording the info, but that’s the easy part to fix.

The parseOpml function consists mainly of code from Phpied that opens an xml file and reads it into an xml document object that javascript knows how to work with. The parseOPML function calls the walkTree function that does the actual parsing. The walkTree function takes as parameters the element from which it should start walking (which, in this case, is the root of the xml object itself) and the starting level of indentation (0). It wouldn’t be hard to add to it the name of the element you want it to pay attention to (which, in this case, is specified in walkTree to be “outline”>

Now for the caveats: I’ve barely tested this. It may break under predictable circumstances. It is very likely wildly inefficient (which doesn’t matter for my petty uses). It will undoubtedly embarrass actual developers. But, maybe it will help some other hobbyist…

// globals
var treeArray = new Array(); // array to hold results
var glevel; // the indent level

function parseOpml(){
  // opens the xml file and calls the tree walker
  

var fname="/pathname/to/your/file.xml";

// open the xml file (from http://www.phpied.com/opml-to-html-via-javascript/) var xmlDoc = null; if (window.ActiveXObject) {// code for IE xmlDoc = new ActiveXObject("Microsoft.XMLDOM"); } else // code for Mozilla, Firefox, Opera, etc. if (document.implementation.createDocument) { xmlDoc = document.implementation.createDocument("", "", null); } else { alert('Your browser cannot handle this script'); } if (xmlDoc != null) { xmlDoc.async = false; xmlDoc.load(fname); } // call the tree walker treeArray.length=0; // reset global array walkTree(xmlDoc,0);

} //--------------- Recursive tree walker --------------

function walkTree(node, glevel){ // walks tree looking for elements tagged "outline" // Replace the two instances of "outline" below to have it // look for other tags //Original function from: Vivin Pallath. Thanks! //http://vivin.net/2005/07/01/innerhtml-alternative-for-xhtml-documents-in-firefox/ //Modified to count indentation levels var n="d"; var a="";

if (node.hasChildNodes()) { node = node.firstChild; do{ // only return elements tagged "outline n = node.nodeName; if (n == "outline") { glevel++; // increase level // pushes the text and level as a string into a global array a = node.getAttribute("text"); // get the text treeArray.push(glevel + "=" + a); } walkTree(node,glevel); // recurse until no children node = node.nextSibling; // pop up to get next sibling // If this branch has no more children, reduce the level if ((n=="outline") && (!node.hasChildNodes())) { glevel = glevel - 1; } } while(node); } }

One Response to “Beginner-to-beginner: Parsing OPML, including level (= indents)”

  1. Ran into this from google. Thanks for the mention and glad I could help :)

Leave a Reply


Web Joho only

Comments (RSS).  RSS icon