
Using the Perma.cc API to check links

My new book (Everyday Chaos, HBR Press, May 2019) has a few hundred footnotes with links to online sources. Because Web sites change and links rot, I decided to link to Perma.cc's pages instead. Perma.cc is a product of the Harvard Library Innovation Lab, which I used to co-direct with Kim Dulin, although Perma itself is a Jonathan Zittrain project that came after I left.

When you give Perma.cc a link to a page on the Web, it comes back with a link to a page on the Perma.cc site. That page holds an archived copy of the original page exactly as it was when you supplied the link, along with a screen capture of the original, and of course a link back to it. Perma.cc also promises to maintain the archived copy and the screen capture in perpetuity, a promise backed by the Harvard Law Library and dozens of other libraries. So when you give a reader a Perma link, they are taken to the Perma.cc page where they'll always find the archived copy and the screen capture, no matter what happens to the original site. The service is free for everyone, for real. It doesn't require users to supply any information about themselves. And there are no ads.

So that’s why my book’s references are to Perma.cc.

But over the course of the six years I spent writing this book, my references suffered some link rot on my side: before I got around to creating the Perma links, I managed to make all the obvious errors and some not-so-obvious ones. So now that I'm at the copyediting stage, I wanted to check all the Perma links.

I had already compiled a bibliography as a spreadsheet. (The book will point to the Perma.cc page for that spreadsheet.) So, I selected the Title and Perma Link columns, copied the content, and stuck it into a text document. Each line contains the page’s headline and then the Perma link.
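Each line of that text file looks like the example given in the script's comments below. The script simply looks for the first "https" on the line, so everything before it is treated as the title and everything from there on as the Perma link:

The Rand Corporation: The Think Tank That Controls America https://perma.cc/B5LR-88CF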

Perma.cc has an API that made it simple to write a script that looks up each Perma link and prints the title Perma.cc has recorded for it next to the title of the page I intended to link. If there's a problem with a Perma link, such as a doubled “https://https://” (a mistake I managed to introduce about a dozen times), or if the Perma link is private and not accessible to the public, the script notes the problem. The human brain is good at scanning this sort of info, looking for inconsistencies.
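For a sense of what comes back, the public archives endpoint the script calls (https://api.perma.cc/v1/public/archives/{code}/) returns a JSON record for the archive. Abridged to the three fields the script actually reads, and with an illustrative original URL (the real record includes more fields), it looks roughly like this:

{
  "guid": "B5LR-88CF",
  "title": "The Rand Corporation: The Think Tank That Controls America",
  "url": "https://www.example.com/original-article"
}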

Here’s the script. I used PHP because I happen to know it better than a less embarrassing choice such as Python and because I have no shame.

<?php
// This is a basic program for checking a list of page titles and perma.cc links.
// It's done badly because I am a terrible hobbyist programmer.
// I offer it under whatever open source license is most permissive. I'm really not
// going to care about anything you do with it. Except please note I'm a
// terrible hobbyist programmer who makes no claims about how well this works.
//
// David Weinberger
// [email protected]
// Nov. 23, 2018

// Perma.cc API documentation is here: https://perma.cc/docs/developer

// This program assumes there's a file with the page title and one perma link per line.
// E.g. The Rand Corporation: The Think Tank That Controls America https://perma.cc/B5LR-88CF

// Read that text file into an array
$lines = file('links-and-titles.txt');

for ($i = 0; $i < count($lines); $i++){
    $line = $lines[$i];

    // divide into title and perma link
    $p1 = strpos($line, "https");          // find the beginning of the perma link
    $fullperma = substr($line, $p1);       // get the full perma link
    $origtitle = substr($line, 0, $p1);    // get the title
    $origtitle = rtrim($origtitle);        // trim the spaces from the end of the title

    // get the distinctive part of the perma link: the stuff after https://perma.cc/
    $permacode = strrchr($fullperma, "/"); // find the last forward slash
    $permacode = substr($permacode, 1);    // get what's after that slash
    $permacode = rtrim($permacode);        // trim any whitespace from the end

    // create the url that will fetch this perma link
    $apiurl = "https://api.perma.cc/v1/public/archives/" . $permacode . "/";

    // fetch the data about this perma link
    $onelink = file_get_contents($apiurl);
    // echo $onelink; // this would print the full json

    // decode the json
    $j = json_decode($onelink, true);

    // Did you get any json, or just null?
    if ($j == null){
        // hmm. This might be a private perma link. Or some other error.
        echo "<p>-- $permacode failed. Private?</p>";
    }
    // otherwise, you got something, so write some of the data into the page
    else {
        echo "<b>" . $j["guid"] . "</b><blockquote>" . $j["title"] . "<br>" . $origtitle . "<br>" . $j["url"] . "</blockquote>";
    }
}

// finish by noting how many lines have been read
echo "<h2>Read " . count($lines) . "</h2>";
?>

Load this script in a browser (from a server running PHP) and it will create a page with the results. (The script is available at GitHub.)
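If you don't have a Web server handy, PHP's built-in development server is enough to try it. Assuming the script is saved as, say, check-links.php (a made-up name) in the same folder as links-and-titles.txt, run this in that folder and then open http://localhost:8000/check-links.php in a browser:

php -S localhost:8000

One thing to check if every lookup fails: file_get_contents() will only fetch URLs if allow_url_fopen is enabled in php.ini, which it usually is by default.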

Thanks, Perma.cc!



By the way, and mainly because I keep losing track of this info, the nicely formatted code listing was created by a little service cleverly called Convert JS to Table.
