
Using the Perma.cc API to check links

My new book (Everyday Chaos, HBR Press, May 2019) has a few hundred footnotes with links to online sources. Because Web sites change and links rot, I decided to link to Perma.cc's pages instead. Perma.cc is a product of the Harvard Library Innovation Lab, which I used to co-direct with Kim Dulin, although Perma itself is a Jonathan Zittrain project that came after I left.

When you give Perma.cc a link to a page on the Web, it comes back with a link to a page on the Perma.cc site. That page holds an archived copy of the original page exactly as it was when you supplied the link, along with a screen capture of the original, and of course a link back to it. Perma.cc also promises to maintain the archived copy and the screen capture in perpetuity, a promise backed by the Harvard Law Library and dozens of other libraries. So when you give a reader a Perma link, they are taken to the Perma.cc page where they'll always find the archived copy and the screen capture, no matter what happens to the original site. The service is free for everyone, for real. It doesn't require users to supply any information about themselves. And there are no ads.

So that’s why my book’s references are to Perma.cc.

But over the course of the six years I spent writing this book, my references suffered some link rot on my side: before I got around to creating the Perma links, I managed to make all the obvious errors and some not-so-obvious ones. So now that I'm at the copyediting stage, I wanted to check all the Perma links.

I had already compiled a bibliography as a spreadsheet. (The book will point to the Perma.cc page for that spreadsheet.) So, I selected the Title and Perma Link columns, copied the content, and stuck it into a text document. Each line contains the page’s headline and then the Perma link.
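Each line of that text file looks like the example given in the script's comments below. The script simply looks for the first "https" on the line, so everything before it is treated as the title and everything from there on as the Perma link:

The Rand Corporation: The Think Tank That Controls America https://perma.cc/B5LR-88CF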

Perma.cc has an API that made it simple to write a script that looks up each Perma link and prints the title Perma.cc has recorded for it next to the title of the page I intended to link. If there's a problem with a Perma link, such as a doubled “https://https://” (a mistake I managed to introduce about a dozen times), or if the Perma link is private and not accessible to the public, the script notes the problem. The human brain is good at scanning this sort of info, looking for inconsistencies.
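For a sense of what comes back, the public archives endpoint the script calls (https://api.perma.cc/v1/public/archives/{code}/) returns a JSON record for the archive. Abridged to the three fields the script actually reads, and with an illustrative original URL (the real record includes more fields), it looks roughly like this:

{
  "guid": "B5LR-88CF",
  "title": "The Rand Corporation: The Think Tank That Controls America",
  "url": "https://www.example.com/original-article"
}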

Here’s the script. I used PHP because I happen to know it better than a less embarrassing choice such as Python and because I have no shame.

<?php
// This is a basic program for checking a list of page titles and perma.cc links.
// It's done badly because I am a terrible hobbyist programmer.
// I offer it under whatever open source license is most permissive. I'm really not
// going to care about anything you do with it. Except please note I'm a
// terrible hobbyist programmer who makes no claims about how well this works.
//
// David Weinberger
// [email protected]
// Nov. 23, 2018

// Perma.cc API documentation is here: https://perma.cc/docs/developer

// This program assumes there's a file with the page title and one perma link per line.
// E.g. The Rand Corporation: The Think Tank That Controls America https://perma.cc/B5LR-88CF

// Read that text file into an array
$lines = file('links-and-titles.txt');

for ($i = 0; $i < count($lines); $i++){
    $line = $lines[$i];

    // divide into title and perma link
    $p1 = strpos($line, "https");          // find the beginning of the perma link
    $fullperma = substr($line, $p1);       // get the full perma link
    $origtitle = substr($line, 0, $p1);    // get the title
    $origtitle = rtrim($origtitle);        // trim the spaces from the end of the title

    // get the distinctive part of the perma link: the stuff after https://perma.cc/
    $permacode = strrchr($fullperma, "/"); // find the last forward slash
    $permacode = substr($permacode, 1);    // get what's after that slash
    $permacode = rtrim($permacode);        // trim any whitespace from the end

    // create the url that will fetch this perma link
    $apiurl = "https://api.perma.cc/v1/public/archives/" . $permacode . "/";

    // fetch the data about this perma link
    $onelink = file_get_contents($apiurl);
    // echo $onelink; // this would print the full json

    // decode the json
    $j = json_decode($onelink, true);

    // Did you get any json, or just null?
    if ($j == null){
        // hmm. This might be a private perma link. Or some other error.
        echo "<p>-- $permacode failed. Private?</p>";
    }
    // otherwise, you got something, so write some of the data into the page
    else {
        echo "<b>" . $j["guid"] . "</b><blockquote>" . $j["title"] . "<br>" . $origtitle . "<br>" . $j["url"] . "</blockquote>";
    }
}

// finish by noting how many lines have been read
echo "<h2>Read " . count($lines) . "</h2>";
?>

Load this script in a browser (from a server running PHP) and it will create a page with the results. (The script is available at GitHub.)
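If you don't have a Web server handy, PHP's built-in development server is enough to try it. Assuming the script is saved as, say, check-links.php (a made-up name) in the same folder as links-and-titles.txt, run this in that folder and then open http://localhost:8000/check-links.php in a browser:

php -S localhost:8000

One thing to check if every lookup fails: file_get_contents() will only fetch URLs if allow_url_fopen is enabled in php.ini, which it usually is by default.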

Thanks, Perma.cc!



By the way, and mainly because I keep losing track of this info, the nicely formatted code listing was created by a little service cleverly called Convert JS to Table.
