Joho the Blog: utilities Archives

November 25, 2018

Using the Perma.cc API to check links

My new book (Everyday Chaos, HBR Press, May 2019) has a few hundred footnotes with links to online sources. Because Web sites change and links rot, I decided to link to Perma.cc’s pages instead. Perma.cc is a product of the Harvard Library Innovation Lab, which I used to co-direct with Kim Dulin, but Perma is a Jonathan Zittrain project from after I left.

When you give Perma.cc a link to a page on the Web, it comes back with a link to a page on the Perma.cc site. That page has an archive copy of the original page exactly as it was when you supplied the link. It also makes a screen capture of that original page. And of course it includes a link to the original. It also promises to maintain the Perma.cc copy and screen capture in perpetuity — a promise backed by the Harvard Law Library and dozens of other libraries. So, when you give a reader a Perma link, they are taken to the Perma.cc page where they’ll always find the archived copy and the screen capture, no matter what happens to the original site. Also, the service is free for everyone, for real. Plus, the site doesn’t require users to supply any information about themselves. Also, there are no ads.

So that’s why my book’s references are to Perma.cc.

But, over the course of the six years I spent writing this book, my references suffered some link rot on my side. Before I got around to creating the Perma links, I managed to make all the obvious errors and some not so obvious. As a result, now that I’m at the copyediting stage, I wanted to check all the Perma links.

I had already compiled a bibliography as a spreadsheet. (The book will point to the Perma.cc page for that spreadsheet.) So, I selected the Title and Perma Link columns, copied the content, and stuck it into a text document. Each line contains the page’s headline and then the Perma link.

Perma.cc has an API that made it simple to write a script that looks up each Perma link and prints the title Perma has recorded next to the title of the page I intended to link to. If there’s a problem with a Perma link, such as a double “https://https://” (a mistake I managed to introduce about a dozen times), or if the Perma link is private and not accessible to the public, it notes the problem. The human brain is good at scanning this sort of info, looking for inconsistencies.

Here’s the script. I used PHP because I happen to know it better than a less embarrassing choice such as Python and because I have no shame.

<?php

// This is a basic program for checking a list of page titles and perma.cc links
// It's done badly because I am a terrible hobbyist programmer.
// I offer it under whatever open source license is most permissive. I'm really not
// going to care about anything you do with it. Except please note I'm a
// terrible hobbyist programmer who makes no claims about how well this works.
//
// David Weinberger
// [email protected]
// Nov. 23, 2018

// Perma.cc API documentation is here: https://perma.cc/docs/developer

// This program assumes there's a file with the page title and one perma link per line.
// E.g. The Rand Corporation: The Think Tank That Controls America https://perma.cc/B5LR-88CF

// Read that text file into an array
$lines = file('links-and-titles.txt');

for ($i = 0; $i < count($lines); $i++) {
    $line = $lines[$i];
    // divide into title and permalink
    $p1 = strpos($line, "https");       // find the beginning of the perma link
    $fullperma = substr($line, $p1);    // get the full perma link
    $origtitle = substr($line, 0, $p1); // get the title
    $origtitle = rtrim($origtitle);     // trim the spaces from the end of the title

    // get the distinctive part of the perma link: the stuff after https://perma.cc/
    $permacode = strrchr($fullperma, "/"); // find the last forward slash
    $permacode = substr($permacode, 1);    // get what's after that slash
    $permacode = rtrim($permacode);        // trim any spaces from the end

    // create the url that will fetch this perma link
    $apiurl = "https://api.perma.cc/v1/public/archives/" . $permacode . "/";

    // fetch the data about this perma link
    $onelink = file_get_contents($apiurl);
    // echo $onelink; // this would print the full json
    // decode the json
    $j = json_decode($onelink, true);
    // Did you get any json, or just null?
    if ($j === null) {
        // hmm. This might be a private perma link. Or some other error
        echo "<p>-- $permacode failed. Private? $fullperma</p>";
    }
    // otherwise, you got something, so write some of the data into the page
    else {
        echo "<b>" . $j["guid"] . '</b><blockquote>' . $j["title"] . '<br>' . $origtitle . "<br>" . $j["url"] . "</blockquote>";
    }
}

// finish by noting how many lines have been read
echo "<h2>Read " . count($lines) . "</h2>";

?>

Run this script in a browser and it will create a page with the results. (The script is available at GitHub.)
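For anyone who would rather not touch PHP, here is a rough Python sketch of just the parsing half of that script: splitting a line into title and Perma link, building the API URL, and flagging obviously malformed links before any network call is made. The function names are made up for illustration, and the pattern for a well-formed link (two groups of four capital letters or digits) is an assumption based only on the example link in the script's comments, not something taken from Perma.cc's documentation.

```python
import re

# Endpoint prefix as used in the PHP script above.
API_BASE = "https://api.perma.cc/v1/public/archives/"

# Assumed shape of a public Perma link, inferred from the example B5LR-88CF.
PERMA_PATTERN = re.compile(r"https://perma\.cc/[A-Z0-9]{4}-[A-Z0-9]{4}")


def parse_line(line):
    """Split 'Some title https://perma.cc/CODE' into (title, link, code)."""
    start = line.find("https")
    if start == -1:
        return line.strip(), None, None
    title = line[:start].rstrip()
    link = line[start:].strip()
    code = link.rstrip("/").rsplit("/", 1)[-1]  # whatever follows the last slash
    return title, link, code


def api_url(code):
    """Build the URL that returns the JSON record for one Perma code."""
    return API_BASE + code + "/"


def link_problems(link):
    """Return a list of human-readable problems; an empty list means it looks fine."""
    if link is None:
        return ["no link found"]
    problems = []
    if link.count("https://") > 1:
        problems.append("doubled https://")
    if not PERMA_PATTERN.fullmatch(link):
        problems.append("not of the form https://perma.cc/XXXX-XXXX")
    return problems
```

Running `link_problems` over every line first would catch the doubled-prefix mistake without asking the API anything; only the links that pass need to be looked up.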

Thanks, Perma.cc!



By the way, and mainly because I keep losing track of this info, the table of code was created by a little service cleverly called Convert JS to Table.


December 27, 2015

Embedded endnote extractor

I’ve updated a 2009 utility that lets you embed your endnotes in the text you’re typing. The utility, Footnoter, extracts the endnotes, leaves an endnote number in their place, and compiles a list of the endnotes with numbers and links. It now works with Markdown as well as with HTML; I use Markdown for most of what I write these days.

In other words, let’s say you type this in a document you’re creating with Markdown:

I write using Markdown. ((See John Gruber’s Daring Fireball for more.)) Markdown lets you embed formatting codes into plain text that are then rendered into formatted HTML, Word, etc.((The Marked app adds a viewer with export capabilities. It’s on sale for $9.99 right now.)), enabling me to focus purely on what I’m saying. It also lets me keep my fingers on the keyboard.

If you paste this text into Footnoter and tell it you want Markdown output, it will treat the comments between the double parentheses as endnotes. It will remove those comments from the body of the text, leaving the Markdown code for an endnote number, and will compile a list of endnotes with the proper references back to their endnote numbers. That is, it does what you would expect. At least with my limited testing.

For Markdown, that means the above text gets turned into this:

I write using Markdown.[^fn2] Markdown lets you embed formatting codes into plain text that are then rendered into formatted HTML, Word, etc.[^fn3], enabling me to focus purely on what I’m saying. It also lets me keep my fingers on the keyboard.

[^fn2]:See John Gruber’s Daring Fireball for more.
[^fn3]:The Marked app adds a viewer with export capabilities. It’s on sale for $9.99 right now.

Don’t be freaked out. That’s what endnotes look like in Markdown. When you run them through a parser, they’ll have appropriately numbered superscripts. (Footnoter generates arbitrary unique Markdown labels for endnotes; they start with “fn” and then have numbers appended sequentially. Those numbers have nothing to do with the number the parser will assign to the endnote itself. Also, yes, it’s a little bug that Footnoter starts with fn2 instead of fn1. Non-critical. I’m working on it. [Minutes later]: Fixed it. I think.)
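The transformation described above can be sketched in a few lines of Python. This is not Footnoter's actual code (Footnoter is a browser utility); it is a minimal illustration that assumes the default double-parenthesis delimiters and the "fn" plus sequential number labels, and the function name and parameters are invented for the example. It writes a space after the colon in each endnote definition, which is the more common Markdown footnote form.

```python
import re


def extract_endnotes(text, first_label=1, open_delim="((", close_delim="))"):
    """Pull delimited endnotes out of text and return (body, endnote_lines)."""
    # Optional whitespace before the opening delimiter is swallowed, so the
    # endnote marker sits directly against the preceding word.
    pattern = re.compile(
        r"\s*" + re.escape(open_delim) + r"(.*?)" + re.escape(close_delim),
        re.DOTALL,
    )
    notes = []

    def replace(match):
        label = f"fn{first_label + len(notes)}"
        notes.append((label, match.group(1).strip()))
        return f"[^{label}]"

    body = pattern.sub(replace, text)
    endnote_lines = [f"[^{label}]: {note}" for label, note in notes]
    return body, endnote_lines


body, endnote_lines = extract_endnotes(
    "A claim. ((First note.)) More text.((Second note.)), end."
)
print(body)
print("\n".join(endnote_lines))
```

Nested endnotes, or an endnote whose text itself ends in a closing parenthesis, would confuse this simple pattern.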

The same thing happens if you are writing HTML except the markup that’s generated is more like this:

I write using Markdown.<span class='fn_in_text'><a name='fn2'><a href=#fnend2>2</a></span> Markdown lets you embed formatting codes into plain text that are then rendered into formatted HTML, Word, etc.<span class='fn_in_text'><a name='fn3'><a href=#fnend3>3</a></span>, enabling me to focus purely on what I’m saying. It also lets me keep my fingers on the keyboard.

And that gets rendered in a browser as this:

I write using Markdown.2 Markdown lets you embed formatting codes into plain text that are then rendered into formatted HTML, Word, etc.3, enabling me to focus purely on what I’m saying. It also lets me keep my fingers on the keyboard.

There are a number of options, including setting the delimiters for endnotes and, for HTML, which endnote number to begin with. By default it removes the space before an endnote, so you can put a space between the word where the superscript should be and your delimiters, making your text easier to read when you’re working on it.

Also, if you work on a text, run it through Footnoter, work on it some more and add more endnotes, Footnoter should detect that and begin its arbitrary numbering of Markdown endnotes above where you left off. That means you can run it through more than once and it should still work.
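One plausible way to pick up where the numbering left off is to scan the text for labels that already exist and start one above the highest. The sketch below assumes the labels look like `[^fn2]`, as in the example output above; how Footnoter itself detects them is not shown in this post, and the function name is hypothetical.

```python
import re


def next_label_number(text):
    """Return the first unused number for labels of the form [^fnN]."""
    used = [int(n) for n in re.findall(r"\[\^fn(\d+)\]", text)]
    return max(used, default=0) + 1
```

The result would then serve as the starting number for the next pass over the text.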

Should.

Note: This code is from 2009. I’ve learned some stuff since then, including that jQuery makes life easier. When I added the Markdown option yesterday, I didn’t bother cleaning up the old code. It is particularly hideous. You can gape at its ugliness at GitHub.

PS: Yes, I really should have named it “Endnoter.”


December 3, 2012

Tab Rocker – a Chrome utility to return to previous tab

I get lost in my browser tabs all the time. The place I most often want to go is back to the tab I was just in. On Firefox, there are a few utilities that let me do that. My nephew Joel Weinberger has written one that does that and nothing but that for Chrome. You can grab it (free, of course) here. (The source code is on GitHub.)

Joel wrote this, as the result of my whining, during our annual post-Thanksgiving-dinner viewing of Jurassic Park, although he did some clean up of the code afterwards. I should add that, among other things, Joel is a certified computer genius working in deep areas of computer security.


May 15, 2009

Footnoter

I frequently write in HTML, but find footnotes (endnotes, actually) a pain. I don’t like having to interrupt my flow to jump to the end of the document to plunk in a footnote, and I hate having to decide on a number not knowing if I might decide to insert a footnote ahead of the one I just inserted. So, primarily because I enjoy writing utilities for myself, I spent far more hours writing a tool that will make it easier for me than the tool itself will save.

Footnoter lets you embed footnotes in the middle of an HTML document. [[For example, this might be a footnote]] It looks for the designated delimiters, pulls the footnote out, puts it at the end, and leaves a hyperlinked number in its stead. It defaults to the quick-and-dirty HTML that uses <sup> to superscript the number, but the Advanced section lets you instead insert CSS classes for the marker in the text, the marker that precedes the footnote, and for the footnote itself.
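As a rough illustration of the quick-and-dirty mode, here is a Python sketch that pulls out anything between double square brackets, leaves a superscripted, hyperlinked number behind, and appends the footnotes at the end. The exact markup Footnoter emits is not reproduced here: the `fn`/`fnend` anchor names, the use of `id` attributes, the paragraph wrapper for each footnote, and the function name are all assumptions made for the example.

```python
import re


def footnote_html(text, start=1):
    """Replace [[...]] footnotes with linked <sup> numbers and append the notes."""
    notes = []

    def replace(match):
        n = start + len(notes)
        notes.append(match.group(1).strip())
        # Marker in the text links down to the footnote at the end.
        return f'<sup><a id="fn{n}" href="#fnend{n}">{n}</a></sup>'

    body = re.sub(r"\s*\[\[(.*?)\]\]", replace, text, flags=re.DOTALL)
    # Each footnote links back up to its marker in the text.
    items = [
        f'<p><a id="fnend{n}" href="#fn{n}">{n}</a>. {note}</p>'
        for n, note in enumerate(notes, start)
    ]
    return body + "\n" + "\n".join(items)
```

Because the number is assigned only when the text is processed, inserting a new footnote ahead of existing ones never requires renumbering by hand.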

Some warnings if you decide to try it out. First, it’s fragile. I’ve barely tested it. I’m sure there will be lots of ways it can be broken. (Nested footnotes won’t work.) Second, you would be a damned fool to paste its results over your only copy of the document you’ve been working on. Third, I am a baboonish, flatfooted writer of programs. What I write is the opposite of elegant: I prefer the long way of doing it and of writing it, since I can barely follow what I’m doing. Besides, the programs I write are so small and confined that efficiency doesn’t really matter.

If you care to try Footnoter out, with fear in your limbs and forgiveness in your heart, it’s here.
