Joho the Blog: Tech Archives

May 18, 2017

Indistinguishable from prejudice

“Any sufficiently advanced technology is indistinguishable from magic,” said Arthur C. Clarke famously.

It is also the case that any sufficiently advanced technology is indistinguishable from prejudice.

Especially if that technology is machine learning. ML creates algorithms to categorize stuff based upon data sets that we feed it. Say "These million messages are spam, and these million are not," and ML will take a stab at figuring out what are the distinguishing characteristics of spam and not spam, perhaps assigning particular words particular weights as indicators, or finding relationships between particular IP addresses, times of day, lengths of messages, etc.

Now complicate the data and the request, run this through an artificial neural network, and you have Deep Learning that will come up with models that may be beyond human understanding. Ask DL why it made a particular move in a game of Go or why it recommended increasing police patrols on the corner of Elm and Maple, and it may not be able to give an answer that human brains can comprehend.

We know from experience that machine learning can re-express human biases built into the data we feed it. Cathy O'Neil's Weapons of Math Destruction contains plenty of evidence of this. We know it can happen not only inadvertently but subtly. With Deep Learning, we can be left entirely uncertain about whether and how this is happening. We can certainly adjust DL so that it gives fairer results when we can tell that it's going astray, as when it only recommends white men for jobs or produces a freshman class with 1% African Americans. But when the results aren't that measurable, we can be using results based on bias and not know it. For example, is anyone running the metrics on how many books by people of color Amazon recommends? And if we use DL to evaluate complex tax law changes, can we tell if it's based on data that reflects racial prejudices?[1]

So this is not to say that we shouldn’t use machine learning or deep learning. That would remove hugely powerful tools. And of course we should and will do everything we can to keep our own prejudices from seeping into our machines’ algorithms. But it does mean that when we are dealing with literally inexplicable results, we may well not be able to tell if those results are based on biases.

In short: Any sufficiently advanced technology is indistinguishable from prejudice.[2]

[1] We may not care, if the result is a law that achieves the social goals we want, including equal and fair treatment of taxpayers regardless of race.

[2] Please note that that does not mean that advanced technology is prejudiced. We just may not be able to tell.


May 15, 2017

[liveblog][AI] AI and education lightning talks

Sara Watson, a BKC affiliate and a technology critic, is moderating a discussion at the Berkman Klein/Media Lab AI Advance.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Karthik Dinakar at the Media Lab points out that what we see in the night sky is in fact distorted by the way gravity bends light, which Einstein called a "gravity lens." Same for AI: the distortion is often in the data itself. Karthik works on how to help researchers recognize that distortion. He gives an example of how to capture both cardiologist and patient lenses to better diagnose women's heart disease.

Chris Bavitz is the head of BKC’s Cyberlaw Clinic. To help Law students understand AI and tech, the Clinic encourages interdisciplinarity. They also help students think critically about the roles of the lawyer and the technologist. The clinic prefers early relationships among them, although thinking too hard about law early on can diminish innovation.

He points to two problems that represent two poles. First, IP and AI: running AI against protected data. Second, issues of fairness, rights, etc.

Leah Plunkett is a professor at Univ. New Hampshire Law School and a BKC affiliate. Her topic: How can we use AI to teach? She points out that if Tom Sawyer were real and alive today, he'd be arrested for what he does just in the first chapter. Yet we teach the book as a classic. We think we love a little mischief in our lives, but we apparently don't like it in our kids. We kick them out of schools. E.g., of 49M students in public schools in 2011, 3.45M were suspended, and 130,000 were expelled. These punishments disproportionately affect children from marginalized segments.

Get rid of the BS safety justification: the govt ought to be teaching all our children without exception. So, maybe have AI teach them?

Sara: So, what can we do?

Chris: We’re thinking about how we can educate state attorneys general, for example.

Karthik: We are so far from getting users, experts, and machine learning folks together.

Leah: Some of it comes down to buy-in and translation across vocabularies and normative frameworks. It helps to build trust to make these translations better.

[I missed the QA from this point on.]


[liveblog] AI Advance opening: Jonathan Zittrain and lightning talks

I'm at a day-long conference/meet-up put on by the Berkman Klein Center's and MIT Media Lab's "AI for the Common Good" project.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Jonathan Zittrain gives an opening talk. Since we’re meeting at Harvard Law, JZ begins by recalling the origins of what has been called “cyber law,” which has roots here. Back then, the lawyers got to the topic first, and thought that they could just think their way to policy. We are now at another signal moment as we are in a frenzy of building new tech. This time we want instead to involve more groups and think this through. [I am wildly paraphrasing.]

JZ asks: What is it that we intuitively love about human judgment, and are we willing to insist on human judgments that are worse than what a machine would come up with? Suppose for utilitarian reasons we can cede autonomy to our machines — e.g., autonomous cars — shouldn’t we? And what do we do about maintaining local norms? E.g., “You are now entering Texas where your autonomous car will not brake for pedestrians.”

"Should I insist on being misjudged by a human judge because that's somehow artisanal?" when, ex hypothesi, an AI system might be fairer.

Autonomous systems are not entirely new. They’re bringing to the fore questions that have always been with us. E.g., we grant a sense of discrete intelligence to corporations. E.g., “McDonald’s is upset and may want to sue someone.”

[This is a particularly bad representation of JZ’s talk. Not only is it wildly incomplete, but it misses the through-line and JZ’s wit. Sorry.]

Lightning Talks

Finale Doshi-Velez is particularly interested in interpretable machine learning (ML) models. E.g., suppose you have ten different classifiers that give equally predictive results. Should you provide the most understandable, all of them…?

Why is interpretability so “in vogue”? Suppose non-interpretable AI can do something better? In most cases we don’t know what “better” means. E.g., someone might want to control her glucose level, but perhaps also to control her weight, or other outcomes? Human physicians can still see things that are not coded into the model, and that will be the case for a long time. Also, we want systems that are fair. This means we want interpretable AI systems.

How do we formalize these notions of interpretability? How do we do so for science and beyond? E.g., what does a legal "right to explanation" mean? She is working with Sam Gershman on how to more formally ground AI interpretability in the cognitive science of explanation.

Vikash Mansinghka leads the eight-person Probabilistic Computing project at MIT. They want to build computing systems that can be our partners, not our replacements. We have assumed that the measure of success of AI is that it beats us at our own game, e.g., AlphaGo, Deep Blue, Watson playing Jeopardy! But games have clearly measurable winners.

His lab is working on augmented intelligence that gives partial solutions, guidelines and hints that help us solve problems that neither humans nor machines could solve on their own. The need for these systems is most obvious in large-scale human interest projects, e.g., epidemiology, economics, etc. E.g., should a successful nutrition program in SE Asia be tested in Africa too? There are many variables (including cost). BayesDB, developed by his lab, is "augmented intelligence for public interest data science."

In traditional computer science, computing systems are built up from circuits to algorithms. Engineers can trade off performance for interpretability. Probabilistic systems have some of the same considerations. [Sorry, I didn't get that last point. My fault!]

John Palfrey is a former Exec. Dir. of BKC, chair of the Knight Foundation (a funder of this project) and many other things. Where can we, BKC and the Media Lab, be most effective as a research organization? First, we’ve had the most success when we merge theory and practice. And building things. And communicating. Second, we have not yet defined the research question sufficiently. “We’re close to something that clearly relates to AI, ethics and government” but we don’t yet have the well-defined research questions.

The Knight Foundation thinks this area is a big deal. AI could be a tool for the public good, but it also might not be. “We’re queasy” about it, as well as excited.

Nadya Peek is at the Media Lab and has been researching "machines that make machines." She points to the first computer-controlled machine ("Teaching Power Tools to Run Themselves") where the aim was precision. People controlled these CCMs: programmers, CAD/CAM folks, etc. That's still the case but it looks different. Now the old jobs are being done by far fewer people. But the spaces in between don't always work so well. E.g., Apple can define an automatable workflow for milling components, but if you're a student doing a one-off project, it can be very difficult to get all the integrations right. The student doesn't much care about a repeatable workflow.

Who has access to an Apple-like infrastructure? How can we make precision-based one-offs easier to create? (She teaches a course at MIT called “How to create a machine that can create almost anything.”)

Nathan Matias, MIT grad student with a newly-minted Ph.D. (congrats, Nathan!), and BKC community member, is facilitating the discussion. He asks how we conceptualize the range of questions that these talks have raised. And, what are the tools we need to create? What are the social processes behind that? How can we communicate what we want to machines and understand what they "think" they're doing? Who can do what, and where, which raises questions about literacy, policy, and legal issues? Finally, how can we get to the questions we need to ask, how to answer them, and how to organize people, institutions, and automated systems? Scholarly inquiry, organizing people socially and politically, creating policies, etc.? How do we get there? How can we build AI systems that are "generative" in JZ's sense: systems that we can all contribute to on relatively equal terms and share with others?

Nathan: Vikash, what do you do when people disagree?

Vikash: When you include the sources, you can provide probabilistic responses.

Finale: When a system can’t provide a single answer, it ought to provide multiple answers. We need humans to give systems clear values. AI things are not moral, ethical things. That’s us.

Vikash: We’ve made great strides in systems that can deal with what may or may not be true, but not in terms of preference.

Nathan: An audience member wants to know what we have to do to prevent AI from repeating human bias.

Nadya: We need to include the people affected in the conversations about these systems. There are assumptions about the independence of values that just aren’t true.

Nathan: How can people not close to these systems be heard?

JP: Ethan Zuckerman, can you respond?

Ethan: One of my colleagues, Joy Buolamwini, is working on what she calls the Algorithmic Justice League, looking at computer vision algorithms that don't work on people of color. In part this is because the test sets used to train CV systems are 70% white male faces. So she's generating new sets of facial data that we can retest on. Overall, it'd be good to use test data that represents the real world, and to make sure a representation of humanity is working on these systems. So here's my question: we find co-design works well; should we be bringing in the affected populations to talk with the system designers?

[Damn, I missed Yochai Benkler‘s comment.]

Finale: We should also enable people to interrogate AI when the results seem questionable or unfair. We need to be thinking about the processes for resolving such questions.

Nadya: It’s never “people” in general who are affected. It’s always particular people with agendas, from places and institutions, etc.


February 1, 2017

How to fix the WordFence wordfence-waf.php problem

My site has been down while I've tried to figure out (i.e., google someone else's solution to) a crash caused by WordFence, an excellent utility that, ironically, protects your WordPress blog from various maladies.

The problem is severe: Users of your blog see naught but an error message of this form:

Fatal error: Unknown: Failed opening required ‘/home/dezi3014/public_html/wordfence-waf.php’ (include_path=’…/usr/lib/php /usr/local/lib/php’) in Unknown on line 0

The exact path will vary, but the meaning is the same. It is looking for a file that doesn't exist. You'll see the same message when you try to open your WordPress site as administrator. You'll see it even when you manually uninstall WordFence by logging into your host and deleting the wordfence folder from the wp-content/plugins folder.

If you look inside the wordfence-waf.php file (which is in whatever folder you’ve installed WordPress into), it warns you that “Before removing this file, please verify the PHP ini setting `auto_prepend_file` does not point to this.”

Helpful, except my php.ini file doesn’t have any reference to this. (I use MediaTemple.com as my host.) Some easy googling disclosed that the command to look for the file may not be in php.ini, but may be in .htaccess or .user.ini instead. And now you have to find those files.

At least for me, the .user.ini file is in the main folder into which you’ve installed WordPress. In fact, the only line in that file was the one that has the “auto_prepend_file” command. Remove that line and you have your site back.
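For concreteness, the offending directive in my case looked roughly like the line below; the path will be whatever wordfence-waf.php path appears in your own error message, so treat this as an illustration rather than something to copy:

auto_prepend_file = '/home/dezi3014/public_html/wordfence-waf.php'

Delete that one line (or the whole .user.ini file, if that's all it contains). Note that PHP caches .user.ini settings for a few minutes by default (the user_ini.cache_ttl setting), so the error may linger briefly after you remove it.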

I assume all of this is too obvious to write about for technically competent people. This post is for the rest of us.


December 3, 2016

[liveblog] Stephanie Mendoza: Web VR

Stephanie Mendoza [twitter:@_liooil] [Github: SAM-liooil] is giving a talk at the Web 1.0 conference. She’s a Unity developer.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

WebVR, a 3D in-browser standard, is at 1.0 these days, she says. It's cross-platform, which is amazing because it's hard to build for Web, Android, and Vive. It's "uncharted territory" where "everything is an experiment." You need Chromium, an experimental version of Chrome, to run it. She uses A-Frame to create in-browser 3D environments.

"We're trying to figure out the limit of things we can simulate." It's going to follow us out into the real world. E.g., she's found that simulating fearful situations (e.g., heights) can lessen fear of those situations in the real world.

This crosses into Meinong’s jungle: a repository of non-existent entities in Alexius Meinong‘s philosophy.

The tool they're using is A-Frame, which is an abstraction layer on top of WebGL, Three.js, and VRML. (VRML was an HTML standard that didn't get taken up much because the browsers didn't run it very well. [I was once on the board of a VRML company, which also didn't do very well.]) WebVR works on Vive, High Fidelity, Janus, the Unity Web player, and YouTube 360, under different definitions of "works." A-Frame is open source.

Now she takes us through how to build a VR Web page. You can scavenge for 3D assets or create your own. E.g., you can go to Thingiverse and convert the files to the appropriate format for A-Frame.

Then you begin a “scene” in A-Frame, which lives between <a-scene> tags in HTML. You can create graphic objects (spheres, planes, etc.) You can interactively work on the 3D elements within your browser. [This link will take you to a page that displays the 3D scene Stephanie is working with, but you need Chromium to get to the interactive menus.]
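To give a sense of what that markup looks like, here is a generic A-Frame hello-world-style scene (my illustration, not Stephanie's example):

<a-scene>
  <a-sphere position="0 1.25 -5" radius="1.25" color="#EF2D5E"></a-sphere>
  <a-plane position="0 0 -4" rotation="-90 0 0" width="4" height="4" color="#7BC8A4"></a-plane>
  <a-sky color="#ECECEC"></a-sky>
</a-scene>

Each tag is an entity, and the attributes (position, radius, color, and so on) configure it, which is the pattern she walks through next.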

She goes a bit deeper into the A-Frame HTML for assets, light maps, height maps, specular maps, all of which are mapped back to much lower-count polygons. Entities consist of geometry, light, mesh, material, position, and raycaster, and your extensions. [I am not attempting to record the details, which Stephanie is spelling out clearly. ]

She talks about the HTC Vive. "The controllers are really cool. They're like claws. I use them to climb virtual trees and then jump out because it's fun." Your brain simulates gravity when there is none, she observes. She shows the A-Frame tags for configuring the controls, including grabbing, colliding, and teleporting.

She recommends some sites, including NormalMap, which maps images and lets you download the results.

QA

Q: Platforms are making their own non-interoperable VR frameworks, which is concerning.

A: It went from art to industry very quickly.


[liveblog] Paul Frazee on the Beaker Browser

At the Web 1.0 conference, Paul Frazee is talking about a browser — a Chrome fork — he's been writing to browse the distributed Web.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

The distributed Web is the Web with ideas from BitTorrent integrated into it. Beaker uses IPFS and DAT.

This means:

  1. Anyone can be a server at any time.

  2. There’s no binding between a specific computer and a site; the content lives independently.

  3. There’s no back end.

This lets Beaker provide some unusual features:

  1. A “fork button” is built into the browser itself so you can modify the site you’re browsing. “People can hack socially” by forking a site and sharing those changes.

  2. Independent publishing: The site owner can’t change your stuff. You can allocate new domains cheaply.

  3. With Beaker, you can write your site locally first, and then post into the distributed Web.

  4. Secure distribution

  5. Versioned URLs

He takes us through a demo. Beaker's directory looks a bit like GitHub in terms of style. He shows how to create a new site using an integrated terminal tool. The init command creates a dat.json file with some core metadata. Then he creates an index.html file and publishes it. Then anyone using the browser can see the site and ask to see the files behind it…and fork them. As with GitHub, you can see the path of forks. If you own the site, you can write to the site with the browser. [This fulfills Tim Berners-Lee's original vision of Web browsers.]
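For reference, dat.json is just a small metadata file, along these lines (an illustrative sketch, not the file from the demo, so the exact fields may differ):

{
  "title": "my-beaker-site",
  "description": "A site published from the Beaker browser"
}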

QA

Q: Any DNS support?

A: Yes.


November 22, 2016

[liveblog][bkc] Scott Bradner: IANA: Important, but not for what they do

I'm at a Berkman Klein [twitter: BKCHarvard] talk by Scott Bradner about IANA, the Internet Assigned Numbers Authority. Scott is one of the people responsible for giving us the Internet. So, thanks for that, Scott!

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Scott begins by pointing to the "absurdity" of Ted Cruz's campaign to prevent the "Internet giveaway." The idea that "Obama gave away the Internet" is "hooey," says Scott.

IANA started with a need to coordinate information, not to control it, he says. It began with the Network Working Group in 1968. Then Requests for Comments (RFC) in 1969. The name "IANA" showed up in 1988, although the function had begun in 1972 with coordinating socket numbers. The Domain Name System made IP addresses easier to use, including the hierarchical clustering under .com, .org, etc.

Back to the beginning, computers were too expensive for every gov’t department to have one. So, ARPA wanted to share large and expensive computers among users. It created a packet-based network, which broke info up into packets that were then transmitted. Packet networking was the idea of Paul Baran at RAND who wanted a system that would survive a nuclear strike, but the aim of that network was to share computers. The packets had enough info to make it to their destinations, but the packet design made “no assumptions about the underlying transport network.” No service guarantees about packets making it through were offered. The Internet is the interconnection of the different networks, including the commercial networks that began showing up in the 1990s.

No one cared about the Net for decades. To the traditional telecom and corporate networking people, it was just a toy—”No quality of service, no guarantees, no security, no one in charge.” IBM thought you couldn’t build a network out of this because their definition of a network — the minimal requirements — was different. “That was great because it meant the regulators ignored us.”

The IANA function went into steady state 1984-1995. It did some allocating of addresses. (When Scott asked Jon Postel for addresses for Harvard, Postel sent him some; Postel was the one-person domain allocation shop.) IANA ran it for the top level domains.

“The Internet has few needs,” Scott says. It’s almost all done through collaboration and agreement. There are no requirements except at a very simple level. The only centralized functions: 1. We have to agree on what the protocol parameters are. Machines have to understand how to read the packet headers. 2. We have to allocate blocks of IP addresses and ASN‘s. 3. We have to have a single DNS, at least for now. IANA handles those three. “Everything else is distributed.” Everything else is collaboration.

In 1993, Network Solutions was given permission to start selling domain names. A domain cost $100 for 2 yrs. There were about 100M names at that point, which added up to real money. Some countries even started selling off their TLDs (top level domains), e.g., .tv.

IANA dealt with three topics, but DNS was the only one of interest to most people. There was pressure to create new TLDs, which Scott thinks doesn’t solve any real problems. That power was given to ISOC, which set up the International Ad-Hoc Committee in 1996. It set up 7 new TLDs, one of which (.web) caused Image Online Design to sue Postel because they said Postel had promised it to them. The Dept. of Commerce saw that it needed to do something. So they put out an RFC and got 400+ comments. Meanwhile, Postel worked on a plan for institutionalizing the IANA function, which culminated in a conference in Jan 1998. Postel couldn’t go, so Scott presented in his stead.

Shortly after that the Dept of Commerce proposed having a private non-profit coordinate and manage the allocation of the blocks to the registries, manage the file that determines TLDs, and decide which TLDs should exist…the functions of IANA. “There’s no Internet governance here, simply what IANA did.”

There were meetings around the world to discuss this, including one sponsored by the Berkman Center. Many of the people attending were there to discuss Internet governance, which was not the point of the meetings. One person said, “Why are we wasting time talking about TLDs when the Internet is going to destroy countries?” “Most of us thought that was a well-needed vacuum,” says Scott. We didn’t need Internet governance. We were better off without it.

Jon Postel submitted a proposal for an Internet Corporation for Assigned Names and Numbers (ICANN). He died of a heart attack shortly thereafter. The Dept. of Commerce accepted the proposal. In Oct 1998 ICANN had its first board meeting. It was a closed meeting “which anticipated much of what’s wrong with ICANN.”

The Dept of Commerce had oversight over ICANN, but its only power was to say yes or no to the file that lists the TLDs and the IP addresses of the nameservers for each of the TLDs. "That's the entirety of the control the US govt had over ICANN. In theory, the Dept of Commerce could have said 'Take Cuba out of that file,' but that's the most ridiculous thing they could have done and most of the world could have ignored them." The Dept of Commerce never said no to ICANN.

ICANN institutionalizes the IANA. But it also has to deal with trademark issues coming out of domain name registrations, and consults on DNS security issues. “ICANN was formed as a little organization to replace Jon Postel.”

It didn't stay little. ICANN's budget went from a few million bucks to over $100M. "That's a lot of money to replace a few competent geeks." It's also approved hundreds of TLDs. The bylaws went from 7,000 words to 37,000 words. "If you need 37,000 words to say what you're doing, there's something wrong."

The world started to change. Many govts see the Net as an intrinsic threat.

  • In Sept. 2011, India, Brazil, and South Africa proposed that the UN undertake governance of the Internet.

  • Oct 2013: After Snowden, the Montevideo Statement on the Future of Internet Cooperation proposing moving away from US govt’s oversight of IANA.

  • Apr. 2014: NetMundial Initiative. “Self-appointed 25-member council to perform internet governance.”

  • Mar. 2014: NTIA announces its intent to transition key domain name functions.

The NTIA proposal was supposed to involve all the stakeholders. But it also said that ICANN should continue to maintain the openness of the Internet…a function that ICANN never had. Openness arises from the technical nature of the Net. NTIA said it wouldn’t accept an inter-governmental solution (like the ITU) because it has to involve all the stakeholders.

So who holds ICANN accountable? They created a community process that is "incredibly strong." It can change the bylaws, and remove ICANN directors or the entire board.

Meanwhile, the US Congress got bent out of shape because the US is “giving away the Internet.” It blocked the NTIA from acting until Sept. 2016. On Oct. 1 IANA became independent and is under the control of the community. “This cannot be undone.” “If the transition had not happened, forces in the UN would likely have taken over” governance of the Internet. This would have been much more likely if the NTIA had not let it go. “The IANA performs coordination functions, not governance. There is no Internet governance.”

How can there be no governance? “Because nobody cared for long enough that it got away from them,” Scott says. “But is this a problem we have to fix?”

He leaves the answer hanging. [SPOILER: The answer is NO]

Q&A

Q: Under whom do the RIRs [Regional Internet Registries] operate?

A: Some Europeans offered to take over European domain names from Jon Postel. It's an open question whether they have authority to do what they're doing. Every one has its own policy development process.

Q: Where’s research being done to make a more distributed Internet?

A: There have been many proposals ever since ICANN was formed to have some sort of distributed maintenance of the TLDs. But it always comes down to you seeing the same .com site as I do — the same address pointing to the same site for all Internet users. You still have to centralize or at least distribute the mapping. Some people are looking at geographic addressing, although it doesn’t scale.

Q: Do you think Trump could make the US more like China in terms of the Internet?

A: Trump signed on to Cruz's position on IANA. The security issue is a big one, very real. The gut reaction to recent DDoS attacks is to fix that rather than to look at the root cause, which was crappy devices. The Chinese government controls the Net in China by making everyone go through a central, national connection. Most countries don't do that. OTOH, England is imposing very strict content rules that all ISPs have to obey. We may be moving to a telephony model, which is a Westphalian idea of national Internets.

Q: The Net seems to need other things internationally controlled, e.g. buffer bloat. Peer pressure seems to be the only way: you throw people off who disagree.

A: IANA doesn't have agreements with service providers. Buffer bloat is a real issue, but it only affects the people who have it, unlike the IoT DDoS attack that affected us all. Are you going to kick off people whose home security cameras are insecure?

Q: Russia seems to be taking the opposite approach. It has lots of connections coming into it, perhaps for fear that someone would cut them off. Terrorist groups are cutting cables, botnets, etc.

A: Great question. It’s not clear there’s an answer.

Q: With IPv6 there are many more address spaces to give out. How does that change things?

A: The DNS is an amazing success story. It scales extremely well … although there are scaling issues with the backbone routing systems, which are big and expensive. “That’s one of the issues we wanted to address when we did IPv6.”

Q: You said that ICANN has a spotty history of transparency. What role do you think ICANN is going to play going forward? Can it improve on its track record?

A: I’m not sure that it’s relevant. IANA’s functions are not a governance function. The only thing like a governance issue are the TLDs and ICANN has already blown that.


June 25, 2016

TED, scraped

TED used to have an open API. TED no longer supports its open API. I want to do a little exploring of what the world looks like to TED, so I scraped the data from 2,228 TED Talk pages. This includes the title, author, tags, description, link to the transcript, number of times shared, and year. You can get it from here. (I named it tedTalksMetadata.txt, but it’s really a JSON file.)
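If you want to poke at the file, something like this PHP will get you started; adjust the key names (e.g., "tags") to whatever is actually in the file:

<?php
// Load the scraped TED metadata and tally how often each tag appears.
$talks = json_decode(file_get_contents("tedTalksMetadata.txt"), true);
$tagCounts = array();
foreach ($talks as $talk) {
    foreach ($talk["tags"] as $tag) {   // assumes each talk has a "tags" array
        $tagCounts[$tag] = isset($tagCounts[$tag]) ? $tagCounts[$tag] + 1 : 1;
    }
}
arsort($tagCounts); // most common tags first
print_r(array_slice($tagCounts, 0, 20, true));
?>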

"Scraping" means having a computer program look at the HTML underneath a Web page and try to figure out which elements refer to what. Scraping is always a chancy enterprise because the cues indicating which text is, say, the date and which is the title may be inconsistent across pages, and may be changed by the owners at any time. So I did the best I could, which is not very good. (Sometimes page owners aren't happy about being scraped, but in this case it only meant one visit for each page, which is not a lot of burden for a site that has pages that get hundreds of thousands and sometimes millions of visits. If they really don't want to be scraped, they could re-open their API, which provides far more reliable info far more efficiently.)
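Here's the general shape of what that looks like in PHP, as a toy example (the actual scripts mentioned below are messier and pull out much more than the title):

<?php
// Fetch a page and pull out whatever is in its <title> tag.
$html = file_get_contents("https://www.ted.com/talks/SOME_TALK"); // hypothetical URL
$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings about imperfect HTML
$titleNodes = $doc->getElementsByTagName("title");
$title = $titleNodes->length ? trim($titleNodes->item(0)->textContent) : "";
echo $title;
?>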

I’ve also posted at GitHub the php scripts I wrote to do the scraping. Please don’t laugh.

If you use the JSON to explore TED metadata, please let me know if you come up with anything interesting that you’re willing to share. Thanks!


January 26, 2016

Oscars.JSON. Bad, bad JSON

Because I don’t actually enjoy watching football, during the Pats vs. Broncs game on Sunday I transposed the Oscar™ nominations into a JSON file. I did this very badly, but I did it. If you look at it, you’ll see just how badly I misunderstand JSON on some really basic levels.

But I posted it at GitHub™ where you can fix it if you care enough.

Why JSON™? Because it’s an easy format for inputting data into JavaScript™ or many other languages. It’s also human-readable, if you have a good brain for indents. (This is very different from having many indents in your brain, which is one reason I don’t particularly like to watch football™, even with the helmets and rules, etc.)

Anyway, JSON puts data into key:value™ pairs, and lets you nest them into sets. So, you might have a keyword such as "category" that would have values such as "Best Picture™" and "Supporting Actress™." Within a category you might have a set of keywords such as "film_title" and "person" with the appropriate values.
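Roughly, the shape I was going for looks like this (an illustrative fragment with made-up key names, not the actual, much worse, file):

{
  "categories": [
    {
      "category": "Best Picture",
      "nominees": [
        { "film_title": "Spotlight" },
        { "film_title": "The Big Short" }
      ]
    },
    {
      "category": "Supporting Actress",
      "nominees": [
        { "person": "Kate Winslet", "film_title": "Steve Jobs" }
      ]
    }
  ]
}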

JSON is such a popular way of packaging up data for transport over the Web™ that many (most? all?) major languages have built-in functions for transmuting it into data that the language can easily navigate.

So, why bother putting the Oscar™ nomination info into JSON? In case someone wants to write an app that uses that info. For example, if you wanted to create your own Oscar™ score sheet or, to be honest, entry tickets for your office pool, you could write a little script and output it exactly as you’d like. (Or you could just google™ for someone else’s Oscar™ pool sheet.) (I also posted a terrible little PHP script™ that does just that.)
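As a sketch of the idea (using the made-up structure above, so the key names and filename are placeholders, and this is not the script I posted), generating a bare-bones pool sheet is just a loop:

<?php
// Turn the nominations JSON into a plain-text Oscar pool sheet.
$data = json_decode(file_get_contents("oscars.json"), true); // hypothetical filename
foreach ($data["categories"] as $cat) {
    echo $cat["category"] . "\n";
    foreach ($cat["nominees"] as $nom) {
        $label = isset($nom["person"]) ? $nom["person"] : $nom["film_title"];
        echo "  [ ] " . $label . "\n"; // one checkbox line per nominee
    }
    echo "\n";
}
?>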

So, a pointless™ exercise™ in truly terrible JSON design™. You’re welcome™!


October 5, 2015

Enabling JavaScript to read files on your drive via Dropbox: A “Where I went wrong” puzzle.

Ermahgerd, this was so much harder than I thought it would be. In fact, what follows is best approached as a puzzler in which your task is to find the earliest place where I’ve gone horribly wrong. The winning comment will be of the form, “You’re such an idiot! All you had to do was____!” Second place, because less satisfying but no less humiliating, will be comments of the form, “OMFG, why are you writing this? How can you get the simplest thing wrong???”

I know. Forgive me.


So, let's say you're writing, oh, an outliner for your own personal use because the one you've been using for seven years or so no longer supports Dropbox: if you save an OmniOutliner file to a Dropbox folder, it only gets one of the sub-files. You poke around with alternatives but none of them have exactly the set of minimal features you want. (Dave Winer's Fargo gets damn close to my peculiarities, and it saves outlines in Dropbox…but in only one special folder. I'm picky. And I was looking for a programming project.) So, you decide to write your own. Sure, it'll be under-featured, and it'll break. But it'll be yours.


It’s going to run in a browser because you can’t find any other way to write an app for your Mac except Objective C and Swift, both of which require three days of tutorials and a hundred pushups to get to “Hello, world.” So, you’re using JavaScript and jQuery, JavaScript’s smarter older brother. (Hi, Andy.) And PHP.


Now, you can try as hard as you want, but the browser is going to insist on protecting you from being able to access files on anyone's hard drive, even your own, because otherwise Malefactors are going to install apps that will suck your brains out through your smallest aperture and take your credit card numbers with it. For real.

I tried many of the things the Internet recommends to circumvent this well-meaning rule. I wouldn't have even tried, but I'm running my outliner on my local hard drive, using the Apache2 web server that comes with MAMP. So, I understand why there's a wall around the files that are not part of what the web server considers to be its home, but those files are mine. So close, yet so far.

I tried figuring out how to set up a virtual directory, but the initial efforts failed and monkeying with apache files scares me. Likewise for changing the server’s document root.

I put a symbolic link to my Dropbox folder into the JavaScript project’s folder (actually in the “php” sub-folder), and was able to write a file into it via PHP. But I couldn’t figure out a way to read the Dropbox folder, which means that if I wanted to switch from loading an outline from

/Dropbox/blogposts/2015/October

to

/Dropbox/Bad_Ideas/Recipes_For_Disaster


I’d have to type in the entire pathname. No directory browsing for you!

(To create a symbolic link, cd into the folder where you want it and, in your Mac terminal, type: "ln -s /Users/YOUR_NAME/Dropbox".)


So, I had a brainstorm. I use outlines in almost everything I do, but virtually everything I do is in Dropbox. Dropbox.com therefore has a perfect mirror of my files and folder structure. Perhaps Dropbox has an API that would let me browse its mirror of my local disk.


It does! With lots of documentation, almost none of which I understand! I’m sure it’s terrific for people who know what they’re doing, but won’t someone please think of the people who need a For Dummies book to read a For Dummies book?


What I’d like to do is to browse the file structure at Dropbox.com so I can identify the file I want to open, and have Dropbox.com tell me via its API what that file’s path is. Then I could use PHP or even JavaScript (I think) to directly open that file on my own disk via the Dropbox symbolic link in my PHP folder. Right?


Guess what the Dropbox.com API doesn’t tell you. Which is too bad because I want to use the same info later to save a file to a pathname inside that symbolic link.

But Dropbox does make it easy for you to drop a magic button into your HTML that will launch a Dropbox mini-file-browser. The version called the “Chooser” downloads files. The version called “Saver” uploads them. Just what I need.


Sort of. What I’d really like to do is:


  • Browse my Dropbox folders using the Chooser.

  • Click to download my chosen file.


  • Read the content of that file into my app, so I can display the outline contained within.



As a matter of principle, I want to be able to have a user choose it, and read the contents programmatically. Thus did I lose, oh, two days of my life.


I will not bore you with the many ways I tried to do this basic thing. I am sure that someone is going to read this and give me the one line of code I need. Instead, here is the complex way I managed to accomplish this basic task.

Register your app

First, you have to register your app with Dropbox in order to get a key that will let you access their API. This is relatively simple to do. Go to their App Console, and click on the "Create App" button. When asked, say you want to use the Dropbox API, not the Dropbox for Business API, unless you have a business account with Dropbox. It will ask if you want to access a particular folder or all of the user's folders. It will ask you to give your app a name; it has to be unique among all apps registered there.

On the next page, the fourth line is your app key. Copy it. Click on the “app secret” and copy it too. For OAuth redirect, use “localhost” if you’re hosting your app locally. Or put in the URL of where you’re hosting if it’s going to be out on the Web. Likewise for “Chooser/Saver domains.”

Now, into your HTML file place the line:

<script type="text/javascript" src="https://www.dropbox.com/static/api/2/dropins.js" id="dropboxjs" data-app-key="YOUR_KEY"></script>

Obviously, insert your Dropbox key (see above) appropriately.

Ok, let’s create the app.

The app


Into your HTML document create an empty div where the Dropbox button will go:

<div id="DBbutton"></div>


In the header of your HTML document make sure you’ve included jQuery:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>


Of course, if you prefer to download your own copy of jQuery instead of using Google’s, go ahead. But at this point so much of what I do goes through Google that avoiding using it for jQuery seems foolish.

Also in your header, after the jQuery line, place the following:

<script type="text/javascript" src="./js/Your_File_Name.js"></script>

Create a subfolder in the directory where your HTML file is and name it “js”. Using your favorite text editor create a file called whatever you want to call it, with a “js” extension. Obviously make sure that the file’s name is exactly the one in the line above. That .js file is where you’ll put your JavaScript…although in this example I’m including it all in the HTML file itself because all I’m going to do is going to occur in the script that loads immediately after the file loads. So never mind.

Here’s the rest of what should go into the head section of your HTML file.

1   <script type="text/javascript">
2   $(document).ready(function(){
3     var opts = {
4       success: function(files) {
5         var filename = files[0].link;
6         filename = filename.replace("dl=0","dl=1");
7         //alert(filename);
8         $.ajax({
9           url: "./php/downloadDropboxContents.php",
10          data: "src=" + filename,
11          success: function(cont){
12            //alert(cont);
13          },
14          error: function(e){
15            alert(e.responseText);
16          }
17        });
18      },
19      extensions: [".txt",".opml"],
20      multiselect: false,
21      linkType: "download"
22    };
23    var button = Dropbox.createChooseButton(opts);
24    document.getElementById("DBbutton").appendChild(button);
25  });
26  </script>


Line 2 is a very handy jQuery function that gets executed after the entire page has been downloaded into the browser. That means all the bits and pieces are there before the code in the function is executed.


In this case, the code is going to create a Dropbox button for the user to press. The options for that button are expressed in lines 2-22. Let’s start with the last lines.


Line 19 lists the extensions I want to let users (= me) select for download. There are only two: files that end with .txt and ones that end with .opml. OPML is the standard format for outlines. (Thank you, Dave Winer.)

Line 20 says that I don’t want users to be able to open more than one file at a time.


On line 21 we specify that we want Dropbox to give us back the downloaded file. The alternative is “preview,” which will provide a preview.


By the way, note that each option line ends with a comma, except for the last one. This whole option statement is actually a JSON set of key:value pairs, each delimited by a comma. In some cases, as in Dreaded Line 4, the values are multi-line and complex. Nevertheless, they’re still just values of the keyword to the left of the colon.


But I’m just putting off talking about the “success” option, lines 4-18, that set what happens if the download operation is successful.


Line 4 creates a function that will get passed an array of downloaded files, which unimaginatively I am capturing in the variable “files.”


Line 5 gets the link to the first file in the array. The array is files[]. The appended “.link” gets the URL to the Dropbox file, but it’s a funky link that, alas, doesn’t express the pathname, but some seemingly arbitrary set of characters. For example:

https://www.dropbox.com/s/smwhasdgztdiw5gbn/A%20History%20of%20the%20Philosophy%20of%20Time.%20txt?dl=0



If you were instead to say “files[0].name”, you’d get the file’s name (“A History of the Philosophy of Time.txt”). And if you say “.path” you — damn their eyes — get an error. Aargh. This could have been so much easier! Anyway.


Line 6 is something I discovered on my own, i.e., I didn't read the documentation carefully enough. Notice the "dl=0" at the end of the file link above. I'm going to guess the "dl" stands for "download." If you leave it at 0, you get the user interface. But — aha! — if you replace it with 1, it downloads the actual damn file into your normal download folder, which defaults on the Mac to the Download folder. So, line 6 does the search and replace. (If line 7 weren't commented out, it'd pop up the file link.)


So now we have a link that will download the file. Excellent!


Lines 8-17 use that URL to actually download it and read it. This requires (i.e., it’s the only way I know how to do it) executing a PHP script. For that we use AJAX, which JavaScript makes hard but jQuery makes easy.


Line 9 points to the PHP file. It lives in a folder called “php.” The “./” is redundant — it says “that folder is in the current directory” but I’m superstitious. We’ll write the PHP file soon.


Line 10 is the dumb way of saying what data we’re going to pass into the PHP script. We’re using the variable “src” and we’re passing the path to the downloadable Dropbox file. The better way to express this data would be to use JSON, but I never remember whether you put the key in quotes or not, so I’d rather do it this way (which in essence simply writes out the appendage to the basic PHP’s script URL) than look it up. But, I just did look it up, and, no, you don’t quote the keys. So line 10 should really be:

data: {src : filename},

but I’m too lazy to do that.

Now in line 11 we get to what we do with the results of the PHP script’s processing of the content it’s going to receive. The commented-out line would post the content into a dialogue box so you can confirm you got it, but what I really want to do is turn the content of that file into a outline displayed by my app. So, my real line 12 will be something like “displayOutline(cont)”, a function that I’ll stick elsewhere in my JavaScript. But that’s not what we’re here to talk about.


Lines 14-16 get invoked if the PHP fails. It displays a human-readable version of the error code. You'll also want to be looking at your console's error log. If you're using MAMP, look at php_error.log, which you'll find in MAMP/logs.


At line 23, we're outside of the options declaration. Line 23 uses Dropbox to create a Chooser button that when pressed will pop up the Chooser with the right options set.


The button exists but not on your page. For that to happen, you need line 24 to tell your page to find the div with the id of “DBbutton” and to insert the button into it as a new last element. (Since there are no elements in that div, the button becomes its only element.)


All this happens before your page becomes visible. With luck, when you load it, you’ll see a Dropbox button sitting there.


Now onto the PHP.

The PHP

Create a folder named “php” in the same directory as your HTML file. In it create a file called “downloadDropboxContents.php”.


Here it is:

1   <?php
2   $src = $_REQUEST['src']; // url
3   $filename = basename($src); // get the file's name
4   error_log("SRC; $src – FILENAME: $filename");
5   $dir = "Downloads"; // set the folder to download into
6   // create the pathname for the downloaded file
7   $downloads = $dir . "/" . $filename; // md5($src);
8   // get the contents of the download — YAY!
9   $out = file_get_contents($src);
10  error_log($out);
11  // put the downloaded file there
12  file_put_contents($downloads, $out);
13  // repeat the contents out loud
14  echo $out;
15  ?>


The comments should tell the story. But just in case:


Line 2 picks up the data we’ve passed into it. $src now should have the URL to the Dropbox file.


Line 3 gets the file name from the pathname. We’re going to need that when we save the file into our designated folder (which is “Downloads,” which you may recall, we created a symbolic link to in our php folder.)


Line 4 optionally writes the Dropbox URL and the filename into the console (see above), just to see where we’ve gone wrong this time.


Line 5 specifies what folder to put the downloaded file into. Remember that it has to be within the realm your web server counts as document root. Hence the symbolic link to Downloads in the php folder.


Line 7 creates the path name to that download folder by appending the file name to the path, with a “/” in between them.


Line 9 copies the actual damn contents of the downloaded file into a variable I've called "$out". Line 10 checks the content. You probably want to comment that line out.


Line 12 writes the content into the download directory.


Line 14 reports those contents back to the “success” function in the JavaScript. It will there be captured by the variable “cont” in line 11.


That’s it. I know this is sub-optimal at best, and probably far more fragile than I think. But it works for now, at least with simple text files. And I couldn’t find anything at this level of beginnerness online.


I’m sorry.

