Joho the Blog

[2b2k] Melting points: a model for open data?

A few weeks ago, Jean-Claude Bradley at Useful Chemistry announced that the international chemical company Alfa Aesar has agreed to open source its melting point data. This matters not just because Alfa Aesar is one of the most important sources of that information, but also because it provides a model that could work outside of chemistry and science.

The data will be useful to the Open Notebook Science solubility project, and because Alfa has agreed to Open Data access, it can be useful far beyond that. In return, the Open Notebook folks cleaned up Alfa’s data: they put it into a consistent database format, gave each compound a unique ID (its ChemSpider ID), and linked each entry back to its Alfa Aesar catalog page.

Open Notebook then merged the cleaned-up data set with several others. The result was a set of 13,436 Open Data melting point values.

They then created a Web tool for exploring the merged dataset.
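To make the idea concrete, here is a minimal sketch of what merging several melting point datasets by a shared compound ID might look like. It is not the Open Notebook team’s actual pipeline; the record layout, the placeholder ID 1001 (standing in for a real ChemSpider ID), and the sample values are all hypothetical, with the warfarin numbers borrowed from the discussion in the comments. It drops predicted values, keeps the source of each experimental value, and reports the spread per compound.

```python
from collections import defaultdict
from statistics import median

# Hypothetical record layout: (compound_id, name, mp_celsius, source, is_predicted).
# 1001 is a placeholder, not warfarin's real ChemSpider ID.
vendor_data = [(1001, "warfarin", 163.0, "chemical vendor", False)]
who_data = [(1001, "warfarin", 156.0, "government document", False)]
crowd_data = [
    (1001, "warfarin", 197.13, "crowdsourced database", True),   # predicted value
    (1001, "warfarin", 161.0, "crowdsourced database", False),   # experimental value
]


def merge_melting_points(*datasets):
    """Group experimental melting points from several sources by compound ID."""
    merged = defaultdict(list)
    for dataset in datasets:
        for compound_id, _name, mp, source, is_predicted in dataset:
            if is_predicted:
                continue  # skip predicted (non-experimental) values
            merged[compound_id].append((mp, source))  # keep the source for tracing back
    return dict(merged)


def summarize(merged):
    """Report the median, range, and count of reported values per compound."""
    summary = {}
    for compound_id, values in merged.items():
        mps = [mp for mp, _source in values]
        summary[compound_id] = {
            "median": median(mps),
            "min": min(mps),
            "max": max(mps),
            "n": len(mps),
        }
    return summary


print(summarize(merge_melting_points(vendor_data, who_data, crowd_data)))
```

Keeping the source alongside every value is the point of the exercise: when the range for a compound looks too wide, each number can be traced back to where it came from.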

Why stop with melting points? Why stop with chemistry? Open data for, say, books could lead readers to libraries, publishers, bookstores, courses, other readers…

5 Responses to “[2b2k] Melting points: a model for open data?”

  1. This is interesting and probably good, but checking one compound left me with a strange feeling about the state of chemistry and the reporting of it.

    I thought I’d look at the web tool, and I entered water, since it is the only substance whose melting point I know. If you enter it in the second line, the “SMARTS” box, you get an error message saying there are over 500 results. When I gave limits as suggested, I got the same answer. I guess I’ll rely on high school science.

    I went to the drop-down list above and did not find water, but near where I thought it should be was “warfarin”, a medication I am aware of, as well as an ingredient in some rat poisons (“It’s a dessert topping. No, it’s a floor wax”-vintage Saturday Night Live). The values given were rather widely spread, with the furthest out coming from a crowdsourced database (value=197.13). The “government document” (a WHO report) value is cited as 156, but the actual document gives two different values, as ranges, in two places: 159-161 and 151-161. A “chemical vendor” site is reported to say 163, the average of what the site actually says, 162-164. The peer-reviewed journals wanted $35 for 48 hours of access, so I will leave those to others. There is no clear consensus on the melting point of this “common” compound, and one value is very far from the others.

    So, in chemistry as in other science, check your sources.

  2. Open Data seems to uncover the disparity of available data, which may be even more important. Open it up and find the errors. Lovely!

  3. Yes, Skip and Andy. One of Jean-Claude’s mottos is that science should trust nothing. (Sorry, Jean-Claude, for putting it badly.) In my book, I talk about the exercise he has his students do. He asks them to track back the sources for data reported as non-controversial in journals, e.g., the solubility of compounds. It turns out that the numbers are sometimes frighteningly wrong, and often impossible to trace back to their origins. That’s one important reason for “open notebook science”: we can see where the data actually came from.

  4. Andy W – thanks very much for the feedback – you just helped us curate the database and improve the explorer interface!

    1) Andrew Lang – who built the interface – has now added a link to images in cases where the data source is not freely accessible. We’re taking small screenshots that should meet the criteria for fair use.

    2) In cases where there is freely accessible supplementary material (2 in this case) we have added those links. You should now be able to freely check the validity of each number for warfarin.

    3) It turns out that the 197 C outlier was a predicted value, not an experimental one. We thought we had filtered all of these out, but this one snuck through. The experimental value from that site was 161 C, and this has now been corrected.

    4) Water is not included because we are primarily concerned with organic compounds. But by definition – water melts at 0 C.

    5) The SMARTS substructure search box is not to be used with regular text entries – you would have to learn how SMILES strings are used to represent molecules. The drop-down or mp range fields, though, give you access to all compounds.

  5. This makes me happy.

    Part of the reason is that I have spent part of my life on (theoretical/computational) chemistry; the more important reason is how much I like the ONS idea and everything around opening science on the web.

    I can hardly wait to see open scientific papers. As of today, and despite the Open Access initiative, only about 20% of articles produced are freely available…

    So it is a good step.

    However, looking at the ONS solubility interface, I was not sure whether the data is really exposed in a format that allows aggregation, free search, and reuse. At first glance, it seems to be an entry point to an old-style standard database…
