I’m supposed to be taking some of the summer off, finishing the book and a couple of articles, but like Michael Corleone in The Godfather, every time I think I’m out they pull me back in. I was at the first day of SciBarCamp today, playing local host / fixer / keeping an eye on the furniture. Sean Mooney (who, in addition to being a former professor at Indiana University, was a World Wrestling Federation announcer) gave a very interesting talk about current challenges in bioinformatics.
A fair amount of Sean’s talk dealt with the technical challenges of creating federated databases, the differing demands of bench scientists and funders (the former want tools for managing and analyzing data for today’s problems, while the latter want to attack Big Questions), and the issues involved in getting people to share their data. Those issues aren’t so much philosophical or competitive as practical: people believe in sharing data, and once they’re done with it they’re generally willing to share, so long as it doesn’t put a burden on them.
But as Sean was talking about how different labs used different procedures for similar experiments, and how those differences manifested themselves in the ways they produced and consumed data (at least, this is what I took away from his talk; he might have meant something completely different), a thought came to me. Projects intended to let scientists share data assume that data can be converted into something like the reagents or instruments labs buy from suppliers: a commodity that you don’t have to think about, you just use. But what if data can’t be black-boxed this way? Or, more specifically, what if only really uninteresting data (the kind that everyone understands very well, the kind that’s solidly in the realm of normal science) can be cleaned up, repackaged, commodified and standardized, and put online into generally usable databases?
On one hand, this idea might seem stupid. After all, science is science: data is data, and facts about nature are true no matter where they’re created. That makes them scientific. On the other hand, if you buy the argument of people like Harry Collins, scientific research is as much a craft as a— well, a science. Databases tend to reflect the specific, local interests of researchers, working on particular problems. This tends to work against the generalizability of data: the more it’s a product of craft, and an object tailored to a particular job, the harder it’ll be to make it useful to other people.
So depending on how much databases are expressions of craftwork and problem-solving and bricolage, and how much they reflect a timeless, placeless crystallization of nature’s order, they’re going to be harder or easier to pour into big projects that aim to reuse data.