On Tuesday I became involved in a discussion about data sharing with JB Poline and Matthew Brett. Two days later the issue came up again, this time on Twitter. In both discussions I heard a lot of frustration with the status quo, but I also heard aspirations for a data nirvana where everything is shared willingly and no data set is ever more than a couple of clicks away. What was absent from the conversations, it seemed to me, were reasonable, practical ways to improve our lot.* It got me thinking about the present ways we do business, and in particular where the incentives and the impediments can be found.
Now, it is undoubtedly the case that some scientists are more amenable to sharing than others. (Turns out scientists are humans first! Scary, but true.) Some scientists can be downright obdurate when faced with a request to make their data public. In response, a few folks in the pro-sharing camp have suggested that we lean on those who drag their feet, especially where individuals have previously agreed to share data as a condition of publishing in a particular journal: name and shame. It could work, but I'm not keen on this approach for a couple of reasons. Firstly, it makes the task personal, which means it could mutate into outright war that extends far beyond the issue at hand and could have wide-ranging consequences for the combatants. Secondly, the number of targets is large, meaning that the process would be time-consuming.
Where might pressure be applied most productively?
Appealing to a scientist's best intentions is all well and good, but in my view it's easier to make a relatively small change to the rules of the game. My suggestion is to shift the burden from the individual scientist onto the journal publishing the results. The scientific publication industry is changing all the time, so the move towards more transparency and sharing of data is just another in a long litany of changes the journals are experiencing.
The journals are, however, uniquely placed to change policies regarding data sharing in particular. If a journal makes it a condition of publication that you first upload the data on which your manuscript is based, guess what? That is precisely what you will do. How do I know this? Because you already comply with their instructions in manifold other ways. You use the font they want, you make the figures the size they want, you use the reference format they want, and you even relegate the methods to supplemental online material even though you know you shouldn't, because you wouldn't read your own paper sans experimental details. What's more, at the point of submitting a manuscript you are laser-focused on your goal and best prepared to execute the task of data sharing as just another step on the path.
A call to action?
If we are seriously bothered by data sharing and want to change the way it's done, then the first step, it seems to me, is to create a list of those journals publishing neuroimaging studies and categorize them based on their data sharing policies. Next, I am suggesting that those people who have strong opinions on sharing of data should walk the walk, and publish only in those journals whose processes match their stated opinions.** This is the market at work. If journals stop receiving good manuscripts because the good scientists have gone elsewhere, they will change their practices.
I think we can use three basic categories for journals' policies on data sharing:
- In the top category are those journals that mandate data sharing as a condition of publishing your study. No data upload, no publication. This is our star team, the journals we should all be using (if we care about data sharing as a precondition for doing science).
- In the middle category are all the prevaricators. This is the space the vast majority of journals inhabit. They tell you that you must share your data if you are asked to, and this is a Very Serious Policy. So serious, in fact, that they will do, errr, absolutely nothing if you fail to comply. These journals have neatly deflected the task of sharing back onto you, the individual scientist. Why? Perhaps because they are afraid they will see fewer submissions if they get aggressive with data sharing? Or perhaps they are afraid they will have to put up resources to facilitate the sharing, and that would eat into their precious profit margins. But if the sharing of data is a cost of doing business in scientific publishing then it is their cost to bear.
- The bottom group of journals hardly needs introduction. In this group is any journal saying Not Our Job. They don't even insist that you offer your data when you publish your manuscript. It's all up to you, dear scientist.
Your field needs You!
Here's where you come in. I would like to crowd-source a review of the journals publishing neuroimaging studies. All I need is for someone to think of a journal, head to the instructions for authors, find the data sharing policy blurb and send me a link to it. That's it! I will then categorize the journals as above, and I'll put out a blog post as a quick guide for scientists looking for sharing-compliant journals to publish in. Pretty easy, huh?
* I should state for the record that I don't have strong opinions on whether all data should be shared, whether all published data should be shared, when data should be shared, if and how credit should be given, whether there should be restrictions on who can use shared data, etc. I am neither an advocate for nor an opponent of data sharing. My job is to facilitate data generation by others, and to solve problems arising. Data sharing has been stated to be a problem for some in my community; please take this blog post as my contribution to solving the stated problem.
** I'll note here my feelings about open access, which are considerably less ambiguous than my opinions on data sharing. I now refuse to review for journals that don't offer open access. If you review for a journal that erects pay walls and you object to pay walls then I'm very sorry to inform you, you are part of the problem. If you're an editor for a journal with pay walls then you have a very large amount of explaining to do, in my opinion.
Addendum - 28th April, 2014.
Human neuroimaging as a "Big Data" science.
Toward open sharing of task-based fMRI data - the OpenfMRI project.
Why share data? Lessons learned from the fMRIDC.
Making data sharing work: the FCP/INDI experience.
Data sharing in neuroimaging research.