Education, tips and tricks to help you conduct better fMRI experiments.
Sure, you can try to fix it during data processing, but you're usually better off fixing the acquisition!

Saturday, April 26, 2014

Sharing data: a better way to go?


On Tuesday I became involved in a discussion about data sharing with JB Poline and Matthew Brett. Two days later the issue came up again, this time on Twitter. In both discussions I heard a lot of frustration with the status quo, but I also heard aspirations for a data nirvana where everything is shared willingly and no data set is ever more than a couple of clicks away. What was absent from the conversations, it seemed to me, were reasonable, practical ways to improve our lot.* It got me thinking about the present ways we do business, and in particular where the incentives and the impediments can be found.

Now, it is undoubtedly the case that some scientists are more amenable to sharing than others. (Turns out scientists are humans first! Scary, but true.) Some scientists can be downright obdurate when faced with a request to make their data public. In response, a few folks in the pro-sharing camp have suggested that we lean on those who drag their feet, especially where individuals have previously agreed to share data as a condition of publishing in a particular journal; name and shame. It could work, but I'm not keen on this approach for a couple of reasons. Firstly, it makes the task personal which means it could mutate into outright war that extends far beyond the issue at hand and could have wide-ranging consequences for the combatants. Secondly, the number of targets is large, meaning that the process would be time-consuming.


Where might pressure be applied most productively?

Appealing to a scientist's best intentions is all well and good, but in my view it's easier to make a relatively small change to the rules of the game. My suggestion is to shift the burden from the individual scientist onto the journal publishing the results. The scientific publication industry is changing all the time, so the move towards more transparency and sharing of data is just another in a long litany of changes the journals are experiencing.

The journals are, however, uniquely placed to change policies regarding data sharing in particular. If a journal makes as a condition of publication that you first upload the data on which your manuscript is based, guess what? That is precisely what you will do. Why do I know this? Because you already comply with their instructions in manifold other ways. You use the font they want, you make the figures the size they want, you use the reference format they want, and you even relegate the methods into supplemental online material even though you know you shouldn't because you wouldn't read your own paper sans experimental details. What's more, at the point of submitting a manuscript you are laser-focused on your goal and best prepared to execute the task of data sharing as just another step on the path.


A call to action?

If we are seriously bothered by data sharing and want to change the way it's done then the first step, it seems to me, is to create a list of those journals publishing neuroimaging studies and categorize them based on their data sharing policies. Next, I am suggesting that those people who have strong opinions on sharing of data should walk the walk, and publish only in those journals whose processes match their stated opinions.**  This is the market at work. If journals stop receiving good manuscripts because the good scientists have gone elsewhere, they will change their practices.

I think we can use three basic categories for journals' policies on data sharing:
  • In the top category are those journals who mandate data sharing as a condition of publishing your study. No data upload, no publication. This is our star team, the journals we should all be using (if we care about data sharing as a precondition for doing science).
  • In the middle category are all the prevaricators. This is the space the vast majority of journals inhabit. They tell you that you must share your data if you are asked to, and this is a Very Serious Policy. So serious, in fact, that they will do, errr, absolutely nothing if you fail to comply. These journals have neatly deflected the task of sharing back onto you, the individual scientist. Why? Perhaps because they are afraid they will see fewer submissions if they get aggressive with data sharing? Or perhaps they are afraid they will have to put up resources to facilitate the sharing, and that would eat into their precious profit margins. But if the sharing of data is a cost of doing business in scientific publishing then it is their cost to bear.
  • The bottom group of journals hardly needs introduction. In this group is any journal saying Not Our Job. They don't even insist that you offer your data when you publish your manuscript. It's all up to you, dear scientist. 


Your field needs You!

Here's where you come in. I would like to crowd-source a review of the journals publishing neuroimaging studies. All I need is for someone to think of a journal, head to the instructions for authors, find the data sharing policy blurb and send me a link to it. That's it! I will then categorize the journals as above, and I'll put out a blog post as a quick guide for scientists looking for sharing-compliant journals to publish in. Pretty easy, huh?
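
For anyone who would rather keep the tally in a script than in a spreadsheet, the three categories above can be sketched roughly as follows. This is only a minimal illustration: the names SharingPolicy, JournalEntry, categorize and quick_guide are hypothetical, and the two yes/no flags are my assumption about how a policy blurb might be boiled down once somebody has read it.

```python
from dataclasses import dataclass
from enum import Enum


class SharingPolicy(Enum):
    """The three categories described above (names are illustrative)."""
    MANDATED = 1      # no data upload, no publication
    ON_REQUEST = 2    # share if asked, but no stated consequences
    NOT_OUR_JOB = 3   # no sharing requirement at all


@dataclass
class JournalEntry:
    name: str
    policy_url: str            # link to the data sharing policy blurb
    requires_upload: bool      # must data be uploaded before publication?
    requires_on_request: bool  # must data be shared when someone asks?


def categorize(entry: JournalEntry) -> SharingPolicy:
    """Assign a journal to one of the three categories."""
    if entry.requires_upload:
        return SharingPolicy.MANDATED
    if entry.requires_on_request:
        return SharingPolicy.ON_REQUEST
    return SharingPolicy.NOT_OUR_JOB


def quick_guide(entries):
    """Group journal names by category, preserving submission order."""
    guide = {policy: [] for policy in SharingPolicy}
    for entry in entries:
        guide[categorize(entry)].append(entry.name)
    return guide


if __name__ == "__main__":
    # Placeholder journals and URLs, not real policy assessments.
    submitted = [
        JournalEntry("Journal A", "https://example.org/a", True, True),
        JournalEntry("Journal B", "https://example.org/b", False, True),
        JournalEntry("Journal C", "https://example.org/c", False, False),
    ]
    for policy, names in quick_guide(submitted).items():
        print(policy.name, names)
```

Reading the policy and deciding on the two flags is still a human job; the sketch only keeps the bookkeeping consistent once that judgment has been made.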

__________________





* I should state for the record that I don't have strong opinions on whether all data should be shared, whether all published data should be shared, when data should be shared, if and how credit should be given, whether there should be restrictions on who can use shared data, etc. I am neither an advocate for nor an opponent of data sharing. My job is to facilitate data generation by others, and to solve problems arising. Data sharing has been stated to be a problem for some in my community; please take this blog post as my contribution to solving the stated problem.

** I'll note here my feelings about open access, which are considerably less ambiguous than my opinions on data sharing. I now refuse to review for journals who don't offer open access. If you review for a journal that erects pay walls and you object to pay walls then I'm very sorry to inform you, you are part of the problem. If you're an editor for a journal with pay walls then you have a very large amount of explaining to do, in my opinion.



Addendum - 28th April, 2014.

Further Reading:

Human neuroimaging as a "Big Data" science.
PMID: 24113873

Toward open sharing of task-based fMRI data - the OpenfMRI project.
PMID: 23847528

Why share data? Lessons learned from the fMRIDC.
PMID: 23160115

Making data sharing work: the FCP/INDI experience.
PMID: 23123682

Data sharing in neuroimaging research.
PMID: 22493576

9 comments:

  1. Thanks for sharing these comments with the community.
    In my opinion, data sharing should be one of the pillars of reproducible research, regardless of whether the act of sharing is done by the scientist responsible for the data or by the journal that publishes the results. But this is not the only point... My personal view is that we must start sharing scripts and code for data analysis. To a certain extent, a published work must be replicable by others with data acquired in an equivalent sample population (of course, if that's possible and sensible). Often, it is very difficult to find all the necessary details of the analysis in the description of methods (sometimes given as Supplementary Info), or some part of the analysis or algorithm was manually programmed in MATLAB, Python, R, etc. We are well aware of the fact that even slight modifications in the analysis pipeline could dramatically change the results. I should also take the blame for this. As a signal processing guy and fMRI methodologist, I am working right now on making all my algorithms available to the community (with hindsight, I should have done that when I published my work). I often see reproducible research in the signal processing community, and it makes the field advance quickly and rigorously since new methods are always benchmarked against the state of the art.
    Of course, this is just my opinion.

    Finally, a suggestion: why don't you start the blog post with some classic journals, and we can complete the list?

  2. Hi Cesar, yes, you're correct, the issue goes far beyond data, in neuroimaging research especially. Hence the big push amongst some in the field to develop IPython and other approaches to scientific computing.

    A factor that arises whenever sharing is raised is the additional burden on the data/code generator to make the product usable by others. In general it means slowing down, e.g. to add comments to code, to validate code, to capture meta-information on the data, etc. and this isn't something scientists are good at. It is largely for this reason that I think the mandated processes must be changed, and changed at points where the scientist has a vested interest in making progress. I think that means publications and grants. Given that grants are often regional or national, whereas publications are generally international, the fairest way to proceed is to change the publications processes, it seems to me.

    Regarding blame, yup, mea culpa! We are all a part of the problem to this point! The question is how we want to proceed into a future that we will shape, either by action or by inaction. Rather like the old aphorism on democracy, we get the publications and incentives processes we deserve. Nobody is foisting them on us. (The grants system is a noteworthy exception!)

    And finally, your suggestion. Yes, I shall be starting off with a review of the policies of NeuroImage because that is the journal I encounter most often with fMRI studies. Look for a comment including links very soon...

    Cheers!

  3. NeuroImage:

    Doesn't seem to have a data sharing policy: http://www.elsevier.com/journals/neuroimage/1053-8119/guide-for-authors

    But it does encourage database linking:

    Database linking

    Elsevier encourages authors to connect articles with external databases, giving their readers one-click access to relevant databases that help to build a better understanding of the described research. Please refer to relevant database identifiers using the following format in your article: Database: xxxx (e.g., TAIR: AT1G01020; CCDC: 734053; PDB: 1XFN). See http://www.elsevier.com/databaselinking for more information and a full list of supported databases.

    I also see Elsevier has a Research Data Services group (http://researchdata.elsevier.com/about), so perhaps in the not-too-distant future, NeuroImage (Elsevier) can mandate data sharing and then offer a solution (for a fee!) should you not want to use another approved database.

  4. Elsevier and some other publishers are listening at least:

    "Should research data be publicly available?"

    http://www.elsevier.com/connect/should-research-data-be-publicly-available

    Report from a meeting in Oxford in May, 2013.

  5. SCIENCE Magazine:

    Has a Very Serious Policy regarding making all code and data available:

    http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail

    Only one snag:

    "After publication, all reasonable requests for data and materials must be fulfilled."

    After publication. So close! The snag is that there are no consequences spelled out for failing to fulfill all reasonable requests for data and materials. It's also not ideal from the authors' perspective, unless they've taken the expedient of ensuring that everything is already on the cusp of being public as the paper comes out. I know I wouldn't want to face dozens of independent requests just days after publication.

    However, SCIENCE does offer an opinion on how to share:

    "Large data sets with no appropriate approved repository must be housed as supplementary materials at Science, or only when this is not possible, on an archived institutional Web site, provided a copy of the data is held in escrow at Science to ensure availability to readers."

    Short of holding publication prior to the data and materials being uploaded in anticipation of sharing requests, or threatening to retract a paper if the authors fail to comply with "all reasonable requests" for sharing, the policy is Serious.

  6. For those interested in reasons not to share neuroimaging data, one of the big problems is privacy: http://journal.frontiersin.org/Journal/10.3389/fninf.2014.00035/abstract

  7. Thanks for this post, data sharing is very relevant to conducting open science and I think gathering the data sharing policies from different publishers will provide a nice overview of what is currently expected. I get the sense that the neuroimaging field is moving (slowly) to be more in line with other disciplines where data sharing is a crucial part of research workflow.

    Genomics and synthetic biology are great examples where the community exhibited a shift in values towards data sharing through the use of curated data repositories (e.g., http://www.geneontology.org/) and standards (e.g., http://www.sbolstandard.org/). Most neuroimaging data sharing efforts seem to be centered around large initiatives, such as HCP, ADNI, INDI, NDAR, where data sharing is part of the mission and those collecting the data are also the ones providing the database infrastructure. However, the data distributed by each of these data sharing efforts are organized in a proprietary way with distinct directory structures and naming conventions, which seems to make it unclear how a small lab can best make their data available - whether it is required by NIH, Publishers, or otherwise.

    As data sharing requirements evolve, it would be great to see clear guidelines for what needs to be reported (e.g., http://mibbi.sourceforge.net/projects/MIfMRI.shtml), as well as how to capture that information in a structured way and make it available as part of the publication process. As that "Data sharing in 3 short acts" cartoon so excellently conveys - data sharing and meaningful data sharing are two very different things...

    After that digression, a couple contributions for the task at hand:

    Springer (Neuroinformatics)
    Note that they accept a "Data Original Article" under "Types of Papers", so others can cite your data just like a regular manuscript, but didn't see any data sharing requirement, just info in the "Information Sharing Statement"
    http://www.springer.com/biomed/neuroscience/journal/12021?detailsPage=pltci_1742107

    Frontiers (Neuroinformatics, Brain Imaging Methods):
    http://www.frontiersin.org/Brain_Imaging_Methods/authorguidelines#DataSharing

    Human Brain Mapping:
    "...encouraged (but not required) to submit their data to the Human Brain Mapping Database, or BrainMap (www.brainmapdbj.org)" - note that the link is broken/expired =)
    http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1097-0193/homepage/ForAuthors.html

    Plos One:
    Seems to take the issue seriously, and points to data repos like Dryad (http://datadryad.org/)
    http://www.plosone.org/static/policies#sharing

    1. Thanks, Nolan!

      "Most neuroimaging data sharing efforts seem to be centered around large initiatives..."

      Quite so. I think the OpenfMRI.org project is setting a good example. My concern with these initiatives, as pointed out by Jack Van Horn and Mike Gazzaniga (PMID: 23160115), is that they are dependent on external funding. Imagine if everyone woke up tomorrow and decided to share everything. Where would it all go? Success is possibly the biggest threat to sharing!

      The other day I Tweeted this, only slightly facetiously:

      "My $5 says data sharing will become norm once it's a commercial product owned by publishers:

      http://libraryconnect.elsevier.com/articles/best-practices/2013-02/research-data-driving-new-services

      http://researchdata.elsevier.com/about "


      But I do wonder why we don't let commercial entities provide solutions, so long as the solutions are responsive to the needs of the consensus position. We really only have three options, don't we? We either do the sharing ourselves, or we get grant funding to set up large initiatives as already mentioned, or we pay someone else to do it. We pay journals (indirectly, through library subs mostly) to publish our work. I pay Siemens to provide me with my research instrumentation. Why not let companies exploit the value in data sharing? I have often wondered about the intrinsic value of an hour's worth of fMRI data that cost something like $1000 to obtain.


      "As data sharing requirements evolve, it would be great to see clear guidelines for what needs to be reported (e.g., http://mibbi.sourceforge.net/projects/MIfMRI.shtml), as well as how to capture that information in a structured way and make it available as part of the publication process."

      I am trying to keep an up-to-date version of the acquisition parameter reporting on this blog: http://practicalfmri.blogspot.com/2013/01/a-checklist-for-fmri-acquisition.html

      I'll update it as often as I can. I have plans to add MB/SMS options soon. And I take suggestions!

      But my parting thought for today regarding sharing neuroimaging data is the one that is now concerning me: privacy and consent. I just re-read Russ Poldrack's paper (PMID: 23847528). In section 3. Confidentiality, they write:

      "Because high-resolution structural images may contain information about facial structures, all structural images will have facial features removed prior to sharing."

      But in the conclusions is this:

      "Preliminary analyses of the database have confirmed the ability to classify mental states across individuals, as well as demonstrating the novel ability to classify the identity of individual subjects from their fMRI patterns."

      So whoever is figuring out the machinery of sharing will need to be looking at the privacy and consent issues very, very carefully. Luckily, we have other fields to draw upon, such as the use of meta information from Internet/cellphone use, and genetics.

      Cheers!

  8. Sometimes the process of properly anonymizing data is burdensome, and some IRBs are particularly strict about this, e.g. the Veterans Administration. A blanket requirement may very well be counterproductive.
