Thursday, February 27, 2014
Using someone else's data
There was quite a lot of activity yesterday in response to PLOS ONE's announcement regarding its data policy. Most of the discussion I saw concerned rights of use and credit, completeness of data (e.g. the need for stimulus scripts for task-based fMRI) and ethics (e.g. the need to get subjects' consent to permit further distribution of their fMRI data beyond the original purpose). I am leaving all of these very important issues to others. Instead, I want to pose a couple of questions specifically to the fMRI community, because they concern data quality, and data quality is what I spend almost all of my time dealing with, directly or indirectly. Here goes.
1. Under what circumstances would you agree to use someone else's data to test a hypothesis of your own?
Possible concerns: scanner field strength and manufacturer, scan parameters, operator experience, reputation of the acquiring lab.
2. What form of quality control would you insist on before relying on someone else's data?
Possible QA measures: independent verification of a simple task, such as a button-press response, encoded in the same data; realignment "motion parameters" within some predefined limit; temporal SNR above some predefined value.
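As a rough illustration of the last two QA measures in question 2, here is a minimal Python/NumPy sketch. It assumes a 4-D magnitude time series already loaded as an array shaped (x, y, z, t), plus a (t, 6) array of realignment parameters whose first three columns are translations in mm; column order and units differ between packages, so check yours. The function names, the crude intensity mask and the threshold values (median tSNR of 50, 0.5 mm) are hypothetical placeholders chosen for the example, not recommendations.

```python
import numpy as np


def temporal_snr(data: np.ndarray) -> np.ndarray:
    """Voxelwise temporal SNR of a 4-D series shaped (x, y, z, t):
    the temporal mean divided by the temporal standard deviation."""
    mean = data.mean(axis=-1)
    std = data.std(axis=-1, ddof=1)
    # Voxels with no temporal variation (e.g. empty background) get 0, not inf/nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(std > 0, mean / std, 0.0)


def qa_check(data: np.ndarray, motion: np.ndarray,
             tsnr_min: float = 50.0, translation_max_mm: float = 0.5) -> dict:
    """Hypothetical acceptance test with limits fixed *before* looking at the data.

    data   -- 4-D magnitude time series, shape (x, y, z, t)
    motion -- realignment parameters, shape (t, 6); columns 0-2 are assumed
              to be translations in mm (column order varies between packages)
    """
    mean_img = data.mean(axis=-1)
    # Crude "brain" mask (assumption): voxels brighter than 10% of the brightest mean voxel.
    mask = mean_img > 0.1 * mean_img.max()
    median_tsnr = float(np.median(temporal_snr(data)[mask]))
    # Largest absolute translation relative to the realignment reference volume.
    max_translation = float(np.abs(motion[:, :3]).max())
    return {
        "median_tsnr": median_tsnr,
        "max_translation_mm": max_translation,
        "passes": median_tsnr >= tsnr_min and max_translation <= translation_max_mm,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic series: mean 1000, noise SD 10, so tSNR should be close to 100.
    data = 1000.0 + 10.0 * rng.standard_normal((8, 8, 4, 100))
    still = np.zeros((100, 6))
    print(qa_check(data, still))
```

Note that the motion test here is itself a single-number summary of the realignment output, which is exactly the kind of proxy questioned below; passing it is at best a necessary condition, not evidence of good data.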
If anyone has other questions related to data quality that I haven't covered with these two, please let me know and I'll update the post. Until then, I'll leave you with a couple of loaded comments. I wouldn't trust anyone's data unless I knew the scanner operator personally and knew first-hand that they had excellent standard operating procedures, a.k.a. excellent experimental technique. Furthermore, I wouldn't treat realignment algorithm reports (so-called motion parameters) as a reliable index of data quality in the way that a purity value is for a chemical. Reducing motion to a single value ("My motion is less than 0.5 mm over the entire run!") is especially nonsensical in my opinion, considering that typical voxels exceed 2 mm on a side. Okay, discuss.
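To make the single-value problem concrete, the sketch below compares two synthetic runs that share the same headline summary, a maximum absolute translation of 0.4 mm: one drifts slowly, the other jumps by 0.4 mm between every pair of volumes. For contrast it also computes framewise displacement in the manner popularized by Power and colleagues (2012). That is a swapped-in technique, not something this post endorses, and it is derived from the same realignment parameters, so it inherits their limitations. Rotations are assumed to be in radians in columns 3-5, the 50 mm head radius is the conventional assumption, and the function names are hypothetical.

```python
import numpy as np


def max_abs_translation(motion: np.ndarray) -> float:
    """The single-number summary: largest absolute translation (mm) in the run."""
    return float(np.abs(motion[:, :3]).max())


def framewise_displacement(motion: np.ndarray, radius_mm: float = 50.0) -> np.ndarray:
    """Framewise displacement, Power et al. (2012) style.

    Sums the absolute volume-to-volume changes of the six parameters, converting
    rotations (assumed radians, columns 3-5) to arc length on a sphere of
    radius_mm. The first volume has no predecessor, so its FD is set to 0.
    """
    deltas = np.abs(np.diff(motion, axis=0))
    deltas[:, 3:] *= radius_mm
    return np.concatenate([[0.0], deltas.sum(axis=1)])


if __name__ == "__main__":
    n = 100
    drift = np.zeros((n, 6))
    drift[:, 0] = np.linspace(0.0, 0.4, n)  # slow 0.4 mm drift in x
    jerky = np.zeros((n, 6))
    jerky[1::2, 0] = 0.4                    # 0.4 mm jump between every pair of volumes
    for name, run in (("drift", drift), ("jerky", jerky)):
        fd = framewise_displacement(run)
        print(name, max_abs_translation(run), round(float(fd.mean()), 3))
```

Both runs report a maximum translation of 0.4 mm, yet their mean framewise displacement differs roughly a hundredfold. Neither number says anything directly about signal quality.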
UPDATE 13:35 PST
Someone just alerted me to the issue of data format. Raw? Filtered? And what about custom file types? One might expect to get image-domain data, perhaps limited to the magnitude images that 99.9% of folks use. So, a third question is this: What data format(s) would you consider (un)acceptable for sharing, and why?
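On the magnitude-only option, a short NumPy sketch shows what is discarded. It assumes complex image-domain data are already held in a complex array; how such data are exported is scanner- and format-specific and not covered here, and the function names are hypothetical. Complex images can be rebuilt from magnitude plus phase, but a magnitude-only copy is identical for data that differ only in phase.

```python
import numpy as np


def split_complex(images: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split complex image-domain data into magnitude and phase (radians)."""
    return np.abs(images), np.angle(images)


def recombine(magnitude: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Rebuild the complex images; only possible if the phase was kept."""
    return magnitude * np.exp(1j * phase)


rng = np.random.default_rng(1)
images = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
magnitude, phase = split_complex(images)

# The complex data survive a magnitude/phase round trip...
assert np.allclose(recombine(magnitude, phase), images)

# ...but a magnitude-only copy cannot distinguish data whose phase differs,
# e.g. after a uniform 0.7 rad phase shift.
shifted = images * np.exp(1j * 0.7)
assert np.allclose(np.abs(shifted), magnitude)
assert not np.allclose(shifted, images)
```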