The importance of open data

    Afraid of prying eyes

    Should a researcher make his or her data available to the whole world? In principle, nearly all scientists think they should. But in practice, they would rather keep their data to themselves.

    In short

    When researchers request the raw data behind a scientific publication, they often run into problems. Their fellow scientists come up with excuses not to share it, or provide only a very basic data set.

    Yet VSNU, research funder NWO and many journals require that data be made available to anyone who requests it. The RUG also requires raw data to be stored in a central location.

    In reality, there is very little supervision to ensure that researchers actually do so.

    If data is not made available, replicating and verifying the research can be extremely difficult.

    Rink Hoekstra is one of the founders of the Peer Reviewers Openness Initiative. Starting on 1 January 2017, the signatories will refuse to review articles that do not account for their data.

    ‘We hope that when enough people join the initiative, journals will change their policies.’

    Some opponents fear for the privacy of their research subjects, while others worry that their data could be used by others to churn out quick publications, or simply be stolen.

    Hoekstra doesn’t buy the criticism. Furthering science, rather than an individual researcher’s ego, should be the priority, he says.

    Full version


    RUG psychologist Hedderik van Rijn thought the article in Nature Neuroscience was quite something. Its author, who is well known in his field, had come up with a spectacular theory. Perhaps even a breakthrough. But was it really? Or could his findings be interpreted differently using a theory from 1890?

    Van Rijn did what any conscientious scientist is expected to do in a case such as this: he consulted Niels Taatgen at the department of artificial intelligence, with whom he often publishes. Together, they sent the article’s author an email, politely asking if he could send them his data.

    Certainly, answered the colleague, whose name Van Rijn would rather not mention because this person might one day decide whether his next grant proposal gets approved. ‘But it took ages. We finally sent him another email pointing out Nature’s policy,’ which states that authors should ‘promptly’ make their data available to readers without ‘undue qualifications’.

    Jumbled

    ‘We finally got the data set’, says Van Rijn, ‘but it was the most rudimentary data set you can imagine. It was so jumbled up that the sequence analysis we wanted to perform – to study the consequences of repeated task execution – became impossible. The data only conveyed the effect that supported the research.’

    ‘We discussed it’, says Taatgen. ‘We wondered: should we just accept it, or should we cause a fuss? We didn’t end up doing the latter, but it does leave a bad taste in your mouth. That’s a shame.’

    This is not an isolated case. After the Stapel affair, Tilburg statistician Jelt Wicherts performed a test: he emailed dozens of psychologists asking for their data so he could re-analyse it for an article in Plos One. Only one in ten responded favourably, and they turned out to be the ones who had published the most solid studies.

    Y drive

    Several years on, the situation has barely changed. Taatgen knows the excuses from personal experience. ‘They usually say they need some time to find it, and then they end up not giving it to you after all’, he says.

    ‘It’s striking how many people have just got a new computer, or whose hard drive has just crashed, when you ask for data’, says RUG statistician Rink Hoekstra.

    It is not supposed to be like this. Ever since the Stapel case, transparency and verifiability have been high on the scientific community’s agenda. The association of Dutch universities, VSNU, requires that raw research data be stored for ten years and made available to fellow researchers on request. Moreover, the data must be stored in such a way that it can be accessed as quickly as possible.

    Elsewhere at the RUG

    Open data isn’t only an issue in the social sciences, of course. The PRO Initiative says everyone should make their data available. A quick tour of other departments shows that researchers there have objections as well.

    Ecologist Irene Tieleman has her doubts. ‘As an ecologist, I object, among other things, to people interpreting numbers without knowing the context in which they were collected. In ecological studies, variation in the measuring environment is crucial. You can’t capture that in a database’, she says.

    RUG physicist Gerco Onderwater, who also works at CERN in Geneva in addition to his position here, has similar objections, even though ‘his’ institute is a paragon of openness. Anyone can log in to its Open Data portal and use the measurement results.

    ‘The general public doesn’t even possess the knowledge or processing power to use all that data’, he argues. ‘Data alone is not the answer. It’s also about expertise and how you interpret it. At the same time, there is always something to be found. Once you start searching, you can find whatever you want. It opens the door to confirmation bias.’

    Moreover, he has his doubts about the distrust implied by this forced openness. ‘It just really sounds like people are suspicious’, he says. ‘It’s as though people don’t trust their fellow scientists.’ That is probably not a good thing.

    UMCG researcher Marieke Wichers, who was recently awarded an ERC Consolidator Grant worth 1.5 million euros to gather data for research into depression, also remains unconvinced. ‘Obviously, openness is very good’, she says. ‘It improves the quality of the research.’ However, she fears it creates the ‘wrong incentive’.

    When great data sets – such as the one she is about to compile – are so easily available, people will start using them to publish quickly. And that would actually diminish the quality of the science rather than improve it.

    In the RUG’s case, that means the raw data needs to be stored on the Y drive. But does that actually happen? Beau Oldenburg, who researched data sharing and storage in Dutch sociology, discovered last year that programme directors often assume data is stored properly, while in practice the situation is quite different.

    Brainstorm brigade

    The protocol for the social sciences is clear, says Tom Postmes, chair of a faculty committee on raw data storage. Sociology and psychology researchers must store their data centrally, and when they publish, they need a ‘publication package’. PhD candidates are not allowed to defend their thesis, and students cannot complete their master’s or bachelor’s thesis, unless everything is in order. But what about senior researchers?

    ‘To be honest, we don’t know about them’, Postmes admits. ‘Every researcher knows about the protocol, and we’re working on a way to check whether they follow it.’ A quick check in the psychology and sociology departments yielded good results, but he has no hard data. They do not intend to police compliance in the future, however, but will instead form a ‘brainstorm brigade’. ‘Each data set has its own peculiarities that need to be handled with care. Together with the researchers, we want to figure out how best to approach this.’

    They seem to be on the right track toward improving the situation. For some people, however, things aren’t changing quickly enough. After a heated debate at a conference in Amsterdam last year, Rink Hoekstra decided not to wait any longer and founded a group with eight colleagues from across the globe: the Peer Reviewers Openness Initiative.

    Hoekstra and the initiative’s other signatories announced that from 1 January 2017 onward, they will refuse to review articles that lack raw data without a good reason. Any such article he receives will go straight back to the sender; he will only get to work once the data has been accounted for. ‘That makes it difficult for the journal’, he realises. ‘But we hope that when enough people join the initiative, journals will change their policies. I want to change the default.’

    Changing the culture would have great advantages. It would make it easier to monitor and replicate research, two of the core values of science. ‘We would immediately be able to see when people don’t keep their data properly up to date, or when they remove test subjects to make conclusions look better’, he says. ‘Open data ensures that mistakes can be rectified quickly.’

    ‘Stealing’ data

    He sees no reason not to practise openness, which simply means making data accessible to everyone rather than only to reviewers or colleagues, or hiding it away on a separate server. ‘Some people say they went to so much trouble to gather their data that they don’t want to give it away. That’s a personal argument. But I guess I’m just naïve in thinking that scientists should work towards the dissemination of knowledge rather than boosting their own egos and careers’, he says.

    Oldenburg also encountered this fear among the PhD candidates she interviewed about data storage. ‘While they indicate that they’re aware that it’s an unscientific attitude, they think it’s unfair that someone else would “easily” be able to use their data’, she writes.

    And then there is the fear of people ‘stealing’ data: researchers are afraid that someone will take their data from the repository and pretend to have collected it themselves. Is that a realistic concern? ‘Data like that would quickly be recognised’, Hoekstra says. ‘And anyone caught publishing stolen data risks losing face, which would be fatal in the scientific community.’

    Fingerprint

    As such, he does not believe that the risks of openness are all that great. ‘You shouldn’t pollute a fundamental discussion with marginal phenomena.’

    And yet there are proponents of open data who will not sign his initiative. Jacob Jolij has a more fundamental objection: his test subjects’ privacy. ‘We work with a lot of EEG scans, for example’, says Jolij. ‘But your brainwaves are as unique as a fingerprint. Would you want your fingerprints on the Internet for everyone to see?’

    Jolij has absolutely no faith in the argument that such data can just be anonymised. ‘The algorithms used to analyse large data sets are getting better and better. We simply don’t know what we should remove in order to guarantee that it’s not traceable’, he warns.

    Moreover, test subjects may have signed up for a particular study without wanting their data used for anything else. ‘Say it’s a study into cognition that also records sex and race, and someone uses that data for research into intelligence and race. Some people might not want that.’

    Wall

    Open data? Fine, says Jolij. But store the data on a faculty server, where only interested parties have access to it. It might sound like a wall, but ‘I think my test subjects deserve a wall’.

    Van Rijn and Taatgen, for their part, had no objections to signing, bearing their experience with Nature Neuroscience in mind. Nine other RUG scientists followed their example. All in all, 270 people have signed Hoekstra’s initiative. Is that enough?

    ‘It’s not a huge number’, Hoekstra admits, ‘but it’s enough. If editors encounter this more often, it might actually get things moving.’ Besides, the initiative only started several weeks ago, and there are still ten months left to recruit more supporters.

    And the privacy issue? ‘It’s almost as though people haven’t read the small print of the PRO Initiative’, says Hoekstra. ‘The only thing we ask is that they make the data available or explain why they won’t. Any explanation will suffice.’