New national data management strategy to bring desk-drawer data to light
Worldwide, data volumes are doubling every 9 to 12 months. Unfortunately, much of that data goes missing or is not put to use, explains Professor John Renner Hansen. He has chaired the committee that has developed Denmark’s new strategy for managing research data, which aims to make Danish research data findable, accessible, interoperable and reusable. “This is momentous,” says Senior Advisor Mareike Buss from CBS Library.
Astronomers have the daunting task of mapping the universe, including its one billion trillion stars – 1,000,000,000,000,000,000,000 – according to space.com.
“The universe contains so many stars that astronomers can’t possibly observe them all. That’s why, for the past 50 years, they have shared their data with one another,” explains John Renner Hansen.
A Professor at the University of Copenhagen and Chairman of the Danish e-Infrastructure Cooperation (DeiC), he has been steering the committee that has developed Denmark’s new strategy for research data management, which is now ready for implementation.
“Data volumes are snowballing. We expect a doubling in the global volume of data every nine to 12 months. However, much of that data disappears or is hidden away in desk drawers and not used. From an economic perspective, that’s a waste, so instead it must be findable, understandable and accessible,” he says.
The new strategy is based on four principles: FAIR, which stands for Findable, Accessible, Interoperable and Reusable.
Mareike Buss, Senior Advisor at CBS Library, has been working with the FAIR principles since they were first published in 2016. Now the principles are at the core of a national strategy set to be implemented at all the Danish universities.
“This is momentous and important. If we want to change how we manage and produce data, we need to work differently. Some researchers may have always based their data practices on these principles, but for others it will entail a slight work around,” she says, explaining the prospects after implementation.
“If we can manage all future data on the basis of these principles, we can create an internet of FAIR data and services, a sort of research internet that integrates vast data sets and computing services. That’s powerful,” she says.
As open as possible – as closed as necessary
The strategy comes at a time when the EU and various national foundations have begun demanding that researchers produce and manage their data according to the FAIR principles if they want to apply for funding. They aim to ensure that the data sets produced are as widely reusable as possible, explains Mareike Buss.
“It is important to underline that this does not involve making the data open. Because data that contains sensitive personal information or the like cannot be open. We must also respect the GDPR here. So, we are working under the motto: as open as possible, as closed as necessary,” she says and continues:
“What’s important here is that all new data are findable.”
All research data, open or not, will come with a set of metadata that describes in detail what the data contains. Data about data, so to speak. For example, the metadata will say who produced the data, when it was produced, the method used and the population.
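As a rough illustration of what such a metadata record might look like, here is a minimal Python sketch. The field names, the `DatasetMetadata` class and the sample values are all hypothetical and are not taken from the Danish strategy or from any particular metadata standard; the point is only that the description can be published even when the data itself stays closed.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record: data about data."""
    creator: str       # who produced the data
    created: str       # when it was produced (ISO 8601 date)
    method: str        # how the data was collected
    population: str    # who or what the data describes
    open_access: bool  # False when the data itself must stay closed


def to_catalogue_entry(meta: DatasetMetadata) -> str:
    """Serialise only the metadata; the dataset itself is never included."""
    return json.dumps(asdict(meta), indent=2)


# Illustrative values only, not a real dataset.
record = DatasetMetadata(
    creator="Example Researcher",
    created="2021-01-01",
    method="semi-structured interviews",
    population="hypothetical sample of adults",
    open_access=False,
)
entry = to_catalogue_entry(record)
print(entry)
```

In this sketch the interviews themselves would remain closed, while the catalogue entry makes their existence findable.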
John Renner Hansen has talked to researchers from various fields, and some are concerned about the implications of producing and managing FAIR data.
“A lawyer who interviewed people for his research was worried that he would be required to make the interviews public, and then people would not want to participate in that type of research and interviews in the future. Therefore, researchers must feel secure about adopting the principles,” he says.
A national repository
Making the research data FAIR is one aspect, but preventing the data from disappearing into cyberspace is another matter altogether.
Both Mareike Buss and John Renner Hansen explain that the eight universities, in collaboration with the Ministry of Higher Education and Science, are considering the option of a research data repository where researchers can easily publish, expose and find data.
“If we want to make research data FAIR, we also need an infrastructure to support that aim,” says Mareike Buss, who expects a national repository is on the cards for the near future, as creating local repositories would be too expensive for the individual universities.
With a national repository, Danish and international researchers can find and reuse data much more easily, whether they are searching for new knowledge or for previous research.
“Having access to all this data will make it much easier to run comparative studies across borders, for example. And in the field of climate research, so much more data will suddenly be comparable, which will strengthen the models and predictions,” says John Renner Hansen and continues:
“Another argument for making data accessible is the opportunity to check and control results. There’s security in knowing that you can reproduce statements based on the data available. When data is not available, you’re alone in convincing researchers that your results are accurate.”
An ecosystem of open data
According to Mareike Buss, the Ministry of Higher Education and Science has yet to announce what the implementation will entail, for example whether sanctions will follow if the universities do not comply with the strategy.
“As I understand the situation, the strategy calls for and encourages researchers to practice the FAIR principles – and we do the same. At CBS, we help researchers and host courses on how to make data FAIR. But, of course, if the ministry produces a timeframe and demands a minimum requirement, we will be sure to meet it,” she says.
Looking ahead, John Renner Hansen hopes that a future national repository could be federated into EOSC, the European Open Science Cloud that will be “a multi-disciplinary environment where researchers can publish, find and re-use data, tools and services.”
“Individual researchers will work differently and think about their data in a new way. Meanwhile, we want wider access to all the data produced to make an ecosystem of open data. It’s a big vision that will take some time to get up and running,” he says.