DANS – Data Archiving and Networked Services (The Netherlands)

Peter Doorn – DANS Director

(interviewer: Marta Hoffman-Sommer)

DANS – Data Archiving and Networked Services – is a national institution serving all Dutch researchers. What services does it offer to the research community?

DANS promotes sustained access to digital research data. For this, DANS encourages scientific researchers to archive and reuse data in a sustained form, for instance via the online archiving system EASY and the Dutch Dataverse Network. With NARCIS, DANS also provides access to thousands of scientific datasets, e-publications and other research information in the Netherlands. The institute furthermore provides training and consultancy and carries out research on sustained access to digital information. Driven by data, DANS with its services and participation in (inter)national projects and networks ensures the further improvement of access to digital research data.

DANS is an institute of the Royal Netherlands Academy of Arts and Sciences (KNAW) and the Netherlands Organization for Scientific Research (NWO). Historically, who inspired the launch of DANS? Was it an initiative of the research community? What needs led to the founding of a national institution for data archiving?

The first predecessor of DANS was founded in 1964: the Steinmetz Archive (originally: Foundation) for the social sciences. That was an initiative of the research community, namely of social scientists engaged in survey research. The Netherlands Historical Data Archive (founded 1989) and the e-Depot for Netherlands Archaeology (EDNA), which are now part of DANS, were likewise inspired by research communities (historians and archaeologists, respectively).
In all cases, those communities realized that the digital datasets they were compiling were in need of preservation and could be re-used by others.

Have you succeeded in achieving the initial goals set at the founding stage of DANS?

DANS was created in 2005 and it clearly serves its purpose. The amount of archived data is growing exponentially, and re-use is also increasing rapidly. Nevertheless, some research communities are much more aware of the usefulness of sharing data than others. In archaeology, the State Archaeological Service was important in making the deposit of data obligatory. In an area such as psychology, only a small proportion of the data is archived at DANS.

In recent years we see a growing awareness among universities, other research organizations and funders that data need to be well-managed. DANS plays an advisory role for these organizations, and also delivers services supporting data management.

Chart 1. Datasets at DANS according to year

You just mentioned that data re-use is growing rapidly. How do you measure this?

We register every download of a dataset. A plot of the downloads per year shows clear growth. Taking into consideration that the data are of a specialist nature and that the number of researchers in our country is limited, I consider the usage statistics quite good. On average every dataset is downloaded once a year, although an 80-20 rule applies: 20% of the datasets are responsible for 80% of the downloads. A successful dataset is downloaded a hundred times or more.
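
(Editorial illustration, not part of the interview.) The two figures mentioned here, the average number of downloads per dataset and the share of downloads taken by the top 20% of datasets, can be derived from a plain download register. The sketch below is a minimal Python example under stated assumptions: the dataset identifiers, the log contents and the function names are all hypothetical and do not describe how DANS actually stores or processes its statistics.

```python
from collections import Counter

def downloads_per_dataset(download_log, all_dataset_ids):
    """Count downloads per dataset, keeping never-downloaded datasets at zero."""
    counts = Counter(download_log)
    return {ds: counts.get(ds, 0) for ds in all_dataset_ids}

def mean_downloads(counts):
    """Average number of downloads per dataset."""
    return sum(counts.values()) / len(counts) if counts else 0.0

def top_share(counts, top_fraction=0.2):
    """Fraction of all downloads that go to the most-downloaded datasets."""
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / total

# Hypothetical register: ten datasets, one log entry per download.
dataset_ids = [f"ds{i:02d}" for i in range(1, 11)]
log = ["ds01"] * 50 + ["ds02"] * 30
for ds in ["ds03", "ds04", "ds05", "ds06", "ds07"]:
    log += [ds] * 4

counts = downloads_per_dataset(log, dataset_ids)
average = mean_downloads(counts)   # downloads per dataset
share = top_share(counts)          # share taken by the top 20% of datasets
```

Passing the full list of dataset identifiers matters: datasets that were never downloaded do not appear in the log, and leaving them out would overstate the average and understate the concentration.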

Chart 2. Reuse of datasets from DANS archive 2005–2014

You said the State Archaeological Service has made data deposit obligatory. Does this data have to be made open for re-use? And are there any other institutions in the Netherlands that have made data sharing obligatory?

The data is not always open to everyone. The archaeological data can contain detailed site descriptions, which might attract hobby archaeologists or treasure hunters with metal detectors to heritage that needs to be protected. Hence about 25% of the datasets are open only to professional archaeologists. Many social science datasets contain information on individuals, which likewise cannot be shared openly, in this case for privacy reasons. 11% of our data fall into this protected category. 63% of the datasets are openly accessible; most of them require only registration. For the purists this is not yet full open access. Therefore we are now implementing open access without registration under a Creative Commons “CC0” licence.

What – in your experience – are the advantages and disadvantages of operating on the national (vs institutional) level, from the viewpoints of the data provider, the data user, and the data archiving institution?

I do not want to claim that any one solution, either central or decentralized, is the best solution; we are promoting a federated data infrastructure, with roles and responsibilities for different players at different levels. This federated approach is based on the idea of the “collaborative” model described in the report “Riding the Wave”, which was published about five years ago. DANS tends to concentrate more on back-office functions, supporting front-offices at the universities, which serve the researchers at the home institution. We published a brochure on this which explains the front-office back-office model in more detail.

DANS runs a national research data repository, EASY. Are there many institutional or thematic data repositories in the Netherlands and what is your concept of task division between these three types of repositories: institutional, thematic, national?

All Dutch universities maintain an institutional repository, oriented towards publications. DANS harvests these repositories and aggregates the information via the Narcis.nl portal. There are also several thematic repositories, such as The Language Archive at the Max Planck Institute for Psycholinguistics in Nijmegen. DataverseNL, hosted by DANS, is currently used by eight universities to store data during the research process and as an archive for the short and medium term. The three technical universities run a joint 3TU.datacentre, with which DANS has a close collaboration. On top of these, there are also international data repositories, both “universal” (such as Figshare and Zenodo) and disciplinary (for a couple of domains).

Outside of science and scholarship, there are also data repositories that are of great relevance to researchers, for instance those of the national archives, the national library, the broadcasting archives and those of the cultural sector. With these we set up the National Coalition for Digital Preservation (NCDD). And DANS is the access portal for researchers using data from Statistics Netherlands (CBS).

The overall landscape is very fragmented. By creating coalitions with the most important national players, such as Research Data Netherlands (RDNL) and the NCDD mentioned above, we aim to create a certain degree of coordination. Internationally, DANS is a member of a substantial number of research data organizations (RDA, CoData, APA) and research infrastructures (DARIAH, CLARIN, CESSDA, etc.).

What actions does DANS undertake to promote data sharing among researchers? Which of these actions do you consider particularly successful?

The most effective strategy is to promote this with the research funders, both national (NWO in the Dutch case) and Europe-wide (DG Research, DG Connect). Additionally, we give many presentations, provide brochures, publications and other information materials, run a newsletter, etc. The case of the archaeologists is also illustrative: here it was the State Service of Archaeology that we managed to convince of the importance of making the deposit of digital data obligatory.

Which aspects of research data sharing do you think currently pose the biggest challenges: are these technical difficulties related to data curation and preservation, legal issues, the attitude of the research community, or something else?

Personally I think that the human factor is the most complicated one, and that winning the hearts and minds of researchers so that they share their data is the greatest challenge. It is important to realize that the (perceived) rewards of data sharing for the researcher who wants to make data available are minimal (although there are indications that researchers who share data get cited more often). It is understandable that they see it as a burden rather than as a benefit.

The second challenge is one of organization: how do we get coordination in the world-wide data system? Organizations such as the RDA have this as their prime objective, but the process is as confusing as it is promising. And of course there are technical challenges as well. These have to do with the data volumes we are producing, but perhaps even more with selection and curation. What do we want to preserve? For how long? To whom is it of value? How should the data be accessible? How do we find it again? What documentation or metadata is needed to understand the contents? It makes no sense to try to preserve everything. But not being alert to the issues of data sharing means that we will inadvertently lose or destroy a lot of scientific knowledge.

Thank you.

