Notes on DATA SHARING from the BioSHaRe and HandsOn: Biobanks conferences
Two exciting conferences for those interested in data sharing
The first one, BioSHaRE, focused on the tools and services for data sharing developed by the EU project BioSHaRE. The project is a collaboration between 14 institutions from Europe and Canada that aims to facilitate data harmonisation and standardisation, as well as data sharing and pooling, across multiple biobanks and databases. Their recent catalogue of tools encompasses the following areas: data description, presentation and search (Cafe Variome, OmicsConnect, Mica, MOLGENIS/Observ-EMX); data harmonisation across databases (BiobankConnect, DataSchema, EnviroSHaPER, Opal, SORTA, Vortex/Spa); data analysis across databases (DataSHIELD, ESPRESSO); contributor recognition (BRIF, ORCID); standardisation of sample handling (standards and recommendations); and ethical, legal and social implications (ELSI guidance, ECOUTER).
The HandsOn: Biobanks conference attracted biobank researchers from Europe, Japan and Canada. The diverse programme of this event covered multiple aspects of biobanking, such as sample collection and storage, IT infrastructure and interoperability, and ethical and legal issues.
One of the current challenges for biobanks as well as for many other data resources is exposing their data to the interested community and making their data usable. This requires, among other things, smartly organised catalogues and communication between biobanks.
The last day of the conference featured a workshop on data sharing with contributions from Anthony Brookes, Gertjan van Ommen, Petr Holub, Enzo Medico, Luciano Milanesi, Nicolas Malservet, Niklas Blomberg, Jan-Eric Litton, and Kurt Zatloukal.
The following questions were discussed during this workshop:
- How do we define data sharing properly? What exactly do we understand by data sharing? Some data will be used by thousands of researchers, and some by only one or two researchers in the world. Should we still strive to share everything? If not, how do we know what data will be useful and what will not?
- Driven by the same motivation, securing funding, biobanks and academic researchers sometimes pursue different aims: the former want to share and expose their data, while the latter want to keep and protect theirs until it is published. Clearly, more communication and clarity about what data can be shared, when and with whom would help. Several different models of data sharing certainly need to be defined.
- Very often, making raw data available is not the best solution, as raw data attracts much less attention and interest.
- How do we make data as usable as possible? Biobanks need to demonstrate that their data is used, because “showing that you can do something useful is the key to sustainability”.
Multiple data sharing initiatives
At DNAdigest, we keep an eye on data sharing initiatives and write about them on our blog. Some time ago, we noticed that there are other data sharing initiatives including, for example, biosharing.org. To our surprise, many members of these communities are not aware of each other. The idea of sharing implies that people understand what exactly is going on: what the benefits and dangers of data sharing are, what tools exist to assist with the process, and so on. The people we asked were usually confused by the number of different sources and the absence of clear information on the issues mentioned above.
The existence of different groups working independently towards the same goal clearly means that the problem of data sharing is important and urgent. But, as often happens, the solution will not be born quickly and easily. This means that there will still be years of uncertainty and searching for best practices in the data sharing field. DNAdigest will continue collecting and digesting information about different data sharing initiatives and will strive to increase communication between the groups.
At the end of the HandsOn: Biobanks conference, Gertjan van Ommen made a very wise philosophical remark: “You cannot define the future in detail, because if you can it is no longer the future but the present”.
Let’s remember that the technical discoveries that made the Internet possible were made in the 1960s, but it took many years for it to develop into the state that we know now. The Protein Data Bank (PDB) was in fact started in the 1970s by a group of enthusiastic crystallographers, but it was not until the late 1990s that it actually started to be used by millions of researchers. So, let’s be patient and enjoy these exciting times when definitions and standards of data sharing are being developed!