Data Sharing 101 – a brief introduction for everyone

Written by Spencer Gibson, PhD, Research Associate at the University of Leicester.

The DataSharing 101 site aims to be a launching pad for anyone interested in sharing data for biomedical purposes. The site started life as part of my work in Prof. Brookes’ research group at Leicester University, when I was working on the Genomics Clinic of the Future (GCoF) project. This was an EU Horizon 2020-funded initiative to bring together scientists from different disciplines to investigate the various issues around sharing genomic data. My contribution to the project focused mainly on investigating current systems for sharing this type of information, and their applicability to sharing data between the clinical and research environments. The website became an avenue for disseminating this information to anyone who wanted to explore the area further. As such, the site aims to give some basic background, with links out to other resources that either explore specific areas in greater detail or provide specific services or tools for data sharing.

As the GCoF project focused on data sharing in a biomedical/clinical context, it allowed me to focus on a small number of key groups. These were the patient, the clinician/healthcare professional, the data user and the data consumer. The data consumer and data user could also be a clinician or a patient but may also be a biomedical scientist, healthcare analyst or government official who may want to analyse health data for statistical, healthcare planning or medical research purposes.

The idea of medical data sharing for such purposes was the driving influence behind the GCoF project. However, as with all such data, there are issues with data security, data subject privacy and data misuse to consider. Further, the EU working party on sensitive data ruled that genetic data does qualify as sensitive data and as such is subject to more stringent data protection controls than other types of personal data. The 1995 EU directive on data protection details how such sensitive data should be shared and secured, as well as the right of data subjects to know what data is held on them, who has access to it and what it is being used for. The upcoming EU General Data Protection Regulation, due to come into force in 2018, which the UK government has said it will enforce despite Brexit, tightens these restrictions and introduces the need for the data subject to explicitly opt into any data sharing agreement after being fully informed.

It was this requirement of informed consent that influenced my decision to make the website not only a resource for scientists to find tools and databases that may be helpful in the use and sharing of genomic and biomedical data, but also to write it in language that should be accessible to the non-scientist. During my research into pre-existing data sharing systems, two things became apparent. The first was that biomedical data sharing systems seem broadly to be designed for two purposes: sharing data between scientists, and sharing medical data between providers in countries that have a private healthcare system. The former seemed largely to give the scientist control over who had access to the data (after initial consent had been given by the data subject), and the latter seemed to give control largely to the healthcare professional. Although there were some notable exceptions in the second category, such as the PCARE system, where patients were able to control which healthcare professionals had access to their data, this did not seem to be the standard approach. The second thing to become apparent was that there seemed to be some standard models for data sharing, to which most of the systems I discovered conformed. While each of these had its benefits, not one of them seemed ideally suited to the data sharing scenario envisaged by the GCoF project. These base models, and their benefits and flaws in relation to sharing data in the biomedical field, are also discussed on the website.

While consent was covered by other members of the GCoF project and as such is not covered in detail on the website, it is worth considering that the context of medical data can change over time. As such, what was once appropriate to share may quickly become inappropriate. Genomic and genetic data can not only reveal the potential for health issues but also allow the subject to be identified where large amounts of data (such as a personal genome, exome, or variome) have been taken from an individual. Currently, re-identification would be a time-consuming and costly affair that would almost certainly deter all but the most determined individual. This may not remain the case in the future, though, as our understanding of the genome, databases containing genomic profiles of individuals, computing power and algorithms increase and improve. Without suitable legislation to prevent it, re-identification will remain a possibility (even if unlikely at present). One such theoretical scenario was presented by Homer et al. in 2008, which caused a great deal of debate in the scientific community. However likely or unlikely the scenario presented, what it did illustrate was that once data has been released to third parties or the wider scientific community, it is unlikely to be analysed in isolation. This could lead to additional discoveries being made about the data. While this may not be as dramatic as the re-identification of the data subject, it could lead to unintended findings, raising the moral dilemma of whether to inform the data subject. It may be possible to circumvent such issues with a more detailed form of consent, which indeed was another area of research in the GCoF project, but it is most unlikely that all possible findings could be foreseen at the time consent is collected. Sharing of data and associated findings only compounds this issue, making it almost impossible to predict the uses to which the data could be put. However, gaining a broad consent to data sharing is advocated by many funding bodies.

While it would be impossible for me to have produced a comprehensive resource within an easily accessible website, my aim was to give data subjects a general introduction to the issues they may face if they give consent for their data to be used and shared in the scientific community. I have sought not only to inform but also to support potential data subjects by providing links to other resources, support groups and clinical trial finders. This latter part of the resource is something I have felt strongly about for several years, since losing my mother to pancreatic cancer in 2007. Understandably, it was a traumatic time, and while as a scientist I knew there might just be a chance with a clinical trial, I did not know where to start looking for one.

It has been argued that where people have volunteered to participate in clinical research, there is a duty to use the data responsibly and get the fullest possible use from it (Harmon et al. 2012). This, however, should not be used as an excuse to grant the scientific community the right to use the data gained from such research in any way they see fit, as long as it can be justified within the realms of the original consent given. Life does not stand still and circumstances change. Besides which, one of the founding principles of modern medical research is the right of a subject to withdraw from a study at any point without the risk of discrimination or retribution. While this may be difficult to implement for modern-day long-term studies, it is something we neglect at our peril. There are those who advocate a more interactive model of consent, the so-called dynamic consent model, where the data subject does have the means of changing their consent for data sharing and usage over time. However, it should be noted that in practice this usually amounts to the data (or in some cases, tissue samples) being made unavailable for further scientific study, rather than the data being removed from any existing study that has used it.

While it may sound from this piece that I am not in favour of the sharing of biomedical data, this is not the case. It is my opinion that without such acts of selflessness by willing individuals, the full potential of genomics data will not be realised. For the goal of tailored medicine to become a reality, data sharing is an absolute requirement, but it must be done in a transparent way which recognises that the test subjects are real people with rights and feelings that must be respected.

Disclaimer: The views expressed in this blog post are mine alone and may not necessarily reflect those of the University of Leicester or any of its other employees.
