Write for Data Sharing Essay Competition: Winning Entry by Eilish Wells

Data sharing has become a topic of heavy debate on ethical, legal and funding grounds, particularly where it concerns the sharing of patient records from the National Health Service (NHS). It is a concept that has received noticeable support from many sources and is often considered a crucial tool for furthering scientific understanding in every field. It has even been described as the ‘fourth paradigm: data intensive scientific discovery’ (1) and, if correctly used, has the potential to answer many questions that have so far eluded researchers.

January 2013 saw seventy organisations collaborate to form the Global Alliance for Genomics and Health (GAGH). The alliance, which now has 148 members, is an international non-profit organisation that aims to ‘tackle the challenges of genomic and clinical data sharing’ and ‘to make it possible to share and interpret this wealth of information’ (2). GAGH was initially driven by the fall in the cost of sequencing an entire genome (3): sequencing is becoming an everyday laboratory practice and, as a consequence, there is an ever-growing mass of genetic information. GAGH aims to preserve this information and to build a network of cloud computing platforms that provide access to the shared data (4).

Data sharing has continually been recognised as important, not only for the advancement of scientific knowledge but also for the preservation of information: it safeguards against misconduct and allows conclusions to be verified (1). Tenopir et al. carried out a survey in 2011 that explored the practices and perceptions of data sharing. The aim was to investigate the restrictions around sharing and why it remains limited despite clear support for long-term data preservation; 67% of respondents agreed that a lack of data sharing was a clear impediment to progress in science (1). The key reasons given were lack of time (53.6%) and lack of funding (39.6%). Overall, the paper suggested that clear provision of infrastructure, policies and best practices would enhance the application of data sharing. The Data Observation Network for Earth (DataONE) aims to provide that coordinated access to current data collections through a cyber-infrastructure that allows easy, clear and free data collection (1).

In the same survey, 98% of participants agreed that publicly funded research data is public property and should therefore be adequately stored. The question of ownership, and therefore control, of data has been central to the ongoing debate surrounding data sharing. Journals such as Nature require authors, as a condition of publication, to make materials, data and associated protocols promptly available. Researchers have valid concerns that publishing data could lead to people making use of their work without it being properly recognised or cited (1). Citations, and the recognition of work, are key, and this concern extends to the control of intellectual property rights and the system of global innovation.

The control of data extends even beyond the scientific community: the National Health Service has launched a programme called Care.data (5) for the collation of patient information through the Health and Social Care Information Centre (HSCIC), with the idea that ‘Better information means better care’ (6). It is true that this database could be a highly powerful tool, its greatest asset being the sheer volume of data. Professor Peter Johnson from Cancer Research UK correctly stated that ‘Everyone in England has a decision to make’ (7), because there is an ‘opt out’ option for every patient. Offering a personal choice is ethically the correct thing to do; personal data should be considered to be under the control of the patient, and safeguards should be implemented. An ‘opt in’ choice may be the better ethical method to pursue, but it could ultimately reduce overall participation and so affect statistical significance. An awareness of the fundamental need for wide participation would be integral to any ‘opt in’ scheme. Care.data is currently undergoing a six-month delay (8) because of significant privacy concerns. HSCIC has adopted the Information Commissioner’s Office’s code on the treatment of personal information and anonymity (5), but it is charging externally approved organisations for access to the data (5). Polls have also shown that only one third of the population were even aware that the Care.data scheme was being implemented, let alone that they had the option to withdraw (9).

Data sharing is vital to the continued development of science, and a well-implemented infrastructure and privacy policy are key to its success. There should be flexibility when deciding the appropriate availability of data: this may vary according to factors such as the field of research and the parties involved. Ownership is key: public data should be made available to the public and, at the same time, the protections associated with individual research should be applied. One governing body or overarching infrastructure seems to be the most practical approach to the sharing of data; the GAGH has been innovative in this area, but further expansion into other fields may prove particularly beneficial. After all, the betterment of scientific understanding is a common goal for all research and data preservation.




