Ecological Perspective on Data Sharing
We have invited Charlie Outhwaite (@charlielouo) to write a guest blog post on the topic of openness and data sharing from an ecological point of view.
The post gives us a great opportunity to draw a parallel: the same kinds of data sharing problems we are experiencing in the field of genomics are observed across different scientific disciplines.
The field of ecology is a vast and varied one. As a result, the types and quantities of data produced differ hugely. Whether a study is small in scale, such as a field- or lab-based project, or a large big data study at country or global scale, the amount of data that could be made available is enormous. Yet the field of ecology has been considered behind in terms of its openness when compared to other areas of biology such as genomics. With such vast amounts and types of data available, sharing that data openly has the potential to boost research opportunities and open up collaboration within and between fields.
As is the case within many scientific disciplines, a major barrier for data sharing in ecology is the fear of being scooped. For this reason, many researchers would be unlikely to release their data until they have been able to complete their intended work first. This problem is exacerbated in ecology where data are often collected independently by one or a few people who gain a sense of ownership over that data. Although permissions of use and attributions can be set up, this sense of ownership can act as a barrier to data sharing. If an ecologist has spent months in the field collecting and then collating that data, they are not going to want to share it until they have had the chance to carry out all their planned analyses, and will probably then hold onto it for a bit longer, just in case!
Additional problems that are shared with other areas of research include getting credit for sharing data and actually knowing how to share data. The credit issue is starting to be addressed by data journals where citations can be gained as a result of publishing data. With citations often referred to as the “currency” of science, bringing data sharing into this fundamental aspect of academia is key.
Although many options are now available for easy and hassle-free data sharing, this knowledge is not widespread within the ecological community. Learning these new techniques is also often considered too time-consuming. Options available to ecologists include figshare, which can be used to make data publicly available and citable; GBIF, a forum for the sharing and reuse of biodiversity data; and GitHub, which allows the sharing of code, to mention just a few. The tools are available; now we need to increase the knowledge of how to use them and encourage their use in day-to-day research life. I personally think these tools should be introduced during undergraduate courses. This would ensure that future generations of researchers have the basic skills they need to share data effectively.
So far, these issues are applicable to most areas of science, and it is clear that efforts are being made to overcome them. However, ecological data also have unique issues. My work in particular can highlight one such problem. I use species presence data collected by volunteers to investigate changes in the status of biodiversity over time. As these data come from various organisations and groups, the views on who owns the data and whether or not they should be shared can vary. More important is the fact that these data consist of precise localities indicating exactly where species have been recorded. For a common species, this shouldn’t be a problem, but what about threatened or endangered species? Should their locations be openly available? Some species are protected by law, and the data relating to these species cannot be used in a study which could result in the data being accessed by others. So, what would the protocol be in this case? Should the dataset be openly shared, even though this could lead to people tracking down endangered species and potentially putting these populations at risk? What other options are available? Until specific protocols are put in place that aim to understand and mitigate the potential problems with specific kinds of data, many data holders are likely to simply keep their data to themselves.
The potential for data sharing within the field of ecology is great. The scale and scope of work that could be achieved would be vastly increased if a more open and sharing community were possible. However, as well as the issues that are more widely shared within science, there are a number of issues specific to ecology that need to be addressed in order for the open data movement to pick up momentum. Once these problems are understood and ways to deal with them are established, standardised ways of sharing should be more accessible and accepted within the community. Currently, however, I think this lack of data sharing is preventing the generation of new and exciting research and potentially limiting what we are able to offer from within this field.
Charlie is a first-year PhD student based at the Centre for Ecology & Hydrology. Her work looks into producing biodiversity indicators from biological records, exploring the drivers behind the trends and the way species traits affect susceptibility to change.