DNAdigest interviews MIABIS Connect
Today we interview the team of MIABIS Connect – a federation software platform designed to connect biobanks. MIABIS stands for Minimum Information About BIobank data Sharing.
1. What exactly is MIABIS Connect?
MIABIS Connect is a federation software platform initially developed to promote biobank interoperability. MIABIS Connect can be used to create federations of biobanks to make samples openly available to the biomedical research community. The central semantic for the federation is the Minimum Information About BIobank data Sharing (MIABIS), the de facto standard in BBMRI-ERIC for sharing bio-resources.
Several features make MIABIS Connect interesting. It is a “light” software solution built on open-source software (ElasticSearch) and small Java modules developed in-house, and it requires minimal involvement from biobank staff or IT support. Its most distinctive attribute is that the biobank data stays in the biobank!
From a technical point of view, MIABIS Connect has two main modules: MIABIS Server and MIABIS Client. The server is the software installed on the biobank side, while the client is a web application that queries all the biobanks in the federation and presents the query results in a user-friendly way through a pre-configured Kibana dashboard (also from Elastic).
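To make the client/server split concrete, here is a minimal sketch of the fan-out idea: one query is sent to every biobank and the answers are merged. It is not MIABIS Connect's actual code. The function names, the field names and the in-memory "servers" are all assumptions made for illustration; a real client would talk to each MIABIS Server over the network.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one biobank's MIABIS Server: a function that takes
# a material type and returns matching records from that biobank's local index.
BiobankSearch = Callable[[str], List[dict]]


def federated_search(servers: Dict[str, BiobankSearch], material_type: str) -> List[dict]:
    """Send the same query to every biobank and merge the answers."""
    merged = []
    for name, search in servers.items():
        try:
            hits = search(material_type)
        except Exception:
            # One unreachable biobank should not break the whole federated query.
            continue
        for hit in hits:
            # Tag each hit with the biobank it came from.
            merged.append({"biobank": name, **hit})
    return merged


def make_server(records: List[dict]) -> BiobankSearch:
    """Build a fake server that filters its own local records."""
    return lambda material_type: [r for r in records if r["material_type"] == material_type]


servers = {
    "biobank_a": make_server([
        {"sample_id": "A1", "material_type": "plasma"},
        {"sample_id": "A2", "material_type": "serum"},
    ]),
    "biobank_b": make_server([
        {"sample_id": "B7", "material_type": "plasma"},
    ]),
}

results = federated_search(servers, "plasma")
```

Each record stays with the biobank that holds it; only the hits matching the query travel back to the client.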
So far we have provided a proof of concept that federates biobank data using the sample as the central entity, but the framework can be used to federate other components of the biomedical research ecosystem. It is just a matter of defining a file and a schema representing the standard and some relationships. With a little modification, MIABIS Connect can be used to federate any field that has a defined standard or ontology to represent metadata, data, etc. Our aim is to create a semantic-agnostic federation framework. At that point it would no longer be MIABIS Connect, and we would need to find a more abstract name.
2. How can biobanks join MIABIS Connect? What is the data format and tools that they would need to adopt?
First, an organization, for instance BBMRI-ERIC, should create a governance system for the federation in which the regulations and constraints for membership are well defined. Then the biobank should allocate a server on which to install the MIABIS Server, as well as a data repository to which some files from the biobank management system (e.g. LIMS) will be exported. Once the MIABIS Server is installed, the biobank should map its data to MIABIS using a standalone MIABIS Connect module. The resulting map is used by the server application to convert the repository files into indexes in ElasticSearch.
In principle, the biobank exports four files with data about: sample, sample collection, study, and contact person. The files can use whatever delimiter the biobank already uses, and the columns do not need any particular order or names. The mapping process takes care of this.
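The following sketch shows why column order and names do not matter once a map exists. It is an illustration only: the `apply_map` function, the sample export and the target attribute names (`sample_id`, `material_type`, `sex`) are hypothetical placeholders, not the real MIABIS attribute list or the format of the map that MIABIS Connect produces.

```python
import csv
import io


def apply_map(text, delimiter, column_map):
    """Rename the biobank's own columns to the federation's attribute names.

    column_map maps a source column name to a target attribute name.
    Columns that are not in the map are ignored.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {target: row[source] for source, target in column_map.items()}
        for row in reader
    ]


# A hypothetical export: semicolon-delimited, local column names, arbitrary order.
export = "Type;Donor sex;ID\nplasma;F;S-001\nserum;M;S-002\n"

# The map produced once during the mapping step (names are assumptions).
column_map = {"ID": "sample_id", "Type": "material_type", "Donor sex": "sex"}

records = apply_map(export, ";", column_map)
```

Because rows are looked up by header name rather than by position, reordering the columns in the export does not change the result.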
3. Can a biobank use MIABIS Connect if they are using another standard format for their data?
MIABIS Connect doesn’t require the biobank data model to be strictly MIABIS compliant. Most of the MIABIS attributes and components are naturally part of biobank management systems, although they may appear under other names. The trick here is the mapping process. The Mapper “discovers” the columns of the files from the biobank and “assists” the biobank in the mapping process. Once the map is produced, the application will “interpret” the biobank files and convert them into indexes to be queried by the MIABIS Client. If the biobank changes its semantics or data model, the map needs to be recreated.
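One way to picture the “discover” and “assist” steps is the sketch below. It reads the header row of an exported file and proposes a target attribute for each column using simple string similarity. This is an assumption about how such assistance could work, not a description of the actual Mapper; the function names and the target attribute list are hypothetical, and a person would still confirm or correct every suggestion.

```python
import csv
import difflib
import io

# Hypothetical target attributes; the real MIABIS attribute list differs.
TARGET_ATTRIBUTES = ["sample_id", "material_type", "sex", "storage_temperature"]


def discover_columns(text, delimiter):
    """Return the column names found in the first row of a delimited file."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


def suggest_map(columns, targets=TARGET_ATTRIBUTES, cutoff=0.6):
    """Propose a target attribute for each column, or None if nothing is close."""
    suggestions = {}
    for column in columns:
        normalized = column.strip().lower().replace(" ", "_")
        match = difflib.get_close_matches(normalized, targets, n=1, cutoff=cutoff)
        suggestions[column] = match[0] if match else None
    return suggestions


columns = discover_columns("Sample ID;Material Type;Freezer\n", ";")
suggestions = suggest_map(columns)
```

Columns with no close match come back as `None`, which leaves them for the biobank staff to map by hand or to leave out.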
4. How many biobanks are already connected and who will connect in the near future?
Right now we have the proof of concept with three servers (Karolinska, Elixir and Max Planck Institute) using a mix of real and “fake” data. The next step is to create a biobank federation in Sweden for high-quality samples. It will be a real pilot for MIABIS Connect. A paper will be produced out of this.
5. What is the incentive for a biobank to connect?
The main incentive is to make available the samples that can be shared, for instance “left overs” from studies or collections that have the right consent to be reused in similar studies. This is the case for a federation that uses the sample as its central entity. In a federation whose central semantic covers not only sample availability but also resources and services, the biobank can expose its services through the federation and get in contact with research clients or industrial partners. Everything depends on the semantic behind the federation.
6. Who is responsible for implementation of MIABIS Connect by a biobank?
The MIABIS Server will be a package to be downloaded from a repository that the governance of the federation should provide. The governance of the federation participates in the creation of the standard model behind that particular federation. What we have implemented as a proof of concept is a MIABIS sample-centered model, because the aim was to expose samples for sharing. Other organizations creating federations based on MIABIS Connect may want to involve other entities and pursue other aims.
In any case, a biobank belonging to a federation should designate a person with some IT background to install the MIABIS Server and create the map between the biobank data model and MIABIS.
7. Why is federated search needed for metadata? If all biobanks could input their data into a central system, would that be a preferred solution?
The main problem with a centralized approach is that the biobanks have to export their data to a central repository. Even if it is only metadata, the data has to be transformed to comply with the centralized data model. This is not a big problem if the biobank sends metadata to the central database infrequently. Data at the sample level, however, is no longer metadata; it can be sensitive data, depending on what information is associated with the sample. In this federated approach, the biobank exports what it has, in whatever form it can, including or excluding sensitive data. MIABIS Server transforms the data on the biobank’s local servers and makes available for query only what has been defined by the federation model. The data from the biobank stays in the biobank.
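The idea of exposing only what the federation model defines can be sketched as a simple projection applied on the biobank’s own server before anything is indexed for query. The field names and the `expose` function below are hypothetical and are used only to illustrate the principle.

```python
# Hypothetical set of attributes agreed on by the federation's governance.
FEDERATION_FIELDS = {"sample_id", "material_type", "sex"}


def expose(record, allowed=FEDERATION_FIELDS):
    """Keep only the attributes that the federation model defines."""
    return {key: value for key, value in record.items() if key in allowed}


# A local record that also carries fields the biobank does not want to share.
local_record = {
    "sample_id": "S-001",
    "material_type": "plasma",
    "sex": "F",
    "personal_number": "local-only value",
    "diagnosis_notes": "local-only value",
}

public_record = expose(local_record)
```

Anything outside the agreed set never leaves the local server, whether or not the biobank included it in its export.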
8. Can a researcher apply for access to samples via the MIABIS Connect system?
So far, with the simple model that we have proposed, the researcher can email the contact person for a particular study, sample collection or biobank and ask for samples using the exposed sample IDs. In the future, if we get resources to continue developing this framework, we may add a simple e-commerce module between the client (web query application) and the servers (biobanks).
9. How is MIABIS Connect funded and what is the long-term vision for MIABIS Connect?
Right now, we are improving the framework to start the pilot in Sweden. So far, it is a collaboration between Karolinska Institute, the Max Planck Institute and Elixir. The Swedish pilot will produce a paper that will hopefully attract the attention of other biobank and research networks. It will be an open-source product aiming to stimulate data sharing in the biomedical research community. The way to make it sustainable beyond the Karolinska use case is through services: creating the standard and data model of the federation, customizing functional modules at the server level, and providing a helpdesk for installation, upgrading, etc.
The best way to sustain this software is for it to be adopted by an organization that can continue its development and maintenance and provide services for the users, for instance Elixir, BBMRI-ERIC, BCNet or RD-Connect. Through this software framework we want to help the biobanking and biomedical research community embrace the spirit of open science and open data, regardless of who gets the acknowledgement. Anyone is welcome to take this software and make it better. The big players in this game will be the biobank and researcher networks willing to share.
You can find more information here.