DNAdigest interviews the MSSNG project
Interview with Mathew Pletcher from the MSSNG project. Mathew is one of the many great speakers at the BioData World Congress in Boston on September 14–15, presenting: “MSSNG – Changing the face of autism through big data and open science.”
Could you please introduce yourself and the MSSNG project?
My name is Mathew Pletcher. I am Vice President and Head of Genomic Discovery for Autism Speaks, and I am also currently serving as the interim Chief Science Officer of the organisation. The MSSNG project (pronounced “missing”) is a collaboration between Autism Speaks and the Hospital for Sick Children in Toronto. The aim is to sequence the whole genomes of 10,000 individuals from families with autism and to make this data broadly available to the research community. We want to enable a better understanding of the genetic causes of autism, which should lead to a better understanding of the different subtypes of autism and, ultimately, to a precision medicine-based approach to the care of autism.
In addition to providing data for research, we believe that genomic sequencing has value today and that families who participate in MSSNG have the right to reap that benefit now, not at some indefinite point in the future. We are starting to build, in partnership with the company Genospace, a community portal that will allow those who have provided their samples and medical data to MSSNG to access their own genome sequence, along with tools to help them engage with it. This will allow participants to understand what we found and to share this information with their physicians, so that they can begin to incorporate it into their care plans.
When and how did the MSSNG project start? And by the way, why is it MSSNG?
The name represents the information that we are missing: the data gap in our understanding of the processes underlying autism, which we need to close in order to progress to precision medicine. The data itself just does not exist, so this initiative is about providing that missing data to the community.
The MSSNG project goes back to 2011, when some pilot work was done to assess our capability to carry out whole genome sequencing. The project really kicked off towards the end of 2014, when Google came on board as a partner and we began sequencing genomes. The first 1,000 whole genome sequences (WGS) were made available in January 2015, and in April of this year the number increased to 5,211.
How do people apply for this data?
It is really straightforward. There is an application available on the MSSNG project website and a contract that researchers have to sign. Because the dataset now includes health record data as well as genomic data, there has to be some kind of assurance that the folks who are accessing it are doing so for a legitimate reason. In addition, the informed consent that is currently in place only allows the data to be used for autism and related disorders, but we are in the process of reconsenting individuals to broaden that out. Investigators are asked to tell us why they are accessing the data and what their research question is. They also have to make certain guarantees, such as that they won’t try to re-identify the patients and that they have the security measures in place to protect this data once it is in their hands.
Do you have your own internal DACO within the organisation?
No, we have a contract with an independent DACO, for two reasons. Firstly, it is important for us to remove any issues around favouritism and politics, so that it is clear to everyone that applications are being considered without bias. Secondly, the group we have contracted with is headed by Bartha Knoppers of McGill University and The Public Population Project in Genomics and Society. She really is one of the leading experts on the legal and ethical issues surrounding the sharing of genomic data, and she is also part of the Global Alliance for Genomics and Health. Working with Bartha ensures that we don’t create a new set of requirements and restrictions that would be unique to MSSNG. Instead, the standards for MSSNG are in line with those established for similar efforts.
How are your datasets annotated? How can researchers discover what data you have?
On our website you can find a number of things besides the access policy and contract. There is a list of the samples that have been sequenced up to this point, as well as the other data available for these samples. This provides pretty good guidance as to what the data actually looks like once you have access.
How do you recruit the participants?
In this case, we have been able to make use of existing collections of samples that we and others have gathered over the years. Earlier on, Autism Speaks funded the collection of a number of cohorts of families with autism, with deep phenotypic characterisation, and we have been able to use these collections.
Those samples were collected under a very strict consent that limits their use to autism studies. To re-consent them, you need to contact those people, right? Some of them are probably no longer alive… What are you going to do about that?
It varies. As I said, we are dealing with a couple of different collections. Some of them had the restriction to autism-related research, some didn’t. The majority of those with the restriction are Canadian samples. We are working through the process with the Canadian Research Ethics Board (REB), and as I understand it, the Canadian standards do not require us to recontact 100% of the participants: if we make a good effort to recontact them and a certain percentage of those do agree, that can be carried over to the remaining samples.
What has been the biggest challenge so far, and what do you expect in the future? What will be the next milestone, and what is your future vision for the MSSNG project?
The biggest challenge has been finding the right balance between protecting the privacy of the individuals who donated the samples and honouring their desire to speed up and aid research through their contribution by making the data as broadly available as possible. Because autism is a complex disorder, we need as many minds as possible working on the data and, ideally, working collaboratively. It has certainly taken effort to break down those barriers, to allow easy sharing across borders and time zones and let the information flow, while still protecting the data in a meaningful and proper way. I’d say the initial position we’ve adopted, with Bartha’s support, is a fairly strict one that puts the privacy and the needs of the participants first. But if we gain confidence over time that the security measures we have put in place are sufficient, that may allow us to make sharing a little easier as we go forward.
The next most exciting thing, especially for me, is this idea of the community portal. We have done a good job of enabling the research community, and one thing we haven’t talked about yet is how researchers actually interact with the data. Right now they have two options. Those who have significant bioinformatics skills can access the data through a command-line interface to the Google Genomics platform, run their own scripts and do very complex analyses on Google Cloud Platform. But we’ve also built a web-based browser on top of Google Cloud Platform, so that researchers who don’t have bioinformatics skills can query the entire dataset by gene, by specific variant or by individual sample, and retrieve the data relevant to their study. For example, genetic counsellors can find out whether a variant they have identified in a new family has been seen anywhere else. This is a place for them to simply ask the MSSNG dataset: has this mutation been seen in any patient with autism, and beyond that, what did that individual look like, what were their phenotypic characteristics?
With this platform, we want to enable anyone who wants to know more about the genetic landscape. The tools let them do this very easily, with drop-down menus and search boxes. Not much experience is required.
That covers researchers. On top of that, do you want to have patients interacting with each other, and maybe even directly with researchers?
That’s the part I am most excited about. We are working through that process now, understanding the legal, ethical and other concerns involved in making this possible. But it is still based on the fundamental belief that the data belongs to the families and that they have the right to access it and use it as best they can, applying it to their own circumstances and their own medical care.
I very much see it as a resource for the whole community. In addition to accessing their data, we also envision that the portal will allow families to connect with each other, especially those who have similar genetic subtypes of autism. Autism is a spectrum disorder with huge heterogeneity, both in its biological underpinnings and in how it manifests in the medical and health concerns of individuals. As a result, the conversations one can have within the general community can sometimes be confusing or unproductive, because the diversity of experiences is so wide. If you can bring together families who are on the same path because their autism shares a genetic cause, those conversations are going to be much more useful to them. Those are the activities that we want to empower.
When do you expect this to start happening? Is there any time frame for that?
Yes, ideally we would like to open a beta version of the portal by the end of this year, with a fully functional version up and running some time next year.