Interactive Genome Analysis Challenge

Richard Holland tells us about the Interactive Genome Analysis Challenge launched by the Pistoia Alliance

The Pistoia Alliance has set out to find the fastest genome analysis system in the world. Through the Interactive Genome Analysis project, the Alliance intends to identify the best combination of software and hardware that allows a sequenced genome to be analysed so fast that scientists can work interactively rather than iteratively. Truly interactive analysis allows hypotheses to be tested and verified in days rather than weeks, greatly accelerating scientific research in the field of genomics.

Genome sequencers generate genome data which then requires analysis and interpretation, but so far the technology to analyse genome data has not kept pace with the speed at which that data can be produced. Currently a typical genome analysis pipeline will run for a day or longer, a timescale which does not lend itself well to diagnostics or other time-sensitive situations. For quick, responsive analytics that would better support scenarios where time is of the essence, a much faster and more interactive genome analysis solution is needed.

At a recent lunchtime event led by Etzard Stolte and John Wise, the Pistoia Alliance held a discussion on this subject with a view to bringing a solution closer to reality. The event was attended by over 40 individuals from a diverse range of pharma, technology and startup companies, academic institutes and non-profits, and other alliances with an interest in this space such as the Global Alliance for Genomics and Health. With such a wide range of possible use-cases and feature requirements put forward by those present, it was difficult to identify a single killer application that would be genuinely useful across the board.

Interactive Genome Analysis

The fundamental divide between the need to analyse single genomes in isolation versus the combined genomes of a population is not easily overcome, and neither is that between those who need whole-genome analysis and those who are doing targeted research on selected areas. It also wasn’t clear whether the need to run any arbitrary analysis interactively was more important than precomputing a range of possible alternative data scenarios for the scientist to work with within the available boundaries. However, a good degree of common ground was established and an area was identified where the Alliance could make some immediate impact.

An important point to note is that interactive doesn’t have to mean instant. During the discussions, Will Spooner of Eagle Genomics pointed out that there are really only three useful concepts of time when it comes to measuring performance. The first is overnight execution: users want to start analyses before they go home and see the results the next morning. The second is the time it takes a kettle to boil, so that users can make a cup of coffee while waiting for their results. The third is the time generally accepted as the longest a typical web surfer will wait for a page to load before assuming it is broken, so that users can click on links to run analyses and get results almost immediately.

Depending on the situation, any one of these performance brackets will suffice, but the major advantages come from making the leap from one bracket to another rather than shaving off seconds or hours within the same bracket. Reducing the analysis time from 8 hours to 4 hours will double the capacity for overnight runs on the same hardware, but in terms of interactivity it will make little difference. If you reduce it from 8 hours to 4 minutes, however, landing squarely in the cup-of-coffee bracket, you’ll achieve a major step-change in the way that users can work with that analysis. Genuinely interactive analysis becomes possible.
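The bracket argument above can be made concrete with a small sketch. The three brackets come from the discussion, but the article gives no numeric limits for them, so the thresholds below (10 seconds, 5 minutes and a 16-hour overnight window) and all function names are illustrative assumptions rather than figures agreed by the Alliance.

```python
from datetime import timedelta

# Hypothetical upper bounds for each bracket. The article names the
# brackets but gives no numbers, so these values are assumptions.
WEB_PAGE_LIMIT = timedelta(seconds=10)   # patience of a typical web surfer
COFFEE_LIMIT = timedelta(minutes=5)      # roughly a kettle boil plus a brew
OVERNIGHT_WINDOW = timedelta(hours=16)   # e.g. leave at 5pm, return at 9am


def bracket(runtime: timedelta) -> str:
    """Return the name of the performance bracket a runtime falls into."""
    if runtime <= WEB_PAGE_LIMIT:
        return "web-page"
    if runtime <= COFFEE_LIMIT:
        return "cup-of-coffee"
    if runtime <= OVERNIGHT_WINDOW:
        return "overnight"
    return "longer than overnight"


def runs_per_night(runtime: timedelta,
                   window: timedelta = OVERNIGHT_WINDOW) -> int:
    """Count how many back-to-back runs fit into one overnight window."""
    return window // runtime


if __name__ == "__main__":
    for runtime in (timedelta(hours=8), timedelta(hours=4), timedelta(minutes=4)):
        print(runtime, bracket(runtime), runs_per_night(runtime))
```

Under these assumed limits, halving an 8-hour run to 4 hours doubles the number of runs per night but leaves the analysis in the overnight bracket, whereas a 4-minute run moves it into the cup-of-coffee bracket.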

The Pistoia Alliance is therefore looking into running an Interactive Genome Analysis Challenge, in alignment with the efforts of the Global Alliance for Genomics and Health, to take a ‘standard’ NGS pipeline and offer a prize for developing the fastest possible implementation. For a given set of benchmark data and quality parameters, the Pistoia Alliance wants to understand the blueprint of an optimal setup, including the combination of software and hardware components required and the associated costs. A community of interest is being established to define the benchmark and establish the goals of the contest.

To keep up with the details of this challenge as it develops, please visit the Pistoia Alliance’s Interactive Project Portfolio Platform (IP3).

