Secure the data – share the knowledge

Welcome to the Wiki for the DNAdigest Hack Day!

What: DNAdigest Hack Day

Where: Microsoft Research Cambridge

When: August 10th, 2013

Aim of DNAdigest Hack Day

We will make and break ideas and prototypes for solutions built around the DNAdigest data-sharing platform model, as presented by Fiona in the morning session.

What is the problem?

Human genomics, i.e. the study of the entire human genome, seeks to understand the function of all ~3 billion base pairs in the genome in the context of health and disease.

Results published in scientific papers are not sufficient for discovering new genetic links to disease: researchers need access to data – a LOT of data – but current mechanisms for discovering and accessing genomics data are cumbersome and inefficient. Gaining access to a specific data set can take up to 6 months, and separate manual applications are needed. We need more efficient systems for discovering, browsing, sharing and accessing genomics data.

Who are our stakeholders?

We will explore this question further in the morning session. 

What is an empathy map? – from the Agile Coaching blog

How to create an empathy map – from Hekovnik startup school

What does the data look like?

Example .bam and .vcf datasets from Illumina: http://www.illumina.com/truseq/tru_resources/datasets.ilmn

The 1000 Genomes project has sequenced the complete genomes of ~2,500 healthy individuals and made both the raw data and the variant calls public. The data formats and data sets available are described here: http://www.1000genomes.org/data
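To get a feel for the variant-call data: a .vcf file is plain tab-separated text, whereas a .bam file is a compressed binary format that needs a dedicated tool. Below is a minimal Python sketch of reading VCF data lines. It assumes the column layout of the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, then an optional FORMAT column and one column per individual). The names VcfRecord, parse_vcf_line and iter_records are hypothetical, and the sketch leaves the per-sample genotype fields unparsed. For real work, a maintained library such as pysam is the safer choice.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class VcfRecord:
    """One VCF data line (hypothetical helper type, not from any library)."""
    chrom: str
    pos: int                     # 1-based position on the chromosome
    id: str                      # e.g. a dbSNP rsID, or "." if none
    ref: str                     # reference allele
    alt: list[str]               # one or more alternate alleles
    qual: str                    # kept as text because it may be "."
    filter: str                  # "PASS" or the names of failed filters
    info: dict[str, str | bool]  # key=value pairs; flag keys map to True
    samples: list[str]           # FORMAT column, then one column per individual (unparsed)


def parse_info(field: str) -> dict[str, str | bool]:
    """Split the semicolon-separated INFO column into a dict."""
    if field == ".":
        return {}
    info: dict[str, str | bool] = {}
    for entry in field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flag entries have no "=value"
    return info


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one tab-separated VCF data line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 tab-separated columns, got {len(cols)}")
    return VcfRecord(
        chrom=cols[0],
        pos=int(cols[1]),
        id=cols[2],
        ref=cols[3],
        alt=cols[4].split(","),
        qual=cols[5],
        filter=cols[6],
        info=parse_info(cols[7]),
        samples=cols[8:],
    )


def iter_records(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Skip '##' meta lines, the '#CHROM' header and blank lines; parse the rest."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


if __name__ == "__main__":
    # Data line adapted from the example in the VCF specification.
    record = parse_vcf_line(
        "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\tGT:GQ:DP:HQ\t0|0:48:1:51,51"
    )
    print(record.pos, record.alt, record.info["DP"])  # 14370 ['A'] 14
```

The 1000 Genomes VCFs are distributed bgzip-compressed (.vcf.gz). Because bgzip output is gzip-compatible, the lines can be fed to iter_records from gzip.open(path, "rt").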

What are our tools?


Lucidchart – an excellent online tool for drawing diagrams such as network diagrams, use-case diagrams and organisation charts. Integrates seamlessly with Google Drive!

Rapid Interface Builder – super-easy drag-and-drop user interfaces 


We can build on, or use, existing tools for working with .bam and .vcf files, such as bamtools (for BAM files): https://github.com/pezmaster31/bamtools

and resources for handling disease metadata, such as the Human Phenotype Ontology: http://www.human-phenotype-ontology.org/

The 1000 Genomes project implemented a simple slicing mechanism, called DataSlicer, for selecting individuals or populations: http://browser.1000genomes.org/Homo_sapiens/UserData/SelectSlice
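At the level of a single VCF file, slicing by individual comes down to keeping the nine fixed columns and only the chosen per-sample columns. The Python sketch below illustrates that idea. It is not how DataSlicer is implemented. It assumes uncompressed VCF text and a set of sample IDs already in hand. Selecting a population would first mean looking up its members' IDs. The function name slice_samples and the sample IDs NA1 to NA3 are hypothetical. For real data, bcftools view with its -s option performs this kind of subsetting.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# The #CHROM header line has nine fixed columns before the sample IDs:
# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
FIXED_COLUMNS = 9


def slice_samples(lines: Iterable[str], wanted: set[str]) -> Iterator[str]:
    """Yield VCF lines keeping only the sample columns named in `wanted`.

    Meta lines ('##...') pass through unchanged. Summary INFO fields such as
    allele counts are NOT recomputed for the subset, so they still describe
    all of the original samples.
    """
    keep: list[int] | None = None  # indices of the sample columns to retain
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            yield line
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            # Work out which sample columns to keep from the header row.
            keep = [i for i in range(FIXED_COLUMNS, len(cols)) if cols[i] in wanted]
        elif keep is None:
            raise ValueError("data line found before the #CHROM header")
        yield "\t".join(cols[:FIXED_COLUMNS] + [cols[i] for i in keep])


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA1\tNA2\tNA3",
        "20\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0|1\t1|1\t0|0",
    ]
    for out in slice_samples(vcf, {"NA1", "NA3"}):
        print(out)
```

Streaming line by line keeps memory use flat, which matters for files covering thousands of genomes.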

References and inspiration

If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology – July 2013 article in PLoS ONE

Data as a Service (DaaS) – July 29th article on Big Data Republic

Unravelling genomic variation from next generation sequencing data – July 25th review of data types and file formats in the journal BioData Mining

Understanding Open Science – July 30th article by John Wilbanks


Examples of databases collecting published genetic annotations: HGMD Professional, COSMIC, … (insert links to DBs here)

Examples of other mechanisms for data sharing and data access: Gen2Phen, CafeVariome, … 


Notes and output

