DNAdigest Hack Day
Secure the data – share the knowledge
Welcome to the Wiki for the DNAdigest Hack Day!
What: DNAdigest Hack Day
Where: Microsoft Research Cambridge
When: August 10th, 2013
Aim of DNAdigest Hack Day
We will make and break ideas and prototypes for solutions related to the DNAdigest platform model for data sharing as presented by Fiona in the morning presentation.
What is the problem?
Human genomics, i.e. the study of the entire human genome, seeks to understand the function of all 3 billion base pairs in the genome in the context of health and disease.
Results published in scientific papers are not sufficient for discovering new genetic links to disease. Researchers need access to data (a LOT of data), but current mechanisms for discovering and accessing genomics data are cumbersome and inefficient. It can take up to 6 months to gain access to a specific data set, and each request requires a separate manual application. We need more efficient systems for discovering, browsing, sharing and accessing genomics data.
Who are our stakeholders?
We will explore this question further in the morning session.
What is an empathy map? – from the Agile Coaching blog
How to create an empathy map – from Hekovnik startup school
What does the data look like?
Example .bam and .vcf datasets from Illumina: http://www.illumina.com/truseq/tru_resources/datasets.ilmn
The 1000 Genomes project has sequenced the complete genomes of ~2,500 healthy individuals and made both the raw data and the variant calls public. The data formats and data sets available are described here: http://www.1000genomes.org/data
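For anyone who has not opened a .vcf file before, here is a minimal sketch of reading one in plain Python. The two-record sample is made up for illustration (it is not taken from the Illumina or 1000 Genomes data), the function name parse_vcf is hypothetical, and the sketch assumes the FORMAT column holds only GT. Real files carry many more header lines and INFO fields.

```python
import io

# Made-up sample in VCF layout, for illustration only (not real data).
SAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t10583\t.\tG\tA\t100\tPASS\t.\tGT\t0|0\t0|1\n"
    "1\t13302\t.\tC\tT\t100\tPASS\t.\tGT\t1|0\t1|1\n"
)


def parse_vcf(handle):
    """Return (sample_names, records) from a tab-separated VCF text stream."""
    samples = []
    records = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed; everything after FORMAT is a sample name.
            samples = fields[9:]
            continue
        records.append({
            "chrom": fields[0],
            "pos": int(fields[1]),
            "id": fields[2],
            "ref": fields[3],
            "alt": fields[4].split(","),  # several ALT alleles are comma-separated
            # Assumption: FORMAT is just GT, so each sample column is a genotype.
            "genotypes": dict(zip(samples, fields[9:])),
        })
    return samples, records


samples, records = parse_vcf(io.StringIO(SAMPLE_VCF))
for rec in records:
    print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["genotypes"])
```

Each record is one variant site: where it is (chromosome and position), what changed (REF to ALT), and one genotype per individual. The per-individual columns are the part that makes access control matter.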
What are our tools?
LucidChart – excellent online tool for drawing diagrams such as network diagrams, use cases, organisation charts, etc. Integrates seamlessly with Google Drive!
Rapid Interface Builder – super-easy drag-and-drop user interfaces
Tools for working with .bam and .vcf files, which we can use as they are or build on, such as bamtools (for .bam files): https://github.com/pezmaster31/bamtools
Tools and packages for dealing with disease metadata, such as the Human Phenotype Ontology: http://www.human-phenotype-ontology.org/
The 1000 Genomes project implemented a simple slicing mechanism for selecting individuals or populations, called DataSlicer: http://browser.1000genomes.org/Homo_sapiens/UserData/SelectSlice
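The idea behind slicing can be sketched in a few lines of Python: keep only the records inside a genomic region, and only the genotype columns for the chosen individuals. This illustrates the concept only and is not how the 1000 Genomes tool is implemented. The function name slice_vcf_lines and the sample lines are hypothetical and made up for this sketch.

```python
def slice_vcf_lines(lines, chrom, start, end, keep_samples):
    """Keep records on `chrom` within [start, end] and only the named sample columns."""
    out = []
    keep_idx = []
    for line in lines:
        if line.startswith("##"):
            out.append(line)  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9; remember which ones to keep.
            keep_idx = [i for i, name in enumerate(fields)
                        if i >= 9 and name in keep_samples]
            out.append("\t".join(fields[:9] + [fields[i] for i in keep_idx]))
            continue
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            out.append("\t".join(fields[:9] + [fields[i] for i in keep_idx]))
    return out


# Made-up input lines, for illustration only (not real data).
LINES = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t10583\t.\tG\tA\t100\tPASS\t.\tGT\t0|0\t0|1\t1|1",
    "1\t20000\t.\tC\tT\t100\tPASS\t.\tGT\t1|0\t1|1\t0|0",
    "2\t10583\t.\tT\tC\t100\tPASS\t.\tGT\t0|1\t0|0\t0|1",
]

sliced = slice_vcf_lines(LINES, "1", 10000, 15000, {"S1", "S3"})
for line in sliced:
    print(line)
```

A data-sharing platform could apply the same kind of filter on the server side, so that a researcher only ever receives the region and individuals they have been granted access to.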
References and inspiration
If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology – July 2013 article in PLoS ONE
Data As A Service DAAS – July 29th article on Big Data Republic
Unravelling genomic variation from next generation sequencing data – July 25th review of data types and file formats in the journal BioData Mining
Understanding Open Science – July 30th article by John Wilbanks
Examples of databases collecting published genetic annotation: HGMD professional, COSMIC, … (insert links to DBs here)
Examples of other mechanisms for data sharing and data access: Gen2Phen, CafeVariome, …
Notes and output
(insert links to notes, scribbles, documentation from the hack day)