Tuesday, 7 February 2017

Offline Plotly Gantt plots using Python/pandas

Modified from https://plot.ly/python/gantt/#use-a-pandas-dataframe to run offline and outside of IPython.
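The code from the original post is not reproduced here, so what follows is only a minimal sketch of the idea, not the author's script. It assumes a reasonably recent plotly where the Gantt helper lives at plotly.figure_factory.create_gantt, and that pandas is installed. The task names, dates, and the function names build_tasks and write_gantt_html are made up for illustration.

```python
def build_tasks():
    """Return illustrative Gantt rows (hypothetical task names and dates).

    create_gantt expects the keys/columns 'Task', 'Start' and 'Finish'.
    """
    return [
        {"Task": "Job A", "Start": "2017-01-01", "Finish": "2017-02-02"},
        {"Task": "Job B", "Start": "2017-01-17", "Finish": "2017-03-15"},
        {"Task": "Job C", "Start": "2017-02-20", "Finish": "2017-04-01"},
    ]


def write_gantt_html(tasks, filename="gantt.html"):
    """Render the tasks as a Gantt chart into a standalone HTML file.

    Imports are local so the data-building step above works even on a
    machine where plotly or pandas is not installed.
    """
    import pandas as pd
    import plotly.figure_factory as ff
    from plotly.offline import plot

    df = pd.DataFrame(tasks)   # the pandas DataFrame the chart is built from
    fig = ff.create_gantt(df)  # build the figure locally, no plotly account needed
    # auto_open=False keeps a plain script from launching a browser window
    return plot(fig, filename=filename, auto_open=False)
```

Calling write_gantt_html(build_tasks()) from an ordinary Python script should write gantt.html to the working directory, with no IPython notebook and no online plotly account involved.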

Tuesday, 29 November 2016

Verily (Google) is hiring Computational Biologists


Interestingly, the role is described as 'hardware engineering'. The preferred qualifications are very loose:

  • Demonstrated knowledge of core concepts in machine learning or probability and statistics.
  • Willingness to learn molecular and cell biology, computer science, and statistics.
  • Demonstrated effective written and verbal communication skills.

I bet they will be inundated with submissions! 

Wednesday, 9 November 2016

Compiling BWA on Ubuntu 16.04.1 LTS

# Install the zlib headers first; otherwise the build fails with:
#   utils.c:33:18: fatal error: zlib.h: No such file or directory
sudo apt-get install zlib1g-dev

# Download the source (0.7.12 here) and compile
wget http://downloads.sourceforge.net/project/bio-bwa/bwa-0.7.12.tar.bz2
tar jxvf bwa-0.7.12.tar.bz2
cd bwa-0.7.12/
make

Wednesday, 18 May 2016

CIViC is an open access, open source, community-driven web resource for Clinical Interpretation of Variants in Cancer

CIViC's Role in Precision Medicine

Realizing precision medicine will require this information to be centralized, debated and interpreted for application in the clinic. CIViC is an open access, open source, community-driven web resource for Clinical Interpretation of Variants in Cancer. Our goal is to enable precision medicine by providing an educational forum for dissemination of knowledge and active discussion of the clinical significance of cancer genome alterations.
CIViC is a community-edited forum for discussion and interpretation of peer-reviewed publications pertaining to the clinical relevance of variants (or biomarker alterations) in cancer. These interpretations may include associations between molecular alterations (or lack of alteration) and one or more drugs, diagnoses, prognoses or other treatment decisions. These interpretations of clinical significance (or lack of clinical significance) are purely for research purposes. A finding of no interpretation does not necessarily indicate lack of relevance for any specific variant or biomarker alteration. Interpretations are not presented in ranked order of potential or predicted importance. These interpretations make no promise or guarantee of any clinical benefit (or lack of clinical benefit).

Thursday, 5 May 2016

Verily Life Sciences new hires


I guess scouting a company's recruitment page to understand its projects is a common thing to do. Interestingly, this page even has a word cloud built from LinkedIn profiles, showing where the hires previously worked.

The new hires reflect the scope of the few Verily projects Google/Alphabet has allowed to escape to the public so far: a contact lens venture with Novartis AG for diabetics to track blood glucose levels, its buyout of a company with a spoon that counters shaking by Parkinson's disease patients and the big picture Baseline study, a deep research project designed to define a healthy human being.

"The future of biotech, medical and tech is going to coalesce. We've seen that with medical sensors that do more than count steps, artificial intelligence and virtual reality," Topol said. "All these things are going to have a big impact in medicine. It's a natural evolution."

Excerpted from @SFBusinessTimes

Friday, 29 April 2016

FDA launching the second precisionFDA challenge.

PrecisionFDA Truth Challenge

The challenge begins with two precisionFDA-provided input datasets, corresponding to whole-genome sequencing of the HG001 (NA12878) and HG002 (NA24385) human samples. Both samples were sequenced under similar sequencing conditions and instruments, at the same sequencing site. Your mission is to process these two FASTQ datasets through your mapping and variation calling pipeline and create VCF files. You can generate those results on your own environment, and upload them to precisionFDA, or you can reconstruct your pipeline on precisionFDA and run it there. Regardless of how you generate your VCF files, you will subsequently submit them as your entry to the challenge.
For HG002, the truth data will not be known during the challenge. After submissions close on May 26, GiaB will publish their reference VCF file for HG002. The precisionFDA team will then run and publish comparisons between each contestant’s HG002 VCF file and the GiaB HG002 reference VCF. This will publicly reveal how similar is each result to the GiaB HG002 reference.
For HG001, the reference VCF is already available. You are therefore asked to conduct a comparison between your VCF and the GiaB HG001 (NA12878) reference VCF, and include it in your submission entry, for the following reasons:
  1. to ensure that your VCF files are compatible with the comparison process (remember that we won’t be able to check on your HG002 VCF until after the end of submissions, so you are using your HG001 VCF as a check that your files can be compared without issues)
  2. for the community to be able to contrast your performance on a previously known sample (HG001) versus a previously unknown (HG002), and to evaluate any overfitting on HG001
Your entry to the challenge comprises your submitted HG001 and HG002 VCFs, your submitted HG001 comparison, and the HG002 comparison conducted by precisionFDA. Each comparison outputs several metrics (such as precision, recall, f-measure, or number of common variants). Selected participants and winners will be recognized on the precisionFDA website. Therefore, we hope you are willing to share your experience with others to further enhance the community's effort to ensure accuracy and consistency of tests.
The challenge runs until May 26, 2016.
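
For readers unfamiliar with the comparison metrics named in the announcement, here is a small sketch of how precision, recall and f-measure are conventionally derived from counts of matching and non-matching variant calls. These are the textbook definitions; how the precisionFDA comparison actually counts variants is not described in the excerpt, so treat the function name comparison_metrics and the example counts as assumptions for illustration only.

```python
def comparison_metrics(true_positives, false_positives, false_negatives):
    """Textbook precision, recall and F-measure from variant-call counts.

    true_positives:  calls present in both the query VCF and the reference VCF
    false_positives: calls in the query VCF that are absent from the reference
    false_negatives: reference variants that the query VCF missed
    """
    called = true_positives + false_positives
    in_truth = true_positives + false_negatives

    # Guard against empty inputs so the function never divides by zero.
    precision = true_positives / called if called else 0.0
    recall = true_positives / in_truth if in_truth else 0.0

    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0

    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical counts, not real challenge results.
example = comparison_metrics(true_positives=90, false_positives=10, false_negatives=30)
```

With these made-up counts, precision is 90/100 = 0.9 and recall is 90/120 = 0.75. A pipeline that scores much better on HG001 than on HG002 by these measures would hint at the overfitting the organisers mention.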

Wednesday, 27 April 2016

Free Imputation servers

Free imputation servers will allow anyone to use the full haplotype reference panel to impute missing genotypes in their data.  Users will be able to upload (pre-phased or unphased) genotype data to the server. Imputation will be carried out remotely on the server, and the imputed data will then be made available to the user. 

Prototype imputation servers are already available at 

A prototype phasing server for phasing high coverage sequenced samples is available at

Source: http://www.haplotype-reference-consortium.org/data-access

Datanami, Woe be me