In addition to the material here, our teaching team maintains a list of useful bioinformatics resources.



Our NGS analyses rely on the bcbio-nextgen framework, a community-developed NGS workflow that is fully documented, open source under the MIT license, and in use at over a dozen sites internationally. It is installed in both the FAS and HMS Research Computing environments and provides researchers with best-practice workflows for exome and whole-genome sequencing (built around GATK 3.0 and FreeBayes), RNA-seq (TopHat2/STAR, Sailfish/Salmon, edgeR/DESeq2/limma), and, most recently, small RNA-seq, single-cell RNA-seq, and the initial steps of ChIP-seq. Take a look at some of the key blog posts around bcbio:


If you are working on the HMS Orchestra cluster and need a piece of bioinformatics software, we recommend looking at what is available through the BioGrids project, a bioinformatics software stack installed on the cluster.


We also recommend conda as an easy way to install and run software in a shared environment without worrying about dependencies. Many general-use packages can be found on Anaconda Cloud, and bioinformatics software can be found through bioconda.
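
As a rough illustration of the conda approach, the sketch below builds the text of a conda environment file that pulls packages from the conda-forge and bioconda channels. The helper name `render_environment`, the environment name `rnaseq-example`, and the choice of packages are assumptions made for this example and are not part of our installed setup.

```python
def render_environment(name, packages, channels=("conda-forge", "bioconda")):
    """Return the text of a conda environment file as a string.

    The function and its defaults are illustrative; adjust the channels
    and packages to match what your analysis actually needs.
    """
    lines = [f"name: {name}", "channels:"]
    # Channels are listed in priority order, highest first.
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}" for package in packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical environment with two RNA-seq tools packaged by bioconda.
    print(render_environment("rnaseq-example", ["salmon", "star"]), end="")
```

Saving the printed text as `environment.yml` and running `conda env create -f environment.yml` should then create the environment, assuming conda is already installed and the listed packages are available for your platform.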

Computational and Data Storage Resources

Research Computing

We work closely with and use the computational resources of both:

We also use a variety of cloud architectures, including Amazon AWS and Microsoft Azure.

Data Management

Data storage needs increase by the day. To help keep storage costs manageable, we contribute to an important HMS initiative to develop and propagate good data management practices and resources across the Harvard community. For details and tips on managing your data, see the HMS data management page.

Next Generation Sequencing

We work with data from any sequencing core but currently work most closely with the:

and the