Quality control and differential expression for bcbio RNA-seq experiments.

Installation

This is an R package.

Bioconductor method

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("devtools")
biocLite("remotes")
biocLite("GenomeInfoDbData")
biocLite(
    "hbc/bcbioRNASeq",
    dependencies = c("Depends", "Imports", "Suggests")
)

conda method

conda install -c bioconda r-bcbiornaseq

Load bcbio run

library(bcbioRNASeq)
bcb <- bcbioRNASeq(
    uploadDir = "bcbio_rnaseq_run/final",
    interestingGroups = c("genotype", "treatment"),
    organism = "Homo sapiens"
)
# Back up all data inside bcbioRNASeq object
flatFiles <- flatFiles(bcb)
saveData(bcb, flatFiles)

This will return a bcbioRNASeq object, which is an extension of the Bioconductor RangedSummarizedExperiment container class.

Parameters:

  • uploadDir: Path to the bcbio final upload directory.
  • interestingGroups: Character vector of column names of interest in the sample metadata, which is accessible with colData() on the bcbioRNASeq object. These column names should be formatted in camelCase. They can be reassigned after the object is created (e.g. interestingGroups(bcb) <- c("batch", "age")), and are used for data visualization in the quality control functions.
  • organism: Organism name. Use the full Latin name (e.g. “Homo sapiens”).
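To make the interestingGroups requirement concrete, here is a minimal base R sketch that checks a candidate vector against the columns of a sample metadata table. The helper checkInterestingGroups() is hypothetical and is not part of bcbioRNASeq. The data frame sampleMetadata is a toy stand-in for colData(bcb), and its treatment values are invented for illustration.

```r
# Hypothetical helper (not part of bcbioRNASeq): checks that every
# interesting group names an existing column in the sample metadata.
checkInterestingGroups <- function(interestingGroups, sampleMetadata) {
    stopifnot(is.character(interestingGroups), length(interestingGroups) >= 1L)
    notFound <- setdiff(interestingGroups, colnames(sampleMetadata))
    if (length(notFound) > 0L) {
        stop(
            "Not found in sample metadata: ",
            paste(notFound, collapse = ", "),
            call. = FALSE
        )
    }
    invisible(TRUE)
}

# Toy stand-in for colData(bcb), with camelCase column names.
sampleMetadata <- data.frame(
    sampleName = c("sample1", "sample2", "sample3", "sample4"),
    genotype = c("wildtype", "knockout", "wildtype", "knockout"),
    treatment = c("control", "control", "drug", "drug"),
    row.names = c("sample1", "sample2", "sample3", "sample4"),
    stringsAsFactors = FALSE
)

# Passes: both values are metadata columns.
checkInterestingGroups(c("genotype", "treatment"), sampleMetadata)

# Would fail, because "batch" is not a column of the toy metadata:
# checkInterestingGroups("batch", sampleMetadata)
```

This only illustrates the rule that each interesting group must name a metadata column; it does not reproduce how bcbioRNASeq itself validates the argument.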

Consult help("bcbioRNASeq", "bcbioRNASeq") for additional documentation.

Sample metadata

When loading a bcbio RNA-seq run, the sample metadata is imported automatically from the project-summary.yaml file in the final upload directory. If you notice typos in your metadata after completing the run, you can correct them by editing the YAML file. Alternatively, you can pass a sample metadata file to bcbioRNASeq() using the sampleMetadataFile argument.

Metadata file example

The samples in the bcbio run must map to the description column. The values provided in description must be unique. These values will be sanitized into syntactically valid names (see help("makeNames", "basejump")), and assigned as the column names of the bcbioRNASeq object. The original values are stored as the sampleName column in colData(), and are used for all plotting functions.

description  genotype
sample1      wildtype
sample2      knockout
sample3      wildtype
sample4      knockout
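The base R sketch below builds the example table above, checks that the description values are unique, previews what syntactically valid names look like, and writes the table to a CSV file. It assumes a CSV file is an acceptable input for sampleMetadataFile, and the file name sample_metadata.csv is hypothetical. It uses base R's make.names() only as a rough stand-in for basejump's makeNames(), whose exact sanitization rules may differ.

```r
# Example sample metadata matching the table above.
sampleMetadata <- data.frame(
    description = c("sample1", "sample2", "sample3", "sample4"),
    genotype = c("wildtype", "knockout", "wildtype", "knockout"),
    stringsAsFactors = FALSE
)

# The description values must be unique.
stopifnot(!anyDuplicated(sampleMetadata[["description"]]))

# Preview sanitized names. base::make.names() only approximates
# basejump::makeNames(), which may produce different names.
make.names(c("sample1", "sample-2", "2nd sample"), unique = TRUE)
# "sample1" "sample.2" "X2nd.sample"

# Write the metadata to a CSV file (assumed format, hypothetical file name).
metadataFile <- file.path(tempdir(), "sample_metadata.csv")
write.csv(sampleMetadata, metadataFile, row.names = FALSE)

# Read it back to confirm the round trip preserved the table.
roundTrip <- read.csv(metadataFile, stringsAsFactors = FALSE)
stopifnot(isTRUE(all.equal(roundTrip, sampleMetadata)))
```

The resulting path could then be supplied as bcbioRNASeq(uploadDir = "bcbio_rnaseq_run/final", sampleMetadataFile = metadataFile); check help("bcbioRNASeq", "bcbioRNASeq") for the file formats the package actually accepts.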

R Markdown templates

This package provides multiple R Markdown templates, including quality control, differential expression using DESeq2, and functional enrichment analysis.

These are available in RStudio at File -> New File -> R Markdown... -> From Template.

Citation

citation("bcbioRNASeq")

Steinbaugh MJ, Pantano L, Kirchner RD, Barrera V, Chapman BA, Piper ME, Mistry M, Khetani RS, Rutherford KD, Hoffman O, Hutchinson JN, Ho Sui SJ. (2017). bcbioRNASeq: R package for bcbio RNA-seq analysis. F1000Research 6:1976.

References

The papers and software cited in our workflows are available as a shared library on Paperpile.