Point this function at the final upload directory output by bcbio. It automatically imports the RNA-seq counts, the metadata, and the versions of the programs used in the run.

loadRNASeq(uploadDir, interestingGroups = "sampleName",
  sampleMetadataFile = NULL, samples = NULL, annotable = TRUE,
  organism = NULL, ensemblVersion = NULL, genomeBuild = NULL,
  transformationLimit = 50L, design = NULL, ...)

Arguments

uploadDir

Path to the final upload directory. This path is set when running bcbio_nextgen.py -w template.

interestingGroups

Character vector of interesting groups. First entry is used for plot colors during quality control (QC) analysis. Entire vector is used for PCA and heatmap QC functions.

sampleMetadataFile

Optional. Custom metadata file containing sample information. Otherwise defaults to sample metadata saved in the YAML file. Remote URLs are supported. Typically this can be left NULL.

samples

Optional. Specify a subset of samples to load. The names must match the description specified in the bcbio YAML metadata. If a sampleMetadataFile is provided, that will take priority for sample selection. Typically this can be left NULL.

annotable

Optional. User-defined gene annotations (a.k.a. "annotable"), which will be slotted into rowData(). Typically this should be left at its default. By default, the function automatically generates an annotable from the annotations available on Ensembl. If set to NULL, rowData() inside the resulting bcbioRNASeq object will be left empty. Setting NULL is recommended for projects dealing with genes or transcripts that are poorly annotated.

organism

Optional. Organism name. Use the full Latin name (e.g. "Homo sapiens"), since this will be passed downstream to AnnotationHub/ensembldb. If set, this genome must be supported on Ensembl. Normally this can be left NULL, and the function will attempt to detect the organism automatically using detectOrganism().

ensemblVersion

Optional. Ensembl release version. If NULL, defaults to current release, and does not typically need to be user-defined. This parameter can be useful for matching Ensembl annotations against an outdated bcbio annotation build.

genomeBuild

Optional. Genome build. Normally this can be left NULL and the build will be detected from the bcbio run data. This can be set manually (e.g. "hg19" for the older Homo sapiens reference genome). Note that this must match the genome build identifier on Ensembl for annotations to download correctly.

transformationLimit

Maximum number of samples for which the DESeq2::rlog() and DESeq2::varianceStabilizingTransformation() matrices are calculated. Changing this value is not generally recommended. For large datasets, DESeq2 takes a long time to apply these transformations. See Details. Use Inf to always apply the transformations and 0 to always skip them.

design

DESeq2 design formula. Empty by default. Can be updated after initial data loading using the design() function.

...

Additional arguments, slotted into the metadata() accessor.
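
As a rough illustration of the sampleMetadataFile and samples arguments, the sketch below builds a small sample sheet in base R and checks that a requested subset matches the descriptions in it. The column names (description, group) and the group values are assumptions made for illustration, not a documented schema; the sample descriptions are taken from the example output on this page. The loadRNASeq() calls are left as comments because they need the package and a bcbio upload directory.

```r
# Hypothetical sample sheet. The column names (`description`, `group`) and the
# group values are assumptions for illustration, not a documented schema.
sampleSheet <- data.frame(
    description = c("group1_1", "group1_2", "group2_1", "group2_2"),
    group = c("group1", "group1", "group2", "group2"),
    stringsAsFactors = FALSE
)

# Write the sheet to a temporary CSV that could be passed as `sampleMetadataFile`.
sampleMetadataFile <- file.path(tempdir(), "sample_metadata.csv")
write.csv(sampleSheet, file = sampleMetadataFile, row.names = FALSE)

# Read it back and confirm that the requested subset matches known descriptions.
roundTrip <- read.csv(sampleMetadataFile, stringsAsFactors = FALSE)
samples <- c("group1_1", "group2_1")
unknownSamples <- setdiff(samples, roundTrip[["description"]])
if (length(unknownSamples) > 0L) {
    stop("Samples not found in metadata: ", paste(unknownSamples, collapse = ", "))
}

# Not run here (requires bcbioRNASeq and a bcbio upload directory):
# bcb <- loadRNASeq(uploadDir, samples = samples)
# bcb <- loadRNASeq(uploadDir, sampleMetadataFile = sampleMetadataFile)
```

Checking the subset before loading surfaces a mistyped sample name early, rather than partway through the import.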

Value

bcbioRNASeq.

Details

When the number of samples exceeds transformationLimit, the rlog and vst counts are not slotted into assays(). In this case, we recommend visualizing the data using the tmm counts generated by edgeR.
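
The rule above can be sketched as a small predicate. The helper name applyTransformations is hypothetical and not part of bcbioRNASeq; it only mirrors the behavior described here (transformations are skipped when the sample count exceeds the limit) and is not the package's internal implementation.

```r
# Hypothetical helper, not part of bcbioRNASeq. It mirrors the documented rule:
# rlog/vst are calculated only when the sample count does not exceed the limit.
applyTransformations <- function(nSamples, transformationLimit = 50L) {
    stopifnot(
        is.numeric(nSamples), length(nSamples) == 1L, nSamples >= 0,
        is.numeric(transformationLimit), length(transformationLimit) == 1L
    )
    nSamples <= transformationLimit
}

applyTransformations(4L)        # TRUE: 4 samples, default limit of 50
applyTransformations(96L)       # FALSE: exceeds the default limit
applyTransformations(96L, Inf)  # TRUE: Inf always applies the transformations
applyTransformations(4L, 0)     # FALSE: 0 always skips them
```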

Note

When working in RStudio, we recommend connecting to the bcbio-nextgen run directory as a remote connection over sshfs.

Examples

uploadDir <- system.file("extdata/bcbio", package = "bcbioRNASeq")
bcb <- loadRNASeq(uploadDir, interestingGroups = "group")
#> 2017-05-23_rnaseq
#> 4 samples detected
#> Reading project-summary.yaml
#> Detecting organism from genome build
#> Genome: Mus musculus (mm10)
#> Loading Ensembl annotations from AnnotationHub
#> 2017-10-27
#> EnsDB AH57770: Mus musculus Ensembl 90
#> Obtaining transcript-to-gene mappings
#> Parsed with column specification:
#> cols(
#>   enstxp = col_character(),
#>   ensgene = col_character()
#> )
#> Reading sample metrics
#> Reading bcbio run information
#> Parsed with column specification:
#> cols(
#>   genome = col_character(),
#>   resource = col_character(),
#>   version = col_datetime(format = "")
#> )
#> bcbio-nextgen.log.txt
#> bcbio-nextgen-commands.log.txt
#> Reading salmon counts using tximport
#> 1
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Length = col_integer(),
#>   EffectiveLength = col_double(),
#>   TPM = col_double(),
#>   NumReads = col_double()
#> )
#> 2
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Length = col_integer(),
#>   EffectiveLength = col_double(),
#>   TPM = col_double(),
#>   NumReads = col_double()
#> )
#> 3
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Length = col_integer(),
#>   EffectiveLength = col_double(),
#>   TPM = col_double(),
#>   NumReads = col_double()
#> )
#> 4
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Length = col_integer(),
#>   EffectiveLength = col_double(),
#>   TPM = col_double(),
#>   NumReads = col_double()
#> )
#>
#> summarizing abundance
#> summarizing counts
#> summarizing length
#> Performing trimmed mean of M-values (TMM) normalization
#> Generating internal DESeqDataSet
#> using just counts from tximport
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> Performing rlog transformation
#> Performing variance stabilizing transformation
#> Reading STAR featureCounts aligned counts
#> Parsed with column specification:
#> cols(
#>   id = col_character(),
#>   group1_1 = col_integer(),
#>   group1_2 = col_integer(),
#>   group2_1 = col_integer(),
#>   group2_2 = col_integer()
#> )
#> Warning: Unannotated genes detected in assay (0.594%)
print(bcb)
#> class: bcbioRNASeq
#> dim: 505 4
#> metadata(28): version uploadDir ... devtoolsSessionInfo
#>   unannotatedGenes
#> assays(6): raw normalized ... rlog vst
#> rownames(505): ENSMUSG00000002459 ENSMUSG00000004768 ...
#>   ENSMUSG00000105982 ENSMUSG00000109048
#> rowData names(11): ensgene symbol ... seqCoordSystem entrez
#> colnames(4): group1_1 group1_2 group2_1 group2_2
#> colData names(4): sampleID sampleName description group
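
Before supplying a design formula, it can help to confirm that every variable in it is a column of the sample metadata. The base R sketch below uses a stand-in data frame whose column names mirror the colData names printed above. The group values are assumptions for illustration, and the check itself is a suggestion rather than something loadRNASeq() is documented to perform.

```r
# Stand-in for the sample metadata. Column names mirror the printed colData
# names; the group values are assumed for illustration.
sampleIDs <- c("group1_1", "group1_2", "group2_1", "group2_2")
sampleData <- data.frame(
    sampleID = sampleIDs,
    sampleName = sampleIDs,
    description = sampleIDs,
    group = factor(c("group1", "group1", "group2", "group2")),
    row.names = sampleIDs,
    stringsAsFactors = FALSE
)

# Candidate design formula.
design <- ~ group

# Every variable referenced by the formula must be a metadata column.
missingVars <- setdiff(all.vars(design), colnames(sampleData))
if (length(missingVars) > 0L) {
    stop("Design variables missing from metadata: ",
         paste(missingVars, collapse = ", "))
}

# Expanding the formula shows the model columns the design implies.
modelMatrix <- model.matrix(design, data = sampleData)
colnames(modelMatrix)
```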