Simply point to the final upload directory output by bcbio, and this function will take care of the rest. It automatically imports RNA-seq counts, metadata, and program versions used.

loadRNASeq(uploadDir, interestingGroups = "sampleName",
  sampleMetadataFile = NULL, maxSamples = 50, ensemblVersion = "current",
  ...)

Arguments

uploadDir

Path to final upload directory. This path is set when running bcbio_nextgen.py -w template.

interestingGroups

Character vector of interesting groups. First entry is used for plot colors during quality control (QC) analysis. Entire vector is used for PCA and heatmap QC functions.

sampleMetadataFile

Optional. Custom metadata file containing sample information. Otherwise defaults to sample metadata saved in the YAML file.

maxSamples

Maximum number of samples for which the DESeq2::rlog() and DESeq2::varianceStabilizingTransformation() matrices are calculated. See Details.

ensemblVersion

Ensembl release version. Defaults to current, and does not typically need to be user-defined. This parameter can be useful for matching Ensembl annotations against an outdated bcbio annotation build.

...

Additional arguments, slotted into the metadata() accessor.

Value

bcbioRNASeq.

Details

When the number of samples exceeds maxSamples, the rlog and vst slots in the SummarizedExperiment::SummarizedExperiment object contain the output of the edgeR normalization method instead.
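
The rule described here comes down to comparing the sample count against maxSamples. The sketch below is only an illustration of that comparison, not the package's implementation: chooseTransform() is a hypothetical helper that is not part of bcbioRNASeq, and the toy count matrices are simulated.

```r
# Hypothetical helper (not part of bcbioRNASeq): report which normalization
# path applies to a count matrix under the maxSamples rule.
chooseTransform <- function(counts, maxSamples = 50) {
    stopifnot(is.matrix(counts), is.numeric(maxSamples), length(maxSamples) == 1)
    nSamples <- ncol(counts)
    if (nSamples > maxSamples) {
        # Too many samples: rlog/vst are replaced by edgeR normalization.
        "edgeR"
    } else {
        # Few enough samples: DESeq2::rlog() and
        # DESeq2::varianceStabilizingTransformation() are calculated.
        "DESeq2"
    }
}

# Simulated counts: 10 genes x 4 samples, and 10 genes x 60 samples.
small <- matrix(rpois(40, lambda = 10), nrow = 10, ncol = 4)
large <- matrix(rpois(600, lambda = 10), nrow = 10, ncol = 60)

chooseTransform(small)                  # 4 samples, default limit of 50
chooseTransform(large)                  # 60 samples, default limit of 50
chooseTransform(small, maxSamples = 2)  # 4 samples, limit lowered to 2
```

With the default limit of 50, only the 60-sample matrix falls on the edgeR path. Lowering maxSamples sends small data sets down that path as well.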

Note

When working in RStudio, we recommend connecting to the bcbio-nextgen run directory remotely over sshfs.

Examples

uploadDir <- system.file("extdata/bcbio", package = "bcbioRNASeq")
bcb <- loadRNASeq(uploadDir, interestingGroups = "group")
#> 2017-05-23_rnaseq
#> 4 samples detected
#> Reading project-summary.yaml
#> Genome: Mus musculus (mm10)
#> Obtaining Ensembl annotations with AnnotationHub and ensembldb
#> /Users/mike//.AnnotationHub
#> snapshotDate(): 2017-09-07
#> EnsDB: Mus musculus Ensembl 90
#> Parsed with column specification:
#> cols(
#>   enstxp = col_character(),
#>   ensgene = col_character()
#> )
#> Reading bcbio run information
#> Parsed with column specification:
#> cols(
#>   genome = col_character(),
#>   resource = col_character(),
#>   version = col_datetime(format = "")
#> )
#> Parsed with column specification:
#> cols(
#>   program = col_character(),
#>   version = col_character()
#> )
#> Warning: bcbio-nextgen.log file missing
#> Warning: bcbio-nextgen-commands.log file missing
#> Reading salmon counts using tximport
#> 1
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Length = col_integer(),
#>   EffectiveLength = col_double(),
#>   TPM = col_double(),
#>   NumReads = col_double()
#> )
#> 2
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Length = col_integer(),
#>   EffectiveLength = col_double(),
#>   TPM = col_double(),
#>   NumReads = col_double()
#> )
#> 3
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Length = col_integer(),
#>   EffectiveLength = col_double(),
#>   TPM = col_double(),
#>   NumReads = col_double()
#> )
#> 4
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Length = col_integer(),
#>   EffectiveLength = col_double(),
#>   TPM = col_double(),
#>   NumReads = col_double()
#> )
#>
#> summarizing abundance
#> summarizing counts
#> summarizing length
#> Generating internal DESeqDataSet for quality control
#> using just counts from tximport
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> Performing rlog transformation
#> Performing variance stabilizing transformation
#> Reading STAR featureCounts aligned counts
#> Parsed with column specification:
#> cols(
#>   .default = col_integer(),
#>   id = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning: rowData mismatch with assay slot: ENSMUSG00000101738, ENSMUSG00000104475, ENSMUSG00000109048
#> Preparing SummarizedExperiment
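
As a further sketch that uses only the arguments documented on this page, the call below passes more than one interesting group and lowers maxSamples below the four example samples, which according to Details should fill the rlog and vst slots from edgeR normalization. It assumes that bcbioRNASeq is installed and that the example metadata contains both a group and a sampleName column. The call was not run here, so no output is shown.

```r
# Assumption: the bundled example metadata has "group" and "sampleName" columns.
interestingGroups <- c("group", "sampleName")

# Only attempt the load when the package is actually available.
if (requireNamespace("bcbioRNASeq", quietly = TRUE)) {
    library(bcbioRNASeq)
    uploadDir <- system.file("extdata/bcbio", package = "bcbioRNASeq")
    bcb <- loadRNASeq(
        uploadDir,
        interestingGroups = interestingGroups,
        maxSamples = 2  # below the 4 example samples, see Details
    )
}
```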