bcbioSingleCell is an S4 class that extends SingleCellExperiment, and is designed to store a bcbio single-cell RNA-seq analysis. This class contains read counts saved as a sparse matrix (sparseMatrix), sample metadata, and cell quality control metrics.

bcbioSingleCell(uploadDir, sampleMetadataFile = NULL, organism = NULL,
  ensemblRelease = NULL, genomeBuild = NULL, gffFile = NULL,
  transgeneNames = NULL, spikeNames = NULL,
  interestingGroups = "sampleName",
  BPPARAM = BiocParallel::SerialParam(), ...)

Arguments

uploadDir

character(1). Final upload directory path.

sampleMetadataFile

character(1). Sample metadata file path. CSV or TSV is preferred, but Excel worksheets are also supported. Check the documentation for conventions and required columns.

organism

character(1). Full Latin organism name (e.g. "Homo sapiens").

ensemblRelease

integer(1). Ensembl release version (e.g. 90). We recommend setting this value if possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest version available can vary, depending on the versions of AnnotationHub and ensembldb in use.

genomeBuild

character(1). Ensembl genome build assembly name (e.g. "GRCh38"). If set NULL, defaults to the most recent build available. Note: don't pass in UCSC build IDs (e.g. "hg38").

gffFile

character(1). GFF/GTF (General Feature Format) file. Generally, we recommend using GTF (GFFv2) instead of GFFv3.

transgeneNames

character. Vector indicating which assay rows denote transgenes (e.g. EGFP, TDTOMATO).

spikeNames

character. Vector indicating which assay rows denote spike-in sequences (e.g. ERCCs).

interestingGroups

character. Groups of interest to use for visualization. Corresponds to factors describing the columns of the object.

BPPARAM

An optional BiocParallelParam instance determining the parallel back-end to be used during evaluation, or a list of BiocParallelParam instances, to be applied in sequence for nested calls to BiocParallel functions.

...

Additional arguments.

Value

bcbioSingleCell.

Note

Updated 2019-08-27.

Remote data

When working in RStudio, we recommend connecting to the bcbio-nextgen run directory as a remote connection over sshfs.

See also

Examples

uploadDir <- system.file("extdata/indrops", package = "bcbioSingleCell") x <- bcbioSingleCell(uploadDir)
#> Dated project directory: 2018-01-01_bcbio.
#> 1 sample detected: #> [1] "multiplexed_AAAAAAAA"
#> Importing 'project-summary.yaml' using 'yaml::yaml.load_file()'.
#> Data versions are missing.
#> Importing 'programs.txt' using 'data.table::fread()'.
#> Setting row names from 'rowname' column.
#> Importing 'bcbio-nextgen.log' using 'base::readLines()'.
#> 'bcbio-nextgen.log' file is empty.
#> Importing 'bcbio-nextgen-commands.log' using 'base::readLines()'.
#> 1000 reads per cellular barcode cutoff detected.
#> Counts will imported as genes.
#> UMI type: harvard-indrop-v3.
#> Importing unfiltered cellular barcode distributions...
#> Importing 'multiplexed-AAAAAAAA-barcodes.tsv' using 'data.table::fread()'.
#> Setting row names from 'rowname' column.
#> Importing counts...
#> Importing 'multiplexed-AAAAAAAA.mtx.rownames' using 'base::readLines()'.
#> Importing 'multiplexed-AAAAAAAA.mtx.colnames' using 'base::readLines()'.
#> Importing 'multiplexed-AAAAAAAA.mtx' using 'Matrix::readMM()'.
#> Importing sidecar 'multiplexed-AAAAAAAA.mtx.rownames'.
#> Importing 'multiplexed-AAAAAAAA.mtx.rownames' using 'base::readLines()'.
#> Importing sidecar 'multiplexed-AAAAAAAA.mtx.colnames'.
#> Importing 'multiplexed-AAAAAAAA.mtx.colnames' using 'base::readLines()'.
#> 'multiplexed-AAAAAAAA.mtx' does not contain syntactically valid names.
#> Imported 'multiplexed-AAAAAAAA'.
#> bcbio GTF file: /n/app/bcbio/dev/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf.
#> bcbio GTF file is not accessible.
#> Slotting empty ranges into 'rowRanges()'.
#> 'sampleMetadataFile' is recommended for multiplexed samples (e.g. harvard-indrop-v3).
#> Calculating 100 sample metrics.
#> Calculating metrics without biotype information. #> 'rowRanges' required to calculate: nCoding, nMito, mitoRatio.
#> bcbioSingleCell 0.4.5 #> uploadDir: /private/var/folders/j1/hw6j14mj3wq381x7mynwfbp00000gn/T/RtmpIbOaaE/temp_libpathe6db58dc9be0/bcbioSingleCell/extdata/indrops #> dates(2): [bcbio] 2018-01-01; [R] 2019-09-17 #> level: genes #> interestingGroups: sampleName #> filtered: FALSE #> class: SingleCellExperiment #> dim: 50 100 #> metadata(27): allSamples bcbioCommandsLog ... wd yaml #> assays(1): counts #> rownames(50): ENSG00000071082 ENSG00000100316 ... ENSG00000269028 #> ENSG00000282105 #> rowData names(0): #> colnames(100): AAACACTA_CTTCGATT AAACTACA_CCACATTA ... #> TGGGAATT_ATATAGGA TGTTATCA_ACGCAGAG #> colData names(9): log10FeaturesPerCount mitoRatio ... sampleID #> sampleName #> reducedDimNames(0): #> spikeNames(0):
x <- bcbioSingleCell( uploadDir = uploadDir, sampleMetadataFile = file.path(uploadDir, "metadata.csv") )
#> Dated project directory: 2018-01-01_bcbio.
#> 1 sample detected: #> [1] "multiplexed_AAAAAAAA"
#> Importing 'project-summary.yaml' using 'yaml::yaml.load_file()'.
#> Data versions are missing.
#> Importing 'programs.txt' using 'data.table::fread()'.
#> Setting row names from 'rowname' column.
#> Importing 'bcbio-nextgen.log' using 'base::readLines()'.
#> 'bcbio-nextgen.log' file is empty.
#> Importing 'bcbio-nextgen-commands.log' using 'base::readLines()'.
#> 1000 reads per cellular barcode cutoff detected.
#> Counts will imported as genes.
#> UMI type: harvard-indrop-v3.
#> Importing 'metadata.csv' using 'data.table::fread()'.
#> Setting row names from 'rowname' column.
#> Multiplexed samples detected.
#> Importing unfiltered cellular barcode distributions...
#> Importing 'multiplexed-AAAAAAAA-barcodes.tsv' using 'data.table::fread()'.
#> Setting row names from 'rowname' column.
#> Importing counts...
#> Importing 'multiplexed-AAAAAAAA.mtx.rownames' using 'base::readLines()'.
#> Importing 'multiplexed-AAAAAAAA.mtx.colnames' using 'base::readLines()'.
#> Importing 'multiplexed-AAAAAAAA.mtx' using 'Matrix::readMM()'.
#> Importing sidecar 'multiplexed-AAAAAAAA.mtx.rownames'.
#> Importing 'multiplexed-AAAAAAAA.mtx.rownames' using 'base::readLines()'.
#> Importing sidecar 'multiplexed-AAAAAAAA.mtx.colnames'.
#> Importing 'multiplexed-AAAAAAAA.mtx.colnames' using 'base::readLines()'.
#> 'multiplexed-AAAAAAAA.mtx' does not contain syntactically valid names.
#> Imported 'multiplexed-AAAAAAAA'.
#> bcbio GTF file: /n/app/bcbio/dev/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf.
#> bcbio GTF file is not accessible.
#> Slotting empty ranges into 'rowRanges()'.
#> Calculating 100 sample metrics.
#> Calculating metrics without biotype information. #> 'rowRanges' required to calculate: nCoding, nMito, mitoRatio.
#> bcbioSingleCell 0.4.5 #> uploadDir: /private/var/folders/j1/hw6j14mj3wq381x7mynwfbp00000gn/T/RtmpIbOaaE/temp_libpathe6db58dc9be0/bcbioSingleCell/extdata/indrops #> dates(2): [bcbio] 2018-01-01; [R] 2019-09-17 #> level: genes #> sampleMetadataFile: /private/var/folders/j1/hw6j14mj3wq381x7mynwfbp00000gn/T/RtmpIbOaaE/temp_libpathe6db58dc9be0/bcbioSingleCell/extdata/indrops/metadata.csv #> interestingGroups: sampleName #> filtered: FALSE #> class: SingleCellExperiment #> dim: 50 100 #> metadata(27): allSamples bcbioCommandsLog ... wd yaml #> assays(1): counts #> rownames(50): ENSG00000071082 ENSG00000100316 ... ENSG00000269028 #> ENSG00000282105 #> rowData names(0): #> colnames(100): AAACACTA_CTTCGATT AAACTACA_CCACATTA ... #> TGGGAATT_ATATAGGA TGTTATCA_ACGCAGAG #> colData names(15): aggregate description ... sampleName sequence #> reducedDimNames(0): #> spikeNames(0):