bcbioSingleCell is an S4 class that extends SingleCellExperiment, and is designed to store a bcbio single-cell RNA-seq analysis. This class contains read counts saved as a sparse matrix (sparseMatrix), sample metadata, and cell quality control metrics.

bcbioSingleCell(
  uploadDir,
  sampleMetadataFile = NULL,
  organism = NULL,
  ensemblRelease = NULL,
  genomeBuild = NULL,
  gffFile = NULL,
  transgeneNames = NULL,
  spikeNames = NULL,
  interestingGroups = "sampleName",
  BPPARAM = BiocParallel::SerialParam()
)

Arguments

uploadDir

character(1). Final upload directory path.

sampleMetadataFile

character(1). Sample metadata file path. CSV or TSV is preferred, but Excel worksheets are also supported. Check the documentation for conventions and required columns.

organism

character(1). Full Latin organism name (e.g. "Homo sapiens").

ensemblRelease

integer(1). Ensembl release version (e.g. 90). We recommend setting this value if possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest version available can vary, depending on the versions of AnnotationHub and ensembldb in use.

genomeBuild

character(1). Ensembl genome build assembly name (e.g. "GRCh38"). If set NULL, defaults to the most recent build available. Note: don't pass in UCSC build IDs (e.g. "hg38").

gffFile

character(1). GFF/GTF (General Feature Format) file. Generally, we recommend using GTF (GFFv2) instead of GFFv3.

transgeneNames

character. Vector indicating which assay rows denote transgenes (e.g. EGFP, TDTOMATO).

spikeNames

character. Vector indicating which assay rows denote spike-in sequences (e.g. ERCCs).

interestingGroups

character. Groups of interest to use for visualization. Corresponds to factors describing the columns of the object.

BPPARAM

An optional BiocParallelParam instance determining the parallel back-end to be used during evaluation, or a list of BiocParallelParam instances, to be applied in sequence for nested calls to BiocParallel functions.

Value

bcbioSingleCell.

Note

Updated 2020-01-20.

Remote data

When working in RStudio, we recommend connecting to the bcbio-nextgen run directory as a remote connection over sshfs.

See also

Examples

uploadDir <- system.file("extdata/indrops", package = "bcbioSingleCell") x <- bcbioSingleCell(uploadDir)
#>
#> ── Importing bcbio-nextgen single-cell RNA-seq run ─────────────────────────────
#> projectDir: 2018-01-01_bcbio
#> 1 sample detected:
#> ● multiplexed_AAAAAAAA
#> → Importing project-summary.yaml using yaml::`yaml.load_file()`.
#> ! Data versions are missing.
#> → Importing programs.txt using data.table::`fread()`.
#> → Importing bcbio-nextgen.log using base::`readLines()`.
#> ! bcbio-nextgen.log file is empty.
#> → Importing bcbio-nextgen-commands.log using base::`readLines()`.
#> 1000 reads per cellular barcode cutoff detected.
#> Counts will imported as genes.
#> UMI type: harvard-indrop-v3
#>
#> ── Sample metadata ──
#>
#> ── Counts ──
#>
#> → Importing counts.
#> → Importing multiplexed-AAAAAAAA.mtx.rownames using base::`readLines()`.
#> → Importing multiplexed-AAAAAAAA.mtx.colnames using base::`readLines()`.
#> → Importing multiplexed-AAAAAAAA.mtx using Matrix::`readMM()`.
#> → Importing sidecar multiplexed-AAAAAAAA.mtx.rownames.
#> → Importing multiplexed-AAAAAAAA.mtx.rownames using base::`readLines()`.
#> → Importing sidecar multiplexed-AAAAAAAA.mtx.colnames.
#> → Importing multiplexed-AAAAAAAA.mtx.colnames using base::`readLines()`.
#> ! multiplexed-AAAAAAAA.mtx does not contain syntactically valid names.
#> → Imported multiplexed-AAAAAAAA.
#>
#> ── Feature metadata ──
#>
#> bcbio GTF file: #> /n/app/bcbio/dev/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf
#> ! bcbio GTF file is not accessible.
#> ! Slotting empty ranges into `rowRanges()`.
#>
#> ── Column data ──
#>
#> ! `sampleMetadataFile` is recommended for multiplexed samples (e.g. harvard-indrop-v3).
#>
#> ── Metadata ──
#>
#> → Importing unfiltered cellular barcode distributions.
#> → Importing multiplexed-AAAAAAAA-barcodes.tsv using data.table::`fread()`.
#> → Calculating 100 sample metrics.
#> ! Calculating metrics without biotype information. #> `rowRanges()` required to calculate: `nCoding, nMito, mitoRatio`.
#>
#> bcbio single-cell RNA-seq run imported successfully.
#> bcbioSingleCell 0.4.8 #> uploadDir: /private/var/folders/j1/hw6j14mj3wq381x7mynwfbp00000gn/T/RtmpqwWEfp/temp_libpath12dc8264489a3/bcbioSingleCell/extdata/indrops #> dates(2): [bcbio] 2018-01-01; [R] 2020-02-19 #> level: genes #> interestingGroups: sampleName #> filtered: FALSE #> class: SingleCellExperiment #> dim: 50 100 #> metadata(27): allSamples bcbioCommandsLog ... wd yaml #> assays(1): counts #> rownames(50): ENSG00000071082 ENSG00000100316 ... ENSG00000269028 #> ENSG00000282105 #> rowData names(0): #> colnames(100): AAACACTA_CTTCGATT AAACTACA_CCACATTA ... #> TGGGAATT_ATATAGGA TGTTATCA_ACGCAGAG #> colData names(9): log10FeaturesPerCount mitoRatio ... sampleID #> sampleName #> reducedDimNames(0): #> spikeNames(0): #> altExpNames(0):
x <- bcbioSingleCell( uploadDir = uploadDir, sampleMetadataFile = file.path(uploadDir, "metadata.csv") )
#>
#> ── Importing bcbio-nextgen single-cell RNA-seq run ─────────────────────────────
#> projectDir: 2018-01-01_bcbio
#> 1 sample detected:
#> ● multiplexed_AAAAAAAA
#> → Importing project-summary.yaml using yaml::`yaml.load_file()`.
#> ! Data versions are missing.
#> → Importing programs.txt using data.table::`fread()`.
#> → Importing bcbio-nextgen.log using base::`readLines()`.
#> ! bcbio-nextgen.log file is empty.
#> → Importing bcbio-nextgen-commands.log using base::`readLines()`.
#> 1000 reads per cellular barcode cutoff detected.
#> Counts will imported as genes.
#> UMI type: harvard-indrop-v3
#>
#> ── Sample metadata ──
#>
#> → Importing metadata.csv using data.table::`fread()`.
#> Multiplexed samples detected.
#>
#> ── Counts ──
#>
#> → Importing counts.
#> → Importing multiplexed-AAAAAAAA.mtx.rownames using base::`readLines()`.
#> → Importing multiplexed-AAAAAAAA.mtx.colnames using base::`readLines()`.
#> → Importing multiplexed-AAAAAAAA.mtx using Matrix::`readMM()`.
#> → Importing sidecar multiplexed-AAAAAAAA.mtx.rownames.
#> → Importing multiplexed-AAAAAAAA.mtx.rownames using base::`readLines()`.
#> → Importing sidecar multiplexed-AAAAAAAA.mtx.colnames.
#> → Importing multiplexed-AAAAAAAA.mtx.colnames using base::`readLines()`.
#> ! multiplexed-AAAAAAAA.mtx does not contain syntactically valid names.
#> → Imported multiplexed-AAAAAAAA.
#>
#> ── Feature metadata ──
#>
#> bcbio GTF file: #> /n/app/bcbio/dev/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf
#> ! bcbio GTF file is not accessible.
#> ! Slotting empty ranges into `rowRanges()`.
#>
#> ── Column data ──
#>
#> ── Metadata ──
#>
#> → Importing unfiltered cellular barcode distributions.
#> → Importing multiplexed-AAAAAAAA-barcodes.tsv using data.table::`fread()`.
#> → Calculating 100 sample metrics.
#> ! Calculating metrics without biotype information. #> `rowRanges()` required to calculate: `nCoding, nMito, mitoRatio`.
#>
#> bcbio single-cell RNA-seq run imported successfully.
#> bcbioSingleCell 0.4.8 #> uploadDir: /private/var/folders/j1/hw6j14mj3wq381x7mynwfbp00000gn/T/RtmpqwWEfp/temp_libpath12dc8264489a3/bcbioSingleCell/extdata/indrops #> dates(2): [bcbio] 2018-01-01; [R] 2020-02-19 #> level: genes #> sampleMetadataFile: /private/var/folders/j1/hw6j14mj3wq381x7mynwfbp00000gn/T/RtmpqwWEfp/temp_libpath12dc8264489a3/bcbioSingleCell/extdata/indrops/metadata.csv #> interestingGroups: sampleName #> filtered: FALSE #> class: SingleCellExperiment #> dim: 50 100 #> metadata(27): allSamples bcbioCommandsLog ... wd yaml #> assays(1): counts #> rownames(50): ENSG00000071082 ENSG00000100316 ... ENSG00000269028 #> ENSG00000282105 #> rowData names(0): #> colnames(100): AAACACTA_CTTCGATT AAACTACA_CCACATTA ... #> TGGGAATT_ATATAGGA TGTTATCA_ACGCAGAG #> colData names(15): aggregate description ... sampleName sequence #> reducedDimNames(0): #> spikeNames(0): #> altExpNames(0):