RealizationSink {DelayedArray} | R Documentation |
Get or set the realization backend for the current session with
getRealizationBackend
or setRealizationBackend
.
Advanced users: Create a RealizationSink object with the backend-agnostic
RealizationSink()
constructor. Use this object to write an
array-like object block by block to disk (or to memory).
supportedRealizationBackends() getRealizationBackend() setRealizationBackend(BACKEND=NULL) RealizationSink(dim, dimnames=NULL, type="double")
BACKEND |
|
dim |
The dimensions (specified as an integer vector) of the RealizationSink object to create. |
dimnames |
The dimnames (specified as a list of character vectors or NULLs) of the RealizationSink object to create. |
type |
The type of the data that will be written to the RealizationSink object to create. |
The realization backend controls where/how realization happens e.g.
as an ordinary array if set to NULL
, as an RleArray object
if set to "RleArray"
, or as an HDF5Array object
if set to "HDF5Array"
.
supportedRealizationBackends
: A data frame with 1 row per
supported realization backend.
getRealizationBackend
: NULL
or a single string specifying
the name of the realization backend currently in use.
RealizationSink
: A RealizationSink object for the current
realization backend.
blockGrid
to define grids to use in the context
of block processing of array-like objects.
DelayedArray objects.
RleArray objects.
HDF5Array objects in the HDF5Array package.
HDF5-dump-management in the HDF5Array package to control the location and physical properties of automatically created HDF5 datasets.
array objects in base R.
## --------------------------------------------------------------------- ## A. supportedRealizationBackends() AND FAMILY ## --------------------------------------------------------------------- supportedRealizationBackends() getRealizationBackend() # backend is set to NULL setRealizationBackend("HDF5Array") supportedRealizationBackends() getRealizationBackend() # backend is set to "HDF5Array" ## --------------------------------------------------------------------- ## B. A SIMPLE (AND VERY ARTIFICIAL) RealizationSink() EXAMPLE ## --------------------------------------------------------------------- getHDF5DumpChunkLength() setHDF5DumpChunkLength(500L) getHDF5DumpChunkShape() sink <- RealizationSink(c(120L, 50L)) dim(sink) chunkdim(sink) grid <- blockGrid(sink, block.length=600) for (bid in seq_along(grid)) { viewport <- grid[[bid]] block <- 101 * bid + runif(length(viewport)) dim(block) <- dim(viewport) write_block(sink, viewport, block) } ## Always close the RealizationSink object when you are done writing to ## it and before coercing it to DelayedArray: close(sink) A <- as(sink, "DelayedArray") A ## --------------------------------------------------------------------- ## C. AN ADVANCED EXAMPLE OF USER-IMPLEMENTED BLOCK PROCESSING USING ## colGrid() AND A REALIZATION SINK ## --------------------------------------------------------------------- ## Say we have 2 matrices with the same number of columns. Each column ## represents a biological sample: library(HDF5Array) R <- as(matrix(runif(75000), ncol=1000), "HDF5Array") # 75 rows G <- as(matrix(runif(250000), ncol=1000), "HDF5Array") # 250 rows ## Say we want to compute the matrix U obtained by applying the same ## binary functions FUN() to all samples i.e. U is defined as: ## ## U[ , j] <- FUN(R[ , j], G[ , j]) for 1 <= j <= 1000 ## ## Note that FUN() should return a vector of constant length, say 200, ## so U will be a 200x1000 matrix. A naive implementation would be: ## ## pFUN <- function(r, g) { ## stopifnot(ncol(r) == ncol(g)) # sanity check ## sapply(seq_len(ncol(r)), function(j) FUN(r[ , j], g[ , j])) ## } ## ## But because U is going to be too big to fit in memory, we can't ## just do pFUN(R, G). So we want to compute U block by block and ## write the blocks to disk as we go. The blocks will be made of full ## columns. Also since we need to walk on 2 matrices at the same time ## (R and G), we can't use blockApply() or blockReduce() so we'll use ## a "for" loop. ## Before we get to the "for" loop, we need 4 things: ## 1) Two grids of blocks, one on R and one on G. The blocks in the ## 2 grids must contain the same number of columns. We arbitrarily ## choose to use blocks of 150 columns: R_grid <- colGrid(R, ncol=150) G_grid <- colGrid(G, ncol=150) ## 2) The function pFUN(). It will take 2 blocks as input, 1 from R ## and 1 from G, apply FUN() to all the samples in the blocks, ## and return a matrix with one columns per sample: pFUN <- function(r, g) { stopifnot(ncol(r) == ncol(g)) # sanity check ## Return a matrix with 200 rows with random values. Completely ## artificial sorry. A realistic example would actually need to ## apply the same binary function to r[ ,j] and g[ , j] for ## 1 <= j <= ncol(r). matrix(runif(200 * ncol(r)), nrow=200) } ## 3) A RealizationSink object where to write the matrices returned ## by pFUN() as we go. Note that instead of creating a realization ## sink by calling a backend-specific sink constructor (e.g. ## HDF5Array:::HDF5RealizationSink), we use the backend-agnostic ## constructor RealizationSink() and set the current realization ## backend to HDF5: setRealizationBackend("HDF5Array") U_sink <- RealizationSink(c(200L, 1000L)) ## 4) Finally, we create a grid on U_sink with blocks that contain the ## same number of columns as the corresponding blocks in R and G: U_grid <- colGrid(U_sink, ncol=150) ## Note that the 3 grids should have the same number of blocks: stopifnot(length(U_grid) == length(R_grid)) stopifnot(length(U_grid) == length(G_grid)) ## Now we can procede. We use a "for" loop where we walk on R and G at ## the same time, block by block, apply pFUN(), and write the output ## of pFUN() to U_sink: for (bid in seq_along(U_grid)) { R_block <- read_block(R, R_grid[[bid]]) G_block <- read_block(G, G_grid[[bid]]) U_block <- pFUN(R_block, G_block) write_block(U_sink, U_grid[[bid]], U_block) } close(U_sink) U <- as(U_sink, "DelayedArray") ## A note about parallelization: even though concurrent block reading ## from the same object is supported, concurrent writing to a sink is ## not supported yet. So the above code cannot be parallelized at the ## moment.