v0.2 | [Documentation]
An R interface for Lazy Cartesian Products. Provides memory-efficient ways to obtain elements (including random samples) from Cartesian Products. Partial wrapper of the lazy-cartesian-product
C++ library (Burdsall, 2018).
Installation
install.packages("remotes") # if not installed
remotes::install_github("arcruz0/rlazycartesian")
Example
Suppose you need a random sample of 1,000 elements from a Cartesian Product of 125M elements. rlazycartesian
can do this without generating all elements in advance (as expand.grid()
would do). This makes it so that the operation is fast and consumes very little RAM—see Benchmarks below.
library(rlazycartesian)
l <- list(x = paste0("x", 1:500, sep = ""),
y = paste0("y", 1:500, sep = ""),
z = paste0("z", 1:500, sep = ""))
lc <- lazy_cartesian(l)
lc
#> `lazy_cartesian` object with 1.25e+08 elements
set.seed(1)
head(get_random_sample(lc, 1000))
#> .element x y z
#> 1 66608964 x267 y218 z464
#> 2 44492929 x178 y486 z429
#> 3 60941821 x244 y384 z321
#> 4 30845227 x124 y191 z227
#> 5 17633234 x71 y267 z234
#> 6 79310131 x318 y121 z131
A note on element indices and expand.grid()
Element indices from rlazycartesian
are equivalent to indices from a sorted expand.grid()
:
get_elements(lc, c(12345, 98765))
#> .element x y z
#> 1 12345 x1 y25 z345
#> 2 98765 x1 y198 z265
eg <- expand.grid(l)
eg <- eg[order(eg$x, eg$y, eg$z),]
rownames(eg) <- NULL
eg[c(12345, 98765),]
#> x y z
#> 12345 x1 y25 z345
#> 98765 x1 y198 z265
Benchmarks
For the random sample example above, compare performance (Elapsed_Time_sec
) and peak RAM usage (Peak_RAM_Used_MiB
):
peakRAM::peakRAM(
{lc <- lazy_cartesian(l); get_random_sample(lc, 1000)},
{eg <- expand.grid(l); eg[sample(1:nrow(eg), 1000),]}
)
#> Function_Call Elapsed_Time_sec
#> 1 {lc<-lazy_cartesian(l)get_random_sample(lc,1000)} 0.002
#> 2 {eg<-expand.grid(l)eg[sample(1:nrow(eg),1000),]} 4.332
#> Total_RAM_Used_MiB Peak_RAM_Used_MiB
#> 1 0 0.2
#> 2 0 2862.1
Restrictions to the Cartesian product
Starting in v0.2, rlazycartesian
supports simple restrictions to the Cartesian product. For example, suppose that in the following Cartesian product we want to exclude any elements with “Red” and “Circle,” as well as any elements with “Square” and the numbers 1 or 3:
l <- list(color = c("Red", "Blue", "Yellow"),
shape = c("Square", "Circle"),
number = 1:3)
r <- list(
restriction1 = list(color = "Red", shape = "Circle"),
restriction2 = list(shape = "Square", number = c(1, 3))
)
lc <- lazy_cartesian(l, restrictions = r)
lc
#> `lazy_cartesian` object with 9 elements (after restrictions)
#> => 2 restrictions excluded 50% of the Cartesian Product
#> (which originally had 18 elements)
get_random_sample(lc, 9) # (note element indices still go up to 18)
#> .element color shape number
#> 1 17 Yellow Circle 2
#> 2 16 Yellow Circle 1
#> 3 8 Blue Square 2
#> 4 18 Yellow Circle 3
#> 5 10 Blue Circle 1
#> 6 12 Blue Circle 3
#> 7 14 Yellow Square 2
#> 8 11 Blue Circle 2
#> 9 2 Red Square 2
References
Burdsall, T. (2018). lazy-cartesian-product
: .hpp library to efficiently generate combinations using the Lazy Cartesian Product algorithm. https://github.com/tylerburdsall/lazy-cartesian-product