collapse Package Options
collapse-options.Rd
collapse is globally configurable to an extent few packages are: the default values of key function arguments governing the behavior of its algorithms, and the exported namespace, can be adjusted interactively through the set_collapse() function.
These options are stored, for safety and performance reasons, in an internal environment called .op, which is visible in the documentation of some functions such as fmean. The contents of this environment can be accessed using get_collapse().
There are also a few options that can be set using options() (retrievable using getOption()). These options mainly affect package startup behavior.
Arguments
- ...
either comma separated options, or a single list of options. The available options are:
na.rm
logical, default TRUE. Sets the default for statistical algorithms such as the Fast Statistical Functions to skip missing values. If your data has no missing values, or only rarely, changing this to FALSE is recommended for performance gains. Note that this does not affect other (non-statistical) uses of na.rm arguments, such as in pivot.
sort
logical, default TRUE. Sets the default for grouping operations to be sorted. This also applies to factor generation using qF and tabulation with qtab, but excludes other uses of sort arguments where grouping is not the objective (such as in funique or pivot). In general, sorted grouping (internally using radixorder) is slower than hash-based direct grouping (internally using group). However, if data is pre-sorted, sorted grouping is slightly faster. If records don't need to be sorted, or you want to maintain their first-appearance order, changing this to FALSE is recommended and often brings substantial performance gains. Note that this also affects internal grouping applied when atomic vectors (except for factors) or lists are passed to g arguments in Fast Statistical Functions.
nthreads
integer, default 1. Sets the default for OpenMP multithreading, available in certain statistical and data manipulation functions. Setting values greater than 1 is strongly recommended with larger datasets.
stable.algo
logical, default TRUE. Option passed to fvar()/fsd() and qsu(). FALSE enables one-pass standard deviation calculation, which is very fast, but might incur catastrophic cancellation if numbers are large and the variance is small. See fvar for details.
stub
logical, default TRUE. Controls whether transformation operators (.OPERATOR_FUN) such as W, L, STD etc. add prefixes to transformed columns of matrix and data.frame-like objects.
verbose
integer, default 1. Print additional (diagnostic) information or messages when executing code. Currently only used in join and roworder.
digits
integer, default 2. Number of digits to print, e.g. in descr or pwcor.
mask
character, default NULL. Allows masking existing base R/dplyr functions with faster collapse versions, by creating additional functions in the namespace and instantly exporting them:
For example, set_collapse(mask = "unique") (or, equivalently, set_collapse(mask = "funique")) will create unique <- funique in the collapse namespace, export unique(), and silently detach and attach the namespace again so R can find it, all in milliseconds. Thus calling unique() afterwards uses the collapse version, which is many times faster. funique remains available and you can still call base::unique explicitly.
All collapse functions starting with 'f' can be passed to the option (with or without the 'f'), e.g. set_collapse(mask = c("subset", "transform", "droplevels")) creates subset <- fsubset, transform <- ftransform etc. Special functions are "n", "table"/"qtab", and "%in%", which create n <- GRPN (for use in (f)summarise / (f)mutate), table <- qtab, and replace %in% with a fast version using fmatch, respectively.
There are also a couple of convenience keywords that you can use to mask groups of functions:
- "manip"
adds data manipulation functions: fsubset, ftransform, ftransform<-, ftransformv, fcompute, fcomputev, fselect, fselect<-, fgroup_by, fgroup_vars, fungroup, fsummarise, fsummarize, fmutate, frename, findex_by, findex.
- "helper"
adds the functions: fdroplevels, finteraction, fmatch, funique, fnunique, fduplicated, fcount, fcountv, fquantile, frange, fdist, fnlevels, fnrow and fncol.
- "special"
exports n(), table() and %in%. See above.
- "fast-fun"
adds the functions contained in the macro: .FAST_FUN. See also Note.
- "fast-stat-fun"
adds the functions contained in the macro: .FAST_STAT_FUN. See also Note.
- "fast-trfm-fun"
adds the functions contained in: setdiff(.FAST_FUN, .FAST_STAT_FUN). See also Note.
- "all"
turns on all of the above.
The re-attaching of the namespace places collapse at the top of the search path (after the global environment), implying that all its exported functions will take priority over other libraries. Users can use fastverse::fastverse_conflicts() to check which functions are masked following set_collapse(mask = ...). The option can be changed at any time with immediate effect. Using set_collapse(mask = NULL) removes all masked functions from the namespace, and can also be called simply to place collapse at the top of the search path.
remove
character, default NULL. Similar to 'mask': allows removing functions from the exported namespace (they are still in the namespace, just no longer exported). All collapse functions can be passed here. This argument is always evaluated after 'mask', so you can also remove masked functions again, i.e. after setting a keyword which masks a bunch of functions. There are also a couple of convenience keywords you can specify to bulk-remove certain functions:
- "shorthand"
removes function shorthands: gv, gv<-, av, av<-, nv, nv<-, gvr, gvr<-, itn, ix, slt, slt<-, sbt, gby, iby, mtt, smr, tfm, tfmv, tfm<-, settfm, settfmv, rnm.
- "infix"
removes infix functions: %!=%, %[!]in%, %[!]iin%, %*=%, %+=%, %-=%, %/=%, %=%, %==%, %c*%, %c+%, %c-%, %c/%, %cr%, %r*%, %r+%, %r-%, %r/%, %rr%.
- "operator"
removes functions contained in the macro: .OPERATOR_FUN.
- "old"
removes deprecated functions contained in the macro: .COLLAPSE_OLD.
Like 'mask', the option is alterable and reversible. Specifying set_collapse(remove = NULL) restores the exported namespace. Also like 'mask', this option silently detaches and attaches collapse again, ensuring that it is at the top of the search path.
- opts
character. A vector of options to retrieve from .op, or NULL for a list of all options.
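The catastrophic cancellation mentioned under stable.algo can be reproduced in base R. The following is a minimal sketch, not collapse's internal code: var_one_pass() and var_two_pass() are hypothetical helpers implementing the textbook one-pass and two-pass sample variance formulas, and the data are made up so that the values are large and the variance is small.

```r
# Hypothetical helper: textbook one-pass formula,
# (sum(x^2) - sum(x)^2 / n) / (n - 1). Fast, but subtracts two huge numbers.
var_one_pass <- function(x) {
  n <- length(x)
  (sum(x^2) - sum(x)^2 / n) / (n - 1)
}

# Hypothetical helper: two-pass formula, centering the data first.
var_two_pass <- function(x) {
  n <- length(x)
  m <- sum(x) / n
  sum((x - m)^2) / (n - 1)
}

# Large values with a small spread; the true sample variance is 30.
x <- 1e9 + c(4, 7, 13, 16)

var_two_pass(x)  # numerically stable
var_one_pass(x)  # suffers from cancellation
var(x)           # base R reference
```

With these inputs the two-pass result matches var(x), while the one-pass result is dominated by rounding error in the squared terms. The exact wrong value may differ across platforms.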
Value
set_collapse() returns the old content of .op invisibly as a list. get_collapse(), if called with only one option, returns the value of the option, and otherwise a list.
Note
Setting keywords "fast-fun", "fast-stat-fun", "fast-trfm-fun" or "all" with set_collapse(mask = ...) will also adjust internal optimization flags, e.g. in (f)summarise and (f)mutate, so that these functions, and all expressions containing them, receive vectorized execution (see examples of (f)summarise and (f)mutate). Users should be aware of expressions like fmutate(mu = sum(var) / length(var)): this usually gets executed by groups, but with these keywords set, it will be vectorized (like fmutate(mu = fsum(var) / length(var))), implying a grouped sum divided by the overall length. In this case fmutate(mu = base::sum(var) / length(var)) needs to be specified to retain the original result.
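The difference between the two evaluation modes can be emulated in base R. This is only an illustrative sketch with made-up data: it uses ave() to mimic grouped execution and does not call collapse, so the variable names (g, v, by_group, vectorized) are assumptions for the example.

```r
g <- c("a", "a", "b", "b", "b")  # grouping variable
v <- c(1, 2, 3, 4, 5)            # values

# Executed by groups: the whole expression sum(v) / length(v)
# is evaluated separately within each group (i.e. the group mean).
by_group <- ave(v, g, FUN = function(x) sum(x) / length(x))

# Vectorized: only the sum is grouped (and expanded back to rows),
# then divided by the overall length of the column.
grouped_sum <- ave(v, g, FUN = sum)
vectorized <- grouped_sum / length(v)

by_group    # 1.5 1.5 4.0 4.0 4.0
vectorized  # 0.6 0.6 2.4 2.4 2.4
```

The two results differ because length(v) is 5 in the vectorized case rather than the group size.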
Note that passing individual functions like set_collapse(mask = "(f)sum") will not change internal optimization flags for these functions. This is to ensure consistency, i.e. you can be either all in (by setting appropriate keywords) or all out when it comes to vectorized stats with basic R names.
Note also that masking does not change documentation links, so you need to look up the f- version of a function to get the right documentation.
A safe way to set options affecting startup behavior is by using a .Rprofile file in your user or project directory (the user-level file is located at file.path(Sys.getenv("HOME"), ".Rprofile") and can be edited using file.edit(file.path(Sys.getenv("HOME"), ".Rprofile"))), or by using a .fastverse configuration file in the project directory.
options("collapse_remove") does in fact remove functions from the namespace and cannot be reversed by set_collapse(remove = NULL) once the package is loaded. It is only reversed by re-loading collapse.
Options Set Using options()
- "collapse_unused_arg_action" regulates how generic functions (such as the Fast Statistical Functions) in the package react when an unknown argument is passed to a method. The default action is "warning", which issues a warning. Other options are "error", "message" or "none", whereby the latter enables silent swallowing of such arguments.
- "collapse_export_F", if set to TRUE, exports the lead operator F in the package namespace when loading the package. The operator was exported by default until v1.9.0, but is now hidden inside the package due to too many problems with base::F. Alternatively, the operator can be accessed using collapse:::F.
- "collapse_nthreads", "collapse_na_rm", "collapse_sort", "collapse_stable_algo", "collapse_verbose", "collapse_digits", "collapse_mask" and "collapse_remove" can be set before loading the package to initialize .op with different defaults (e.g. using an .Rprofile file). Once loaded, these options have no effect, and users need to use set_collapse() to change them. See also the Note.
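As a sketch of the startup configuration described above, a project-level .Rprofile might contain something like the following. The option names are those listed in this section; the chosen values are purely illustrative assumptions, not recommendations.

```r
# Hypothetical .Rprofile fragment: values are illustrative only.
# The collapse_* defaults are read when collapse is loaded;
# after loading, use set_collapse() to change them instead.
options(
  collapse_nthreads = 2L,               # default number of OpenMP threads
  collapse_sort = FALSE,                # unsorted grouping by default
  collapse_mask = "manip",              # mask data manipulation functions at load time
  collapse_unused_arg_action = "error"  # fail on unknown arguments
)
```

Because "collapse_remove" set this way cannot be undone with set_collapse(remove = NULL), it is left out of this sketch.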
Examples
# Setting new values
oldopts <- set_collapse(nthreads = 2, na.rm = FALSE)
# Getting the values
get_collapse()
#> $nthreads
#> [1] 2
#>
#> $remove
#> NULL
#>
#> $stable.algo
#> [1] TRUE
#>
#> $sort
#> [1] TRUE
#>
#> $digits
#> [1] 2
#>
#> $stub
#> [1] TRUE
#>
#> $verbose
#> [1] 1
#>
#> $mask
#> NULL
#>
#> $na.rm
#> [1] FALSE
#>
get_collapse("nthreads")
#> [1] 2
# Resetting
set_collapse(oldopts)
rm(oldopts)
if (FALSE) { # \dontrun{
## This is a typical working setup I use:
library(fastverse)
# Loading other stats packages with fastverse_extend():
# displays versions, checks conflicts, and installs if unavailable
fastverse_extend(qs, fixest, grf, glmnet, install = TRUE)
# Now setting collapse options with some namespace modification
set_collapse(
  nthreads = 4,
  sort = FALSE,
  mask = c("manip", "helper", "special", "mean", "scale"),
  remove = "old"
)
# Final conflicts check (optional)
fastverse_conflicts()
# For some simpler scripts I also use
set_collapse(
  nthreads = 4,
  sort = FALSE,
  mask = "all",
  remove = c("old", "between") # I use data.table::between > fbetween
)
# This is now collapse code
mtcars |>
  subset(mpg > 12) |>
  group_by(cyl) |>
  sum()
} # }
## Changing what happens with unused arguments
oldopts <- options(collapse_unused_arg_action = "message") # default: "warning"
fmean(mtcars$mpg, bla = 1)
#> Unused argument (bla = 1) passed to fmean.default
#> [1] 20.09062
# Now nothing happens, same as base R
options(collapse_unused_arg_action = "none")
fmean(mtcars$mpg, bla = 1)
#> [1] 20.09062
mean(mtcars$mpg, bla = 1)
#> [1] 20.09062
options(oldopts)
rm(oldopts)