collapse provides an ensemble of functions that perform common data transformations efficiently and in a user-friendly way:
dapply applies functions to rows or columns of matrices and data frames, preserving the data format.
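As a minimal sketch of the format-preserving behaviour described above (the toy matrix `m` and the anonymous scaling function are illustrative assumptions, not part of the package):

```r
library(collapse)

m <- matrix(as.numeric(1:6), nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# Column-wise application (MARGIN = 2 is the default): the result is still a matrix
col_scaled <- dapply(m, function(x) x / sum(x))

# Row-wise application: same function, applied to each row instead
row_scaled <- dapply(m, function(x) x / sum(x), MARGIN = 1)

# A data frame goes in, a data frame comes out
df_log <- dapply(mtcars, log)
```

With base R's `apply`, the row-wise case would need a transpose and the data frame case a conversion back from a matrix.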
A set of arithmetic operators facilitates row-wise %rr%, %r+%, %r-%, %r*%, %r/% and column-wise %cr%, %c+%, %c-%, %c*%, %c/% replacing and sweeping operations involving a vector and a matrix or data frame / list. Since v1.7, the operators %+=%, %-=%, %*=% and %/=% do column- and element-wise math by reference, and the function setop can also perform sweeping out rows by reference.
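The following sketch illustrates the operators on a small matrix. The objects `m`, `v_cols`, `v_rows`, `m2` and `x` are made up for illustration, and the by-reference calls assume collapse v1.7 or later:

```r
library(collapse)

m <- matrix(as.numeric(1:6), nrow = 2)
v_cols <- c(1, 2, 3)   # one value per column, used by the row-wise operators
v_rows <- c(10, 20)    # one value per row, used by the column-wise operators

a <- m %r-% v_cols     # subtract v_cols from every row (returns a copy)
b <- m %c/% v_rows     # divide every column by v_rows (returns a copy)

# By reference: the object on the left is modified in place
x <- c(1, 2, 3)
x %+=% 10

# setop with rowwise = TRUE sweeps a vector out of the rows in place
m2 <- m + 0            # fresh copy, so that m itself stays untouched
setop(m2, "-", v_cols, rowwise = TRUE)
```

By-reference operations change every binding that points to the same memory, so it is safer to apply them to objects that were created or copied explicitly, as with `m2` here.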
(set)TRA is a more advanced S3 generic to efficiently perform (groupwise) replacing and sweeping out of statistics, either by creating a copy of the data or by reference.
Supported operations include:
|Integer ID||String ID||Description|
|0||"replace_NA"||replace missing values|
|2||"replace"||replace data but preserve missing values|
|4||"-+"||subtract group-statistics but add group-frequency weighted average of group statistics|
All of collapse's Fast Statistical Functions have a built-in
TRA argument for faster access (i.e. you can compute (groupwise) statistics and use them to transform your data with a single function call).
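A short sketch of both routes, using a toy vector `x` and grouping factor `g` that are assumptions for illustration. The two-step version computes group means with fmean and sweeps them out with TRA; the one-step version passes the operation through the TRA argument:

```r
library(collapse)

x <- c(1, 2, 3, 10, 20, 30)
g <- factor(c("a", "a", "a", "b", "b", "b"))

# Two steps: compute group means, then subtract them with TRA
centered_1 <- TRA(x, fmean(x, g), "-", g = g)

# One step: let fmean apply the transformation itself
centered_2 <- fmean(x, g, TRA = "-")

# "-+" subtracts the group means and adds back their frequency-weighted average
centered_3 <- fmean(x, g, TRA = "-+")
```

Here the group means are 2 and 20 and both groups have three observations, so the frequency-weighted average added back by "-+" is 11.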
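fscale/STD is an S3 generic to perform (groupwise and / or weighted) scaling / standardizing of data and is orders of magnitude faster than scale.
fwithin/W is an S3 generic to efficiently perform (groupwise and / or weighted) within-transformations / demeaning / centering of data. Similarly
fbetween/B computes (groupwise and / or weighted) between-transformations / averages (also a lot faster than ave).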
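A minimal sketch of the within and between transformations on made-up data (`x` and `g` are illustrative), with base R's `ave` as a cross-check:

```r
library(collapse)

x <- c(1, 2, 3, 10, 20, 30)
g <- factor(c("a", "a", "a", "b", "b", "b"))

within_x  <- fwithin(x, g)    # x minus its group mean
between_x <- fbetween(x, g)   # the group mean, repeated for every observation

# The same quantities computed with base R
base_between <- ave(x, g)
base_within  <- x - base_between
```

The two transformations are complementary: adding `within_x` and `between_x` recovers `x`.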
fhdwithin/HDW, shorthand for 'higher-dimensional within transform', is an S3 generic to efficiently center data on multiple groups and partial-out linear models (possibly involving many levels of fixed effects and interactions). In other words,
fhdwithin/HDW efficiently computes residuals from linear models. Similarly
fhdbetween/HDB, shorthand for 'higher-dimensional between transformation', computes the corresponding means or fitted values.
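The link to linear models can be sketched as follows. The choice of `mtcars` and of `cyl` and `wt` as the factor and covariate to partial out is an assumption for illustration, and the comparison against `lm` is a plausibility check rather than a description of the internal algorithm:

```r
library(collapse)

y  <- mtcars$mpg
fl <- list(factor(mtcars$cyl), mtcars$wt)   # one factor and one continuous covariate

res_hdw <- fhdwithin(y, fl)    # y with cyl effects and wt partialled out
fit_hdb <- fhdbetween(y, fl)   # the corresponding fitted values

# Reference: the same model estimated with lm
ref <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
res_lm <- unname(resid(ref))
fit_lm <- unname(fitted(ref))
```

Under these assumptions `res_hdw` should agree with `res_lm` and `fit_hdb` with `fit_lm` up to numerical tolerance. Models with several factors may need an additional package for the centering step.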
flag/L/F, fdiff/D/Dlog and fgrowth/G are S3 generics to compute sequences of lags / leads and suitably lagged and iterated (quasi-, log-) differences and growth rates on time series and panel data.
fcumsum flexibly computes (grouped, ordered) cumulative sums. More in Time Series and Panel Series.
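A small sketch of these time series helpers on an invented series `x` that grows by 10% per period. The lag and difference functions used here are collapse's flag and fdiff, and the grouping factor `g` is a stand-in for a panel identifier:

```r
library(collapse)

x <- c(100, 110, 121, 133.1)

lag1   <- flag(x)        # first lag, padded with NA at the start
lead1  <- flag(x, -1)    # negative n gives a lead
diff1  <- fdiff(x)       # first difference
growth <- fgrowth(x)     # growth rate in percent
csum   <- fcumsum(x)     # cumulative sum

# Panel case: lags and cumulative sums restart within each group
g <- factor(c("a", "a", "b", "b"))
lag1_g <- flag(x, 1, g)
csum_g <- fcumsum(x, g)
```

Without a time variable the data are taken to be ordered within each group, which holds for this toy example.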
STD, W, B, HDW, HDB, L, D, Dlog and
G are parsimonious wrappers around the
f- functions above representing the corresponding transformation 'operators'. They have additional capabilities when applied to data-frames (i.e. variable selection, formula input, auto-renaming and id-variable preservation), and are easier to employ in regression formulas, but are otherwise identical in functionality.
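The extra data frame capabilities can be sketched with the W operator. The use of `mtcars`, and of `cyl` as the grouping variable, is an illustrative assumption:

```r
library(collapse)

# Formula input: centre mpg and hp within cyl; the id variable cyl is kept
# and the transformed columns are renamed with a "W." prefix
w_df <- W(mtcars, mpg + hp ~ cyl)

# Inside a regression formula: a within (fixed effects) regression of mpg on wt
fe_fit <- lm(W(mpg, cyl) ~ W(wt, cyl), data = mtcars)

# Reference: the same slope from a regression with cyl dummies
dummy_fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)
```

The slope on `W(wt, cyl)` in `fe_fit` should match the `wt` coefficient in `dummy_fit`, since centring both variables within `cyl` is equivalent to including the dummies.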
|Function / S3 Generic||Methods||Description|
|dapply||No methods, works with matrices and data frames||Apply functions to rows or columns|
|Arithmetic operators (such as %c/% and %/=%), setop||No methods, works with matrices and data frames / lists||Row- and column-arithmetic|
|(set)TRA|| ||Replace and sweep out statistics (by reference)|
|fscale/STD|| ||Scale / standardize data|
|fwithin/W|| ||Demean / center data|
|fbetween/B|| ||Compute means / average data|
|fhdwithin/HDW|| ||High-dimensional centering and lm residuals|
|fhdbetween/HDB|| ||High-dimensional averages and lm fitted values|
|flag/L/F, fdiff/D/Dlog, fgrowth/G, fcumsum|| ||(Sequences of) lags / leads, differences, growth rates and cumulative sums|