collapse provides an ensemble of functions to perform common data transformations efficiently and in a user-friendly way:
dapply applies functions to rows or columns of matrices and data frames, preserving the data format.
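As a minimal sketch of the format-preserving behaviour described above (assuming the collapse package is installed; the toy data and object names are invented for illustration), dapply can be applied row-wise via `MARGIN = 1` or column-wise, which is its default:

```r
library(collapse)

# Toy data, invented for illustration
m  <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
df <- as.data.frame(m)

# Row-wise: rescale each row so that it sums to 1.
# The result keeps the format of the input.
m_shares  <- dapply(m,  function(r) r / sum(r), MARGIN = 1)  # still a matrix
df_shares <- dapply(df, function(r) r / sum(r), MARGIN = 1)  # still a data frame

# Column-wise (the default, MARGIN = 2): center each column on its mean
df_centered <- dapply(df, function(col) col - mean(col))
```

The point of the sketch is only that a matrix goes in and a matrix comes out, and likewise for data frames, so no manual conversion is needed afterwards.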
A set of arithmetic operators (such as %c/%) facilitates row-wise and column-wise replacing and sweeping operations involving a vector and a matrix or data frame / list.
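The following sketch illustrates the two directions of sweeping, assuming collapse is installed. `%c/%` appears in the text above; `%r*%` is assumed here as its row-wise counterpart from the same operator family, and the base R lines are added only for comparison.

```r
library(collapse)

# 2 x 3 toy matrix: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Row-wise: multiply every row by a vector of length ncol(m)
by_row <- m %r*% c(1, 10, 100)

# Column-wise: divide every column by a vector of length nrow(m)
by_col <- m %c/% c(1, 2)

# Base R equivalents, shown for comparison only
by_row_base <- sweep(m, 2, c(1, 10, 100), "*")
by_col_base <- m / c(1, 2)
```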
TRA is a more advanced S3 generic to efficiently perform (groupwise) replacing and sweeping out of statistics.
Supported operations are:
|Integer ID||String ID||Description|
|1||"replace_fill"||replace and overwrite missing values|
|2||"replace"||replace but preserve missing values|
|4||"-+"||subtract group-statistics but add group-frequency weighted average of group statistics|
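A small sketch of the three operations listed above, assuming collapse is installed. The vectors are invented for illustration, and `fmean` (one of the package's Fast Statistical Functions) is used here only to produce the group statistics.

```r
library(collapse)

x <- c(1, 2, NA, 4, 5, 6)
g <- c("a", "a", "a", "b", "b", "b")
stats <- fmean(x, g)   # group means with NA skipped: a = 1.5, b = 5

r_keep <- TRA(x, stats, "replace", g)       # 1.5 1.5  NA 5 5 5
r_fill <- TRA(x, stats, "replace_fill", g)  # 1.5 1.5 1.5 5 5 5
r_int  <- TRA(x, stats, 2L, g)              # integer ID 2 is the same as "replace"

# "-+" removes the group means but keeps the overall level of the data.
# Without missing values, the frequency-weighted average of the group
# means is just the overall mean (here 4).
y  <- c(1, 2, 3, 4, 5, 6, 7)
gy <- c("a", "a", "a", "b", "b", "b", "b")
centered <- TRA(y, fmean(y, gy), "-+", gy)  # 3 4 5 2.5 3.5 4.5 5.5
```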
All of collapse's Fast Statistical Functions have a built-in TRA argument for faster access (i.e. you can compute (groupwise) statistics and use them to transform your data with a single function call).
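To illustrate the single-call form, here is a hedged sketch using `fmean` as an example of a Fast Statistical Function (toy data invented for illustration, collapse assumed to be installed):

```r
library(collapse)

x <- c(2, 4, 6, 10, 20, 30)
g <- rep(c("a", "b"), each = 3)

# Two steps: compute the group means, then expand them over the data
two_calls <- TRA(x, fmean(x, g), "replace", g)

# One step: the same result through the built-in TRA argument
one_call <- fmean(x, g, TRA = "replace")   # 4 4 4 20 20 20
```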
fwithin/W is an S3 generic to efficiently perform (groupwise and / or weighted) within-transformations / demeaning / centering of data. Similarly, fbetween/B computes (groupwise and / or weighted) between-transformations / averages (also a lot faster than
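The within and between transformations split each value into a group average and a deviation from it. A minimal sketch, assuming collapse is installed and using invented toy data, with base R's `ave` included only as a reference point:

```r
library(collapse)

x <- c(1, 2, 3, 10, 20, 30)
g <- rep(c("a", "b"), each = 3)

between <- fbetween(x, g)   #  2 2 2  20 20 20  (group means)
within  <- fwithin(x, g)    # -1 0 1 -10  0 10  (deviations from group means)

# The two pieces add back up to the original data
recombined <- between + within

# Base R reference for the group averages
between_base <- ave(x, g)
```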
fHDwithin/HDW, shorthand for 'higher-dimensional within transform', is an S3 generic to efficiently center data on multiple groups and partial-out linear models (possibly involving many levels of fixed effects). In other words, fHDwithin/HDW efficiently computes residuals from (potentially complex) linear models. Similarly, fHDbetween/HDB, shorthand for 'higher-dimensional between transformation', computes the corresponding means or fitted values.
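The link between centering on several groups and linear-model residuals can be sketched in base R alone. The example below does not call collapse and says nothing about how fHDwithin/HDW is implemented; it only illustrates the quantities described above, using simulated data and invented variable names.

```r
set.seed(1)
n  <- 60
f1 <- factor(sample(letters[1:4], n, replace = TRUE))
f2 <- factor(sample(LETTERS[1:3], n, replace = TRUE))
y  <- rnorm(n) + as.integer(f1) + 2 * as.integer(f2)

# Partial out both sets of fixed effects with an ordinary linear model
fit <- lm(y ~ f1 + f2)
y_within  <- resid(fit)    # the 'higher-dimensional within' part
y_between <- fitted(fit)   # the corresponding fitted values

# The residuals are centered within every level of both factors
max_dev_f1 <- max(abs(tapply(y_within, f1, mean)))
max_dev_f2 <- max(abs(tapply(y_within, f2, mean)))

# Fitted values and residuals add back up to the data
recombined <- y_between + y_within
```

With many levels of fixed effects, building the dummy matrix that `lm` needs becomes expensive, which is the case the dedicated functions are meant for.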
flag/L, fdiff/D/Dlog and fgrowth/G are S3 generics to compute sequences of lags / leads and suitably lagged and iterated (quasi-, log-) differences and growth rates on time series and panel data. More in Time Series and Panel Series.
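A short sketch of lags, leads, differences and growth rates on a toy series, assuming collapse is installed. `flag` and `fdiff` are taken to be the f- functions behind the L and D operators; the data are invented for illustration.

```r
library(collapse)

x <- c(100, 110, 121, 133.1)

lag1  <- flag(x)        # NA 100 110 121
lead1 <- flag(x, -1)    # negative n gives a lead: 110 121 133.1 NA
d1    <- fdiff(x)       # first difference: NA 10 11 12.1
gr1   <- fgrowth(x)     # growth rate in percent: NA 10 10 10

# Panel data: lag within each group, so values never cross group boundaries
id <- c("a", "a", "b", "b")
lag_by_id <- flag(x, 1, id)   # NA 100 NA 121
```

The grouped call assumes that observations are already ordered in time within each group, since no time variable is supplied.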
STD, W, B, HDW, HDB, L, D, Dlog and G are parsimonious wrappers around the f- functions above representing the corresponding transformation 'operators'. They have additional capabilities when applied to data frames (i.e. variable selection, formula input, auto-renaming and id-variable preservation), and are easier to employ in regression formulas, but are otherwise identical in functionality.
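The data frame capabilities and the use in regression formulas can be sketched with the W operator, assuming collapse is installed. The data frame `d` and its column names are invented for illustration.

```r
library(collapse)

d <- data.frame(id = c("a", "a", "b", "b"),
                x  = c(1, 3, 10, 20),
                y  = c(2, 4, 5, 9))

# Formula input: center all other columns within groups defined by id.
# The id column is preserved and the transformed columns are renamed.
w_df <- W(d, ~ id)
w_names <- names(w_df)   # "id" "W.x" "W.y"

# Inside a regression formula: a within (fixed effects) regression
fit_w  <- lm(W(y, id) ~ W(x, id), data = d)

# The slope matches the one from a regression with explicit group dummies
fit_fe <- lm(y ~ x + id, data = d)
slope_w  <- unname(coef(fit_w)[2])
slope_fe <- unname(coef(fit_fe)["x"])
```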
|Function / S3 Generic||Methods||Description|
|dapply||No methods, works with matrices and data frames||Apply functions to rows or columns|
|Arithmetic operators (e.g. %c/%)||No methods, works with matrices and data frames / lists||Row- and column-arithmetic|
|TRA||||Replace and sweep out statistics|
|fscale/STD||||Scale / standardize data|
|fwithin/W||||Demean / center data|
|fbetween/B||||Compute means / average data|
|fHDwithin/HDW||||High-dimensional centering and lm residuals|
|fHDbetween/HDB||||High-dimensional averages and lm fitted values|
|flag/L||||(Sequences of) lags / leads|
|fdiff/D/Dlog||||(Sequences of lagged/leaded and iterated quasi- log-) differences|
|fgrowth/G||||(Sequences of lagged/leaded and iterated) growth rates|