# Data Transformations

*collapse* provides an ensemble of functions that perform common data transformations efficiently and in a user-friendly way:

- `dapply` **applies functions to rows or columns** of matrices and data frames, preserving the data format.

- `BY` is an S3 generic for efficient **Split-Apply-Combine computing**, similar to `dapply`.

- A set of arithmetic operators facilitates **row-wise** `%rr%`, `%r+%`, `%r-%`, `%r*%`, `%r/%` and **column-wise** `%cr%`, `%c+%`, `%c-%`, `%c*%`, `%c/%` **replacing and sweeping operations** involving a vector and a matrix or data frame / list. Since v1.7, the operators `%+=%`, `%-=%`, `%*=%` and `%/=%` do column- and element-wise math by reference, and the function `setop` can also perform sweeping out rows by reference.

- `(set)TRA` is a more advanced S3 generic to efficiently perform **(groupwise) replacing and sweeping out of statistics**, either by creating a copy of the data or by reference. Supported operations are:

  | Integer-id | String-id | Description |
  |---|---|---|
  | 0 | `"na"` or `"replace_na"` | replace only missing values |
  | 1 | `"fill"` or `"replace_fill"` | replace everything |
  | 2 | `"replace"` | replace data but preserve missing values |
  | 3 | `"-"` | subtract |
  | 4 | `"-+"` | subtract group-statistics but add group-frequency weighted average of group statistics |
  | 5 | `"/"` | divide |
  | 6 | `"%"` | compute percentages |
  | 7 | `"+"` | add |
  | 8 | `"*"` | multiply |
  | 9 | `"%%"` | modulus |
  | 10 | `"-%%"` | subtract modulus |

  All of *collapse*'s Fast Statistical Functions have a built-in `TRA` argument for faster access (i.e. you can compute (groupwise) statistics and use them to transform your data with a single function call).

- `fscale/STD` is an S3 generic to perform (groupwise and / or weighted) **scaling / standardizing** of data and is orders of magnitude faster than `scale`.

- `fwithin/W` is an S3 generic to efficiently perform (groupwise and / or weighted) **within-transformations / demeaning / centering** of data. Similarly `fbetween/B` computes (groupwise and / or weighted) **between-transformations / averages** (also a lot faster than `ave`).

- `fhdwithin/HDW`, shorthand for 'higher-dimensional within transform', is an S3 generic to efficiently **center data on multiple groups and partial-out linear models** (possibly involving many levels of fixed effects and interactions). In other words, `fhdwithin/HDW` efficiently computes **residuals** from linear models. Similarly `fhdbetween/HDB`, shorthand for 'higher-dimensional between transformation', computes the corresponding means or **fitted values**.

- `flag/L/F`, `fdiff/D/Dlog` and `fgrowth/G` are S3 generics to compute sequences of **lags / leads** and suitably lagged and iterated (quasi-, log-) **differences** and **growth rates** on time series and panel data. `fcumsum` flexibly computes (grouped, ordered) cumulative sums. More in Time Series and Panel Series.

- `STD, W, B, HDW, HDB, L, D, Dlog` and `G` are parsimonious wrappers around the `f-` functions above representing the corresponding transformation 'operators'. They have additional capabilities when applied to data frames (i.e. variable selection, formula input, auto-renaming and id-variable preservation), and are easier to employ in regression formulas, but are otherwise identical in functionality.
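The sweeping and centering functions described above can be sketched on toy data as follows. This is a minimal illustration, not taken from the package documentation: the vectors `x` and `g`, the matrix `m`, and all result names are made up for the example, and `mtcars` is the base R dataset. It assumes a reasonably recent *collapse* installation with default options.

```r
library(collapse)

# Toy data (hypothetical): two groups with means 2 and 20
x <- c(1, 2, 3, 10, 20, 30)
g <- factor(c("a", "a", "a", "b", "b", "b"))
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # 2 x 3 matrix

# Row-wise sweep: the vector has one element per column and is
# subtracted from every row, like sweep(m, 2, colMeans(m))
row_swept <- m %r-% colMeans(m)

# Column-wise sweep: the vector has one element per row and is
# subtracted from every column, like sweep(m, 1, rowMeans(m))
col_swept <- m %c-% rowMeans(m)

# Three ways to subtract group means from x
demeaned_tra <- TRA(x, fmean(x, g), "-", g)  # explicit statistics + TRA
demeaned_arg <- fmean(x, g, TRA = "-")       # built-in TRA argument
demeaned_fw  <- fwithin(x, g)                # dedicated within-transformation
# each gives -1 0 1 -10 0 10

# Group means expanded to the length of x, and groupwise standardization
group_means <- fbetween(x, g)                # 2 2 2 20 20 20
scaled      <- fscale(x, g)                  # -1 0 1 -1 0 1

# Operator with formula input on a data frame: center mpg and hp within cyl.
# The id variable cyl is kept and the transformed columns are renamed.
w_df <- W(mtcars, mpg + hp ~ cyl)
```

Going through `fmean(..., TRA = "-")` or `fwithin()` avoids materializing the group statistics as a separate object, which is the "single function call" route mentioned above.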

## Table of Functions

| Function / S3 Generic | Methods | Description |
|---|---|---|
| `dapply` | No methods, works with matrices and data frames | Apply functions to rows or columns |
| `BY` | `default, matrix, data.frame, grouped_df` | Split-Apply-Combine computing |
| `%(r/c)(r/+/-/*//)%` | No methods, works with matrices and data frames / lists | Row- and column-arithmetic |
| `(set)TRA` | `default, matrix, data.frame, grouped_df` | Replace and sweep out statistics (by reference) |
| `fscale/STD` | `default, matrix, data.frame, pseries, pdata.frame, grouped_df` | Scale / standardize data |
| `fwithin/W` | `default, matrix, data.frame, pseries, pdata.frame, grouped_df` | Demean / center data |
| `fbetween/B` | `default, matrix, data.frame, pseries, pdata.frame, grouped_df` | Compute means / average data |
| `fhdwithin/HDW` | `default, matrix, data.frame, pseries, pdata.frame` | High-dimensional centering and lm residuals |
| `fhdbetween/HDB` | `default, matrix, data.frame, pseries, pdata.frame` | High-dimensional averages and lm fitted values |
| `flag/L/F`, `fdiff/D/Dlog`, `fgrowth/G`, `fcumsum` | `default, matrix, data.frame, pseries, pdata.frame, grouped_df` | (Sequences of) lags / leads, differences, growth rates and cumulative sums |
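
As a rough illustration of the apply-style and time-series entries in the table, the sketch below calls `dapply`, `BY`, `flag`, `fdiff`, `fgrowth` and `fcumsum` on toy data. The objects `x`, `g` and `m` and the result names are hypothetical, and the grouped calls assume the observations are already in time order within each group (no time variable is supplied).

```r
library(collapse)

# Toy data (hypothetical): two groups, already ordered in time within group
x <- c(1, 2, 3, 10, 20, 30)
g <- factor(c("a", "a", "a", "b", "b", "b"))
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # 2 x 3 matrix

# dapply: apply a function over columns (the default) or rows
col_sums <- dapply(m, sum)                   # 3 7 11
row_sums <- dapply(m, sum, MARGIN = 1)       # 9 12

# BY: split x by g, apply sum, combine the results
group_sums <- BY(x, g, sum)                  # a = 6, b = 60

# Grouped lags and leads: positive n lags, negative n leads
lag1  <- flag(x, 1, g)                       # NA 1 2 NA 10 20
lead1 <- flag(x, -1, g)                      # 2 3 NA 20 30 NA

# Grouped first differences and growth rates (growth is in percent)
d1 <- fdiff(x, 1, 1, g)                      # NA 1 1 NA 10 10
gr <- fgrowth(x, 1, 1, g)                    # NA 100 50 NA 100 50

# Grouped cumulative sums
cs <- fcumsum(x, g)                          # 1 3 6 10 30 60
```

For irregular or unordered panels, a time variable would normally be passed as well so that lags and differences follow time rather than row order.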