# Collapse Documentation & Overview


The following table fully summarizes the contents of *collapse*. The documentation is structured hierarchically: This is the main overview page, linking to topical overview pages and associated function pages (unless functions are documented on the topic page).

## Topics and Functions

| Topic | Main Features / Keywords | Functions |
|---|---|---|
| Fast Statistical Functions | Fast (grouped and weighted) statistical functions for vectors, matrices, data frames and grouped data frames (class 'grouped_df', dplyr compatible). | `fsum`, `fprod`, `fmean`, `fmedian`, `fmode`, `fvar`, `fsd`, `fmin`, `fmax`, `fnth`, `ffirst`, `flast`, `fnobs`, `fndistinct` |
| Fast Grouping and Ordering | Fast (ordered) groupings from vectors, data frames, lists. 'GRP' objects are efficient inputs for programming with collapse's fast functions. `fgroup_by` can attach them to a data frame, for fast dplyr-style grouped computations. Fast splitting of vectors based on 'GRP' objects. Fast radix-based ordering and hash-based grouping (the workhorses behind `GRP`). Fast matching (rows) and unique values/rows, group counts, factor generation, vector grouping, interactions, dropping unused factor levels, generalized run-length type grouping and grouping of integer sequences and time vectors. | `GRP`, `as_factor_GRP`, `GRPN`, `GRPid`, `GRPnames`, `is_GRP`, `fgroup_by`, `group_by_vars`, `fgroup_vars`, `fungroup`, `gsplit`, `greorder`, `radixorder(v)`, `group`, `fmatch`, `ckmatch`, `%!in%`, `%[!]iin%`, `funique`, `fnunique`, `fduplicated`, `any_duplicated`, `fcount(v)`, `qF`, `qG`, `is_qG`, `finteraction`, `fdroplevels`, `groupid`, `seqid`, `timeid` |
| Fast Data Manipulation | Fast and flexible select, subset, summarise, mutate/transform, sort/reorder, combine, join, reshape, rename and relabel data. Some functions modify by reference and/or allow assignment. In addition, a set of (standard evaluation) functions for fast selecting, replacing or adding data frame columns, including shortcuts to select and replace variables by data type. | `fselect(<-)`, `fsubset/ss`, `fsummarise`, `fmutate`, `across`, `(f/set)transform(v)(<-)`, `fcompute(v)`, `roworder(v)`, `colorder(v)`, `rowbind`, `join`, `pivot`, `(f/set)rename`, `(set)relabel`, `get_vars(<-)`, `add_vars(<-)`, `num_vars(<-)`, `cat_vars(<-)`, `char_vars(<-)`, `fact_vars(<-)`, `logi_vars(<-)`, `date_vars(<-)` |
| Quick Data Conversion | Quick conversions: data.frame <> data.table <> tibble <> matrix (row- or column-wise) <> list \| array > matrix, data.frame, data.table, tibble \| vector > factor, matrix, data.frame, data.table, tibble; and converting factors / all factor columns. | `qDF`, `qDT`, `qTBL`, `qM`, `qF`, `mrtl`, `mctl`, `as_numeric_factor`, `as_integer_factor`, `as_character_factor` |
| Advanced Data Aggregation | Fast and easy (weighted and parallelized) aggregation of multi-type data, with different functions applied to numeric and categorical variables. Custom specifications allow mappings of functions to variables + renaming. | `collap(v/g)` |
| Data Transformations | Fast row- and column-arithmetic and (object preserving) apply functionality for vectors, matrices and data frames. Fast (grouped) replacing and sweeping of statistics (by reference) and (grouped and weighted) scaling / standardizing, (higher-dimensional) between- and within-transformations (i.e. averaging and centering), linear prediction and partialling out. | `%(r/c)r%`, `%(r/c)(+/-/*//)%`, `dapply`, `BY`, `(set)TRA`, `fscale/STD`, `fbetween/B`, `fwithin/W`, `fhdbetween/HDB`, `fhdwithin/HDW` |
| Linear Models | Fast (weighted) linear model fitting with 6 different solvers and a fast F-test to test exclusion restrictions on linear models with (large) factors. | `flm`, `fFtest` |
| Time Series and Panel Series | Fast and class-agnostic indexed time series and panel data objects, check for irregularity in time series and panels, and efficient time-sequence to integer/factor conversion. Fast (sequences of) lags / leads and (lagged / leaded and iterated, quasi-, log-) differences, and (compounded) growth rates on (irregular) time series and panel data. Flexible cumulative sums. Panel data to array conversions. Multivariate panel- auto-, partial- and cross-correlation functions. | `findex_by`, `findex`, `unindex`, `reindex`, `is_irregular`, `to_plm`, `timeid`, `flag/L/F`, `fdiff/D/Dlog`, `fgrowth/G`, `fcumsum`, `psmat`, `psacf`, `pspacf`, `psccf` |
| Summary Statistics | Fast (grouped and weighted) summary statistics for cross-sectional and panel data. Fast (weighted) cross tabulation. Efficient detailed description of data frame. Fast check of variation in data (within groups / dimensions). (Weighted) pairwise correlations and covariances (with obs. and p-value), pairwise observation count. | `qsu`, `qtab`, `descr`, `varying`, `pwcor`, `pwcov`, `pwnobs` |
| Other Statistical | Fast Euclidean distance computations, (weighted) sample quantiles, and range of vector. | `fdist`, `fquantile`, `frange` |
| List Processing | (Recursive) list search and checks, extraction of list-elements / list-subsetting, fast (recursive) splitting, list-transpose, apply functions to lists of data frames / data objects, and generalized recursive row-binding / unlisting in 2-dimensions / to data frame. | `is_unlistable`, `ldepth`, `has_elem`, `get_elem`, `atomic_elem(<-)`, `list_elem(<-)`, `reg_elem`, `irreg_elem`, `rsplit`, `t_list`, `rapply2d`, `unlist2d`, `rowbind` |
| Recode and Replace Values | Recode multiple values (exact or regex matching) and replace `NaN/Inf/-Inf` and outliers (according to 1- or 2-sided threshold or standard-deviations) in vectors, matrices or data frames. Insert a value at arbitrary positions into vectors, matrices or data frames. | `recode_num`, `recode_char`, `replace_na`, `replace_inf`, `replace_outliers`, `pad` |
| (Memory) Efficient Programming | Efficient comparisons of a vector/matrix with a value, and replacing values/rows in a vector/matrix/data frame (avoiding logical vectors or subsets), faster generation of initialized vectors, and fast mathematical operations on vectors/matrices/data frames with no copies at all. Fast missing value detection, (random) insertion and removal/replacement, lengths and C storage types, greatest common divisor of vector, `nlevels` for factors, `nrow`, `ncol`, `dim` (for data frames) and `seq_along` rows or columns. Fast vectorization of matrices and lists, and Cholesky inverse of symmetric PD matrix. | `anyv`, `allv`, `allNA`, `whichv`, `whichNA`, `%==%`, `%!=%`, `copyv`, `setv`, `alloc`, `setop`, `%+=%`, `%-=%`, `%*=%`, `%/=%`, `missing_cases`, `na_insert`, `na_rm`, `na_locf`, `na_focb`, `na_omit`, `vlengths`, `vtypes`, `vgcd`, `fnlevels`, `fnrow`, `fncol`, `fdim`, `seq_row`, `seq_col`, `vec`, `cinv` |
| Small (Helper) Functions | Multiple-assignment, non-standard concatenation, set and extract variable labels and classes, display variable names and labels together, add / remove prefix or postfix to / from column names, check exact or near / numeric equality of multiple objects or of all elements in a list, get names of functions called in an expression, return object with dimnames, row- or colnames efficiently set, or with all attributes removed, C-level functions to set and shallow-copy attributes, identify categorical (non-numeric) and date(-time) objects. | `massign`, `%=%`, `.c`, `vlabels(<-)`, `setLabels`, `vclasses`, `namlab`, `add_stub`, `rm_stub`, `all_identical`, `all_obj_equal`, `all_funs`, `setDimnames`, `setRownames`, `setColnames`, `unattrib`, `setAttrib`, `setattrib`, `copyAttrib`, `copyMostAttrib`, `is_categorical`, `is_date` |
| Data and Global Macros | Groningen Growth and Development Centre 10-Sector Database, World Bank World Development dataset, and some global macros containing links to the topical documentation pages (including this page), all exported objects (excluding exported S3 methods and deprecated functions), all generic functions (excluding deprecated), the 2 datasets, deprecated functions, all fast functions, all fast statistical (scalar-valued) functions, and all transformation operators (these are not infix functions but function shortcuts resembling operators in a statistical sense, such as the lag/lead operators `L`/`F`, both wrapping `flag`, see `.OPERATOR_FUN`). | `GGDC10S`, `wlddev`, `.COLLAPSE_TOPICS`, `.COLLAPSE_ALL`, `.COLLAPSE_GENERIC`, `.COLLAPSE_DATA`, `.COLLAPSE_OLD`, `.FAST_FUN`, `.FAST_STAT_FUN`, `.OPERATOR_FUN` |
| Package Options | `set_collapse`/`get_collapse` can be used to globally set/get the defaults for `na.rm`, `nthreads` and `sort`, etc., arguments found in many functions, and to globally control the namespace with options 'mask' and 'remove': 'mask' can be used to mask base R/dplyr functions by exporting copies of equivalent collapse functions starting with `"f"`, removing the leading `"f"` (e.g. exporting `subset <- fsubset`). 'remove' allows removing arbitrary functions from the exported namespace. `options("collapse_unused_arg_action")` sets the action taken by generic statistical functions when unknown arguments are passed to a method. The default is `"warning"`. | `set_collapse`, `get_collapse` |
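As a rough illustration of a few entries in the table above, the following sketch combines a fast statistical function, a 'GRP' object, a dplyr-style grouped summary and a package option. It assumes *collapse* version 2.0.0 or later is installed (for `set_collapse`) and uses the base R dataset `mtcars`; the object names `by_cyl`, `g`, `by_grp`, `smry` and `m_na` are illustrative only.

```r
library(collapse)

# Grouped mean of a vector: 'g' can be any vector or factor defining the groups
by_cyl <- fmean(mtcars$mpg, g = mtcars$cyl)
print(by_cyl)

# A reusable 'GRP' object built from two columns of a data frame
g <- GRP(mtcars, ~ cyl + am)
by_grp <- fmean(mtcars$mpg, g)
print(by_grp)

# dplyr-style grouped computation: fgroup_by attaches the grouping,
# fsummarise evaluates the fast functions once per group
smry <- fsummarise(fgroup_by(mtcars, cyl),
                   mean_mpg = fmean(mpg),
                   sd_mpg   = fsd(mpg))
print(smry)

# Package option: change the global default for 'na.rm', then restore it
# (assumption: collapse >= 2.0.0, where TRUE is the package default)
set_collapse(na.rm = FALSE)
m_na <- fmean(c(1, NA, 3))   # missing values are no longer skipped by default
set_collapse(na.rm = TRUE)
print(m_na)
```

Creating the 'GRP' object once and passing it to several fast functions avoids recomputing the grouping for each call, which is the main reason to prefer it over a plain grouping vector in repeated computations.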

## Details

The added top-level documentation infrastructure in *collapse* allows you to effectively navigate the package. Calling `?FUN` brings up the documentation page documenting the function, which contains links to associated topic pages and closely related functions. You can also call topical documentation pages directly from the console. The links to these pages are contained in the global macro `.COLLAPSE_TOPICS` (e.g. calling `help(.COLLAPSE_TOPICS[1])` brings up this page).
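A minimal sketch of this navigation from the console, assuming *collapse* is installed. The help call is wrapped in `interactive()` so the snippet also runs in scripts; `topics` is an illustrative variable name.

```r
library(collapse)

# Names of the topical documentation pages, stored in a global macro
topics <- .COLLAPSE_TOPICS
print(topics)

# Other macros are plain character vectors of function names
print(head(.FAST_STAT_FUN))

# Open the first topic page when working at the console
if (interactive()) help(.COLLAPSE_TOPICS[1])
```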

## Author

**Maintainer**: Sebastian Krantz sebastian.krantz@graduateinstitute.ch