# Fast Grouping and Ordering

`fast-grouping-ordering.Rd`

*collapse* provides the following functions to efficiently group and order data:

- `radixorder` provides fast radix-ordering through direct access to the method `order(..., method = "radix")`, as well as the possibility to return some attributes very useful for grouping data and finding unique elements. `radixorderv` exists as a programmer's alternative. The function `roworder(v)` efficiently reorders a data frame based on an ordering computed by `radixorderv`.

- `group` provides fast grouping in first-appearance order of rows, based on a hashing algorithm in C. Objects have class 'qG', see below.

- `GRP` creates *collapse* grouping objects of class 'GRP' based on `radixorderv` or `group`. 'GRP' objects form the central building block for grouped operations and programming in *collapse* and are very efficient inputs to all *collapse* functions supporting grouped operations.

- `fgroup_by` provides a fast replacement for `dplyr::group_by`, creating a grouped data frame (or data.table / tibble etc.) with a 'GRP' object attached. This grouped frame can be used for grouped operations using *collapse*'s fast functions.

- `fmatch` is a fast alternative to `match`, which also supports matching of data frame rows.

- `funique` is a faster version of `unique`. The data frame method also allows selecting unique rows according to a subset of the columns. `fnunique` efficiently calculates the number of unique values/rows. `fduplicated` is a fast alternative to `duplicated`. `any_duplicated` is a simpler and faster alternative to `anyDuplicated`.

- `fcount` computes group counts based on a subset of columns in the data, and is a fast replacement for `dplyr::count`. `fcountv` is a programmer's version of the function.

- `qF`, shorthand for 'quick-factor', implements very fast factor generation from atomic vectors using either radix ordering (`method = "radix"`) or hashing (`method = "hash"`). Factors can also be used for efficient grouped programming with *collapse* functions, especially if they are generated using `qF(x, na.exclude = FALSE)`, which assigns a level to missing values and attaches a class 'na.included', ensuring that no additional missing value checks are executed by *collapse* functions.

- `qG`, shorthand for 'quick-group', generates a kind of factor-light without the levels attribute but instead an attribute providing the number of levels. Optionally the levels / groups can be attached, but without converting them to character. Objects have a class 'qG', which is also recognized in the *collapse* ecosystem.

- `fdroplevels` is a substantially faster replacement for `droplevels`.

- `finteraction` is a fast alternative to `interaction`, implemented as a wrapper around `as_factor_GRP(GRP(...))`. It can be used to generate a factor from multiple vectors, factors or a list of vectors / factors. Unused factor levels are always dropped.

- `groupid` is a generalization of `data.table::rleid`, providing a run-length type group-id from atomic vectors. It is a generalization because it also supports passing an ordering vector and skipping missing values. For example, `qF` and `qG` with `method = "radix"` are essentially implemented using `groupid(x, radixorder(x))`.

- `seqid` is a specialized function which creates a group-id from sequences of integer values. For any regular panel dataset, `groupid(id, order(id, time))` and `seqid(time, order(id, time))` provide the same id variable. `seqid` is especially useful for identifying discontinuities in time-sequences.

- `timeid` is a specialized function to convert integer or double vectors representing time (such as 'Date', 'POSIXct' etc.) to a factor or 'qG' object based on the greatest common divisor of elements (thus preserving gaps in time intervals).
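The ordering and grouping functions described above can be tried on a small vector. The following is a minimal sketch, assuming a recent version of *collapse* is installed; the vectors `x`, `v`, `id` and `time` are made-up illustration data, and `fsum` is used only as one example of a function that accepts a 'GRP' object.

```r
library(collapse)

x <- c("b", "a", "c", "a", "b")   # hypothetical grouping vector
v <- c(10, 20, 30, 40, 50)        # hypothetical values to aggregate

# Ordering vector, comparable to order(x, method = "radix")
o <- radixorder(x)
x_sorted <- x[o]                  # "a" "a" "b" "b" "c"

# Hash-based grouping: integer ids in first-appearance order (class 'qG')
g_hash <- group(x)                # ids 1 2 3 2 1

# 'GRP' object (groups sorted by default), reused for a grouped sum
g <- GRP(x)
sums <- fsum(v, g)                # a = 60, b = 60, c = 30

# Quick factor that assigns a level to missing values
f <- qF(c(x, NA), na.exclude = FALSE)

# Run-length ids: a new id starts whenever the value changes
runs <- groupid(c(1, 1, 2, 2, 1))            # 1 1 2 2 3

# Sequence ids: a new id starts whenever the integer sequence is interrupted
seqs <- seqid(c(1L, 2L, 3L, 5L, 6L, 8L))     # 1 1 1 2 2 3

# Regular (balanced, gap-free) panel with shuffled rows:
# both calls should yield the same id variable
id <- c(2, 1, 1, 2, 1, 2)
time <- c(1L, 1L, 2L, 2L, 3L, 3L)
o_panel <- order(id, time)
same_id <- identical(as.integer(groupid(id, o_panel)),
                     as.integer(seqid(time, o_panel)))
```

Computing a 'GRP' object once and passing it to several grouped functions avoids regrouping the data on every call, which is the main reason to prefer it over passing the raw vector repeatedly.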

## Table of Functions

| Function / S3 Generic | Methods | Description |
|---|---|---|
| `radixorder(v)` | No methods, for data frames and vectors | Radix-based ordering + grouping information |
| `roworder(v)` | No methods, for data frames incl. pdata.frame | Row sorting/reordering |
| `group` | No methods, for data frames and vectors | Hash-based grouping + grouping information |
| `GRP` | `default, GRP, factor, qG, grouped_df, pseries, pdata.frame` | Fast grouping and a flexible grouping object |
| `fgroup_by` | No methods, for data frames | Fast grouped data frame |
| `fmatch` | No methods, for vectors and data frames | Fast matching |
| `funique`, `fnunique`, `fduplicated`, `any_duplicated` | `default, data.frame, sf, pseries, pdata.frame, list` | Fast (number of) unique values/rows |
| `fcount(v)` | Internal generic, supports vectors, matrices, data.frames, lists, grouped_df and pdata.frame | Fast group counts |
| `qF` | No methods, for vectors | Quick factor generation |
| `qG` | No methods, for vectors | Quick grouping of vectors and a 'factor-light' class |
| `fdroplevels` | `factor, data.frame, list` | Fast removal of unused factor levels |
| `finteraction` | No methods, for data frames and vectors | Fast interactions |
| `groupid` | No methods, for vectors | Run-length type group-id |
| `seqid` | No methods, for integer vectors | Run-length type integer sequence-id |
| `timeid` | No methods, for integer or double vectors | Integer-id from time/date sequences |
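As a complement to the table, the matching, uniqueness and counting functions can be sketched on toy data. This is an illustrative example only, assuming a recent version of *collapse*; the vector `x` and the data frame `df` are hypothetical, and the count column is assumed to carry the default name `N`.

```r
library(collapse)

x <- c("b", "a", "c", "a", "b")       # hypothetical input vector

u   <- funique(x)                     # unique values in first-appearance order
n   <- fnunique(x)                    # number of distinct values
dup <- fduplicated(x)                 # flags repeated occurrences
any_dup <- any_duplicated(x)          # TRUE if at least one duplicate exists

# Positions of the first match in x; NA where there is no match
m <- fmatch(c("a", "z"), x)

# Group counts by column 'g' of a hypothetical data frame
df  <- data.frame(g = x, v = 1:5)
cnt <- fcount(df, g)

# Dropping an unused factor level
f <- factor(c("a", "b"), levels = c("a", "b", "c"))
f_used <- fdroplevels(f)
```

Here `cnt` should contain one row per distinct value of `g`, with the counts summing to the number of rows in `df`.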