fsummarise is a much faster version of dplyr::summarise when used together with the Fast Statistical Functions.

fsummarise(.data, ..., keep.group_vars = TRUE)
smr(.data, ..., keep.group_vars = TRUE)        # Shorthand

Arguments

.data

a (grouped) data frame or named list of columns. Grouped data can be created with fgroup_by or dplyr::group_by.

...

name-value pairs of summary functions, or across statements. For fast performance use the Fast Statistical Functions.

keep.group_vars

logical. FALSE removes grouping variables after computation.
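
A minimal sketch of what keep.group_vars changes, assuming the collapse package is installed and using the built-in mtcars data. The column name mean_mpg is chosen for illustration only.

```r
library(collapse)

# Group once, then summarise with and without the grouping column
g <- fgroup_by(mtcars, cyl)

res_keep <- fsummarise(g, mean_mpg = fmean(mpg))
res_drop <- fsummarise(g, mean_mpg = fmean(mpg), keep.group_vars = FALSE)

names(res_keep)  # "cyl" "mean_mpg"
names(res_drop)  # "mean_mpg"
```

Dropping the grouping column can be convenient when only the aggregated values are needed, for example before binding them to another table that already holds the group identifiers.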

Value

If .data is grouped by fgroup_by or dplyr::group_by, the result is a data frame of the same class and attributes with rows reduced to the number of groups. If .data is not grouped, the result is a data frame of the same class and attributes with 1 row.
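
As a quick illustration of the two cases described here, the following sketch (assuming collapse is installed, with mtcars as example data) compares the number of rows returned for grouped and ungrouped input:

```r
library(collapse)

# Grouped input: one row per group (mtcars has 3 distinct values of cyl)
by_cyl <- fsummarise(fgroup_by(mtcars, cyl), mean_mpg = fmean(mpg))

# Ungrouped input: a single row
overall <- fsummarise(mtcars, mean_mpg = fmean(mpg))

nrow(by_cyl)   # 3
nrow(overall)  # 1
```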

Note

Since v1.7, fsummarise is fully featured, allowing expressions that use functions and columns of the data as well as external scalar values (just like dplyr::summarise). Note, however, that as soon as an expression contains a Fast Statistical Function, the whole expression is executed in a vectorized way instead of by split-apply-combine computing over groups. Please see the first example.
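
The sketch below is one way to check this behaviour by hand. It assumes collapse v1.7 or later and the mtcars data; the names mixed, manual, offset, and shifted are hypothetical and used only for illustration. The "manual" line is an approximation of what the vectorized execution computes, not the package's internal code.

```r
library(collapse)

g <- fgroup_by(mtcars, cyl)

# Mixed expression: fmean() is computed per group, but because the whole
# expression is vectorized, min() is applied to the full qsec column
mixed <- fsummarise(g, res = fmean(mpg) + min(qsec))

# Approximately the same computation written out by hand
manual <- fmean(mtcars$mpg, g = mtcars$cyl) + min(mtcars$qsec)
all.equal(mixed$res, unname(manual))

# External scalar values can be used inside the expression as well
offset <- 100
shifted <- fsummarise(g, res = fmean(mpg) + offset)
```

If every function in the expression should respect the grouping, either use only Fast Statistical Functions (e.g. fmin instead of min) or only base functions.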

Examples

 
library(magrittr) # Note: Used because |> is not available on older R versions
## Since v1.7, fsummarise supports arbitrary expressions, and expressions
## containing fast statistical functions receive vectorized execution:

# (a) This is an expression using base R functions which is executed by groups
mtcars %>% fgroup_by(cyl) %>% fsummarise(res = mean(mpg) + min(qsec))
#>   cyl      res
#> 1   4 43.36364
#> 2   6 35.24286
#> 3   8 29.60000

# (b) Here, the use of fmean causes the whole expression to be executed
# in a vectorized way, i.e. the expression is translated to something like
# fmean(mpg, g = cyl) + min(qsec) and executed. The result thus differs
# from (a), because the minimum is calculated over the entire sample
mtcars %>% fgroup_by(cyl) %>% fsummarise(mpg = fmean(mpg) + min(qsec))
#>   cyl      mpg
#> 1   4 41.16364
#> 2   6 34.24286
#> 3   8 29.60000

# (c) For fully vectorized execution, use fmin. This yields the same as (a)
mtcars %>% fgroup_by(cyl) %>% fsummarise(mpg = fmean(mpg) + fmin(qsec))
#>   cyl      mpg
#> 1   4 43.36364
#> 2   6 35.24286
#> 3   8 29.60000

# In across() statements it is fine to mix different functions; each will
# be executed on its own terms (i.e. vectorized for fmean and standard for sum)
mtcars %>% fgroup_by(cyl) %>% fsummarise(across(mpg:hp, list(fmean, sum)))
#>   cyl mpg_fmean mpg_sum cyl_fmean cyl_sum disp_fmean disp_sum  hp_fmean hp_sum
#> 1   4  26.66364   293.3         4      44   105.1364   1156.5  82.63636    909
#> 2   6  19.74286   138.2         6      42   183.3143   1283.2 122.28571    856
#> 3   8  15.10000   211.4         8     112   353.1000   4943.4 209.21429   2929

# Note that this still detects fmean as a fast function. The names of the list
# are irrelevant, but the function name must be typed or passed as a character vector;
# otherwise functions will be executed by groups, e.g. function(x) fmean(x) won't vectorize
mtcars %>% fgroup_by(cyl) %>% fsummarise(across(mpg:hp, list(mu = fmean, sum = sum)))
#>   cyl   mpg_mu mpg_sum cyl_mu cyl_sum  disp_mu disp_sum     hp_mu hp_sum
#> 1   4 26.66364   293.3      4      44 105.1364   1156.5  82.63636    909
#> 2   6 19.74286   138.2      6      42 183.3143   1283.2 122.28571    856
#> 3   8 15.10000   211.4      8     112 353.1000   4943.4 209.21429   2929

# We can force non-vectorized execution by setting .apply = TRUE
mtcars %>% fgroup_by(cyl) %>% fsummarise(across(mpg:hp, list(mu = fmean, sum = sum), .apply = TRUE))
#>   cyl   mpg_mu mpg_sum cyl_mu cyl_sum  disp_mu disp_sum     hp_mu hp_sum
#> 1   4 26.66364   293.3      4      44 105.1364   1156.5  82.63636    909
#> 2   6 19.74286   138.2      6      42 183.3143   1283.2 122.28571    856
#> 3   8 15.10000   211.4      8     112 353.1000   4943.4 209.21429   2929

# Another argument of across(): Order the result first by function, then by column
mtcars %>% fgroup_by(cyl) %>%
     fsummarise(across(mpg:hp, list(mu = fmean, sum = sum), .transpose = FALSE))
#>   cyl   mpg_mu cyl_mu  disp_mu     hp_mu mpg_sum cyl_sum disp_sum hp_sum
#> 1   4 26.66364      4 105.1364  82.63636   293.3      44   1156.5    909
#> 2   6 19.74286      6 183.3143 122.28571   138.2      42   1283.2    856
#> 3   8 15.10000      8 353.1000 209.21429   211.4     112   4943.4   2929

#----------------------------------------------------------------------------
# Examples that also work for pre 1.7 versions

# Simple use
fsummarise(mtcars, mean_mpg = fmean(mpg),
                   sd_mpg = fsd(mpg))
#>   mean_mpg   sd_mpg
#> 1 20.09062 6.026948

# Using base functions (not a big difference without groups)
fsummarise(mtcars, mean_mpg = mean(mpg),
                   sd_mpg = sd(mpg))
#>   mean_mpg   sd_mpg
#> 1 20.09062 6.026948

# Grouped use
mtcars %>% fgroup_by(cyl) %>%
  fsummarise(mean_mpg = fmean(mpg),
             sd_mpg = fsd(mpg))
#>   cyl mean_mpg   sd_mpg
#> 1   4 26.66364 4.509828
#> 2   6 19.74286 1.453567
#> 3   8 15.10000 2.560048

# This is still efficient but quite a bit slower on large data (many groups)
mtcars %>% fgroup_by(cyl) %>%
  fsummarise(mean_mpg = mean(mpg),
             sd_mpg = sd(mpg))
#>   cyl mean_mpg   sd_mpg
#> 1   4 26.66364 4.509828
#> 2   6 19.74286 1.453567
#> 3   8 15.10000 2.560048

# Weighted aggregation
mtcars %>% fgroup_by(cyl) %>%
  fsummarise(w_mean_mpg = fmean(mpg, wt),
             w_sd_mpg = fsd(mpg, wt))
#>   cyl w_mean_mpg w_sd_mpg
#> 1   4   25.93504 4.275234
#> 2   6   19.64578 1.397297
#> 3   8   14.80643 2.638850


## Can also group with dplyr::group_by, but at a conversion cost, see ?GRP
library(dplyr)
mtcars %>% group_by(cyl) %>%
  fsummarise(mean_mpg = fmean(mpg),
             sd_mpg = fsd(mpg))
#> # A tibble: 3 × 3
#>     cyl mean_mpg sd_mpg
#>   <dbl>    <dbl>  <dbl>
#> 1     4     26.7   4.51
#> 2     6     19.7   1.45
#> 3     8     15.1   2.56

# Again less efficient...
mtcars %>% group_by(cyl) %>%
  fsummarise(mean_mpg = mean(mpg),
             sd_mpg = sd(mpg))
#> # A tibble: 3 × 3
#>     cyl mean_mpg sd_mpg
#>   <dbl>    <dbl>  <dbl>
#> 1     4     26.7   4.51
#> 2     6     19.7   1.45
#> 3     8     15.1   2.56