fmedian is a generic function that computes the (column-wise) median value of all values in x, (optionally) grouped by g and/or weighted by w. The TRA argument can further be used to transform x using its (grouped, weighted) median value.

fmedian(x, ...)

# S3 method for default
fmedian(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
        use.g.names = TRUE, nthreads = 1L, ...)

# S3 method for matrix
fmedian(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
        use.g.names = TRUE, drop = TRUE, nthreads = 1L, ...)

# S3 method for data.frame
fmedian(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
        use.g.names = TRUE, drop = TRUE, nthreads = 1L, ...)

# S3 method for grouped_df
fmedian(x, w = NULL, TRA = NULL, na.rm = TRUE,
        use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE,
        nthreads = 1L, ...)

Arguments

x

a numeric vector, matrix, data frame or grouped data frame (class 'grouped_df').

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

w

a numeric vector of (non-negative) weights. It may contain missing values, but only where x is also missing.

TRA

an integer or quoted operator indicating the transformation to perform: 0 - "replace_NA" | 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. Skip missing values in x. Defaults to TRUE and is implemented at very little computational cost. If na.rm = FALSE, NA is returned as soon as a missing value is encountered.

use.g.names

logical. Generate group names and add them to the result as names (default method) or row names (matrix and data frame methods). No row names are generated for data.tables.

nthreads

integer. The number of threads to utilize. Parallelism is across groups for grouped computations and at the column-level otherwise. No parallelism is available for weighted computations.

drop

matrix and data.frame method: Logical. TRUE drops dimensions and returns an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

keep.w

grouped_df method: Logical. Retain summed weighting variable after computation (if contained in grouped_df).

...

arguments to be passed to or from other methods. If TRA is used, passing set = TRUE will transform data by reference and return the result invisibly.

Details

Median value estimation is done using std::nth_element in C++, which is an efficient partial sorting algorithm. A downside of this is that vectors need to be copied first and then partially sorted, thus fmedian currently requires additional memory equal to the size of the vector (x or a column of x).
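As a rough illustration of the partial-sorting idea, the following base R sketch uses sort(x, partial = ...), which only guarantees that the requested order statistics end up in their final positions. It is not fmedian's C++ code; median_partial_sketch is a hypothetical helper name, and like the approach described above it works on a copy of x.

```r
# Hypothetical helper: median via partial sorting in base R.
# sort() returns a (partially sorted) copy, so x itself is left untouched.
median_partial_sketch <- function(x) {
  x <- x[!is.na(x)]                     # mimic na.rm = TRUE
  n <- length(x)
  if (n == 0L) return(NA_real_)
  half <- (n + 1L) %/% 2L               # position of the (lower) middle element
  if (n %% 2L == 1L) {
    # odd length: a single middle element
    sort(x, partial = half)[half]
  } else {
    # even length: average the two middle elements
    idx <- c(half, half + 1L)
    mean(sort(x, partial = idx)[idx])
  }
}

median_partial_sketch(c(5, 1, 4, 2, 3))   # 3
median_partial_sketch(c(4, 1, 3, 2))      # 2.5
```

The point of partial sorting is that elements on either side of the middle position never need to be fully ordered, which is why it is cheaper than a complete sort.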

Grouped computations are currently performed by mapping the data to a sparse-array and then partially sorting each row (group) of that array. Because of compiler optimizations this requires less memory than a full deep copy done with no groups.

The weighted median is defined as the element k from a set of sorted elements such that the sum of weights of all elements larger than k and the sum of weights of all elements smaller than k are each <= sum(w)/2. If the half-sum of weights (sum(w)/2) is reached exactly for some element k, then (summing from the lower end) both k and k+1 qualify as the weighted median (as do any additional elements with zero weights following k). fmedian resolves these ties by taking the simple arithmetic mean of all elements qualifying as the weighted median.
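The definition above can be sketched in a few lines of base R. This is an illustration only, not fmedian's implementation: weighted_median_sketch is a hypothetical name, and the sketch assumes that "smaller" and "larger" can be judged by position in the sorted vector, and that the weights are small integers so the comparisons are exact.

```r
# Hypothetical helper: weighted median following the definition above.
weighted_median_sketch <- function(x, w) {
  keep <- !is.na(x)                     # skip weights where x is missing
  x <- x[keep]; w <- w[keep]
  o <- order(x)                         # sort x and carry the weights along
  x <- x[o]; w <- w[o]
  cw <- cumsum(w)
  total <- cw[length(cw)]
  half <- total / 2
  below <- cw - w                       # weight of all elements before each position
  above <- total - cw                   # weight of all elements after each position
  qualifies <- below <= half & above <= half
  mean(x[qualifies])                    # ties: average all qualifying elements
}

weighted_median_sketch(c(1, 2, 3, 4), c(1, 1, 1, 1))   # 2.5, the ordinary median
weighted_median_sketch(c(10, 20, 30), c(1, 1, 5))      # 30
weighted_median_sketch(c(1, 2, 3), c(2, 1, 1))         # 1.5, half-sum reached exactly at 1
weighted_median_sketch(c(1, 2, 3, 4), c(1, 1, 0, 2))   # 3, zero-weight element also qualifies
```

In the third call the cumulative weight equals sum(w)/2 exactly at the first element, so the first and second elements both qualify and are averaged. In the fourth call the zero-weight element 3 sits between two qualifying elements and is included in the mean as well.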

The weighted median is computed using radixorder to first obtain an ordering of all elements, so it is considerably more computationally expensive than the unweighted version. With groups, the entire vector is also ordered, and the weighted median is computed in a single ordered pass through the data (after group-summing the weights, skipping weights for which x is missing).
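The grouped case can be sketched along the same lines: order the entire vector once, sum the weights within each group, and then pick the qualifying elements per group from the ordered data. The base R code below is only a conceptual sketch under those assumptions (it uses order() and ave() rather than radixorder or any C code), and grouped_weighted_median_sketch is a hypothetical name.

```r
# Hypothetical helper: grouped weighted median from one global ordering.
grouped_weighted_median_sketch <- function(x, g, w) {
  keep <- !is.na(x)                     # skip weights for which x is missing
  x <- x[keep]; w <- w[keep]
  g <- as.character(g[keep])
  o <- order(x)                         # a single ordering of the whole vector
  x <- x[o]; w <- w[o]; g <- g[o]
  total <- ave(w, g, FUN = sum)         # group sum of weights, repeated per row
  cw <- ave(w, g, FUN = cumsum)         # running weight within each group
  below <- cw - w
  above <- total - cw
  qualifies <- below <= total / 2 & above <= total / 2
  tapply(x[qualifies], g[qualifies], mean)   # average ties within each group
}

x <- c(1, 2, 3, 10, 20, 30)
g <- c("a", "a", "a", "b", "b", "b")
w <- c(2, 1, 1, 1, 1, 5)
grouped_weighted_median_sketch(x, g, w)   # a = 1.5, b = 30
```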

If x is a matrix or data frame, these computations are performed independently for each column. When applied to data frames with groups or drop = FALSE, fmedian preserves all column attributes. The attributes of the data frame itself are also preserved.

Value

The (w weighted) median value of x, grouped by g, or (if TRA is used) x transformed by its (grouped, weighted) median.
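To make the two kinds of return value concrete, here is a base R sketch of what a grouped median and a median-based transformation amount to. It does not call fmedian; it assumes that TRA = "-" subtracts the group median from each observation (as in the examples below) and that TRA = "/" divides by it, as the operator name suggests. The variable names are illustrative only.

```r
x <- mtcars$mpg
g <- mtcars$cyl

# Aggregation: one median per group (compare fmedian(x, g))
group_median <- tapply(x, g, median)

# Transformation: the group median expanded back to the length of x ...
row_median <- ave(x, g, FUN = median)

# ... and combined with x (compare TRA = "-" and, by assumption, TRA = "/")
demedianed <- x - row_median
scaled     <- x / row_median

group_median
head(demedianed)
```

The aggregated result has one element per group, whereas the transformed results have the same length as x.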

Examples

## default vector method
mpg <- mtcars$mpg
fmedian(mpg)                         # Simple median value
#> [1] 19.2
fmedian(mpg, w = mtcars$hp)          # Weighted median: Weighted by hp
#> [1] 16.4
fmedian(mpg, TRA = "-")              # Simple transformation: Subtract median value
#>  [1]  1.8  1.8  3.6  2.2 -0.5 -1.1 -4.9  5.2  3.6  0.0 -1.4 -2.8 -1.9 -4.0 -8.8
#> [16] -8.8 -4.5 13.2 11.2 14.7  2.3 -3.7 -4.0 -5.9  0.0  8.1  6.8 11.2 -3.4  0.5
#> [31] -4.2  2.2
fmedian(mpg, mtcars$cyl)             # Grouped median value
#>    4    6    8 
#> 26.0 19.7 15.2 
fmedian(mpg, mtcars[c(2,8:9)])       # More groups..
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1 
#> 26.00 22.80 30.40 21.00 18.65 15.20 15.40 
g <- GRP(mtcars, ~ cyl + vs + am)    # Precomputing groups gives more speed !
fmedian(mpg, g)
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1 
#> 26.00 22.80 30.40 21.00 18.65 15.20 15.40 
fmedian(mpg, g, mtcars$hp)           # Grouped weighted median
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1 
#>  26.0  22.8  30.4  21.0  19.2  15.2  15.0 
fmedian(mpg, g, TRA = "-")           # Groupwise subtract median value
#>  [1]  0.00  0.00 -7.60  2.75  3.50 -0.55 -0.90  1.60  0.00  0.55 -0.85  1.20
#> [13]  2.10  0.00 -4.80 -4.80 -0.50  2.00  0.00  3.50 -1.30  0.30  0.00 -1.90
#> [25]  4.00 -3.10  0.00  0.00  0.40 -1.30 -0.40 -9.00
fmedian(mpg, g, mtcars$hp, "-")      # Groupwise subtract weighted median value
#>  [1]  0.0  0.0 -7.6  2.2  3.5 -1.1 -0.9  1.6  0.0  0.0 -1.4  1.2  2.1  0.0 -4.8
#> [16] -4.8 -0.5  2.0  0.0  3.5 -1.3  0.3  0.0 -1.9  4.0 -3.1  0.0  0.0  0.8 -1.3
#> [31]  0.0 -9.0

## data.frame method
fmedian(mtcars)
#>     mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear 
#>  19.200   6.000 196.300 123.000   3.695   3.325  17.710   0.000   0.000   4.000 
#>    carb 
#>   2.000 
head(fmedian(mtcars, TRA = "-"))
#>                    mpg cyl  disp  hp   drat     wt  qsec vs am gear carb
#> Mazda RX4          1.8   0 -36.3 -13  0.205 -0.705 -1.25  0  1    0    2
#> Mazda RX4 Wag      1.8   0 -36.3 -13  0.205 -0.450 -0.69  0  1    0    2
#> Datsun 710         3.6  -2 -88.3 -30  0.155 -1.005  0.90  1  1    0   -1
#> Hornet 4 Drive     2.2   0  61.7 -13 -0.615 -0.110  1.73  1  0   -1   -1
#> Hornet Sportabout -0.5   2 163.7  52 -0.545  0.115 -0.69  0  0   -1    0
#> Valiant           -1.1   0  28.7 -18 -0.935  0.135  2.51  1  0   -1   -1
fmedian(mtcars, g)
#>         mpg cyl  disp    hp  drat    wt  qsec vs am gear carb
#> 4.0.1 26.00   4 120.3  91.0 4.430 2.140 16.70  0  1  5.0  2.0
#> 4.1.0 22.80   4 140.8  95.0 3.700 3.150 20.01  1  0  4.0  2.0
#> 4.1.1 30.40   4  79.0  66.0 4.080 1.935 18.61  1  1  4.0  1.0
#> 6.0.1 21.00   6 160.0 110.0 3.900 2.770 16.46  0  1  4.0  4.0
#> 6.1.0 18.65   6 196.3 116.5 3.500 3.440 19.17  1  0  3.5  2.5
#> 8.0.0 15.20   8 355.0 180.0 3.075 3.810 17.35  0  0  3.0  3.0
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
fmedian(fgroup_by(mtcars, cyl, vs, am))   # Another way of doing it..
#>   cyl vs am   mpg  disp    hp  drat    wt  qsec gear carb
#> 1   4  0  1 26.00 120.3  91.0 4.430 2.140 16.70  5.0  2.0
#> 2   4  1  0 22.80 140.8  95.0 3.700 3.150 20.01  4.0  2.0
#> 3   4  1  1 30.40  79.0  66.0 4.080 1.935 18.61  4.0  1.0
#> 4   6  0  1 21.00 160.0 110.0 3.900 2.770 16.46  4.0  4.0
#> 5   6  1  0 18.65 196.3 116.5 3.500 3.440 19.17  3.5  2.5
#> 6   8  0  0 15.20 355.0 180.0 3.075 3.810 17.35  3.0  3.0
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
fmedian(mtcars, g, use.g.names = FALSE)   # No row-names generated
#>     mpg cyl  disp    hp  drat    wt  qsec vs am gear carb
#> 1 26.00   4 120.3  91.0 4.430 2.140 16.70  0  1  5.0  2.0
#> 2 22.80   4 140.8  95.0 3.700 3.150 20.01  1  0  4.0  2.0
#> 3 30.40   4  79.0  66.0 4.080 1.935 18.61  1  1  4.0  1.0
#> 4 21.00   6 160.0 110.0 3.900 2.770 16.46  0  1  4.0  4.0
#> 5 18.65   6 196.3 116.5 3.500 3.440 19.17  1  0  3.5  2.5
#> 6 15.20   8 355.0 180.0 3.075 3.810 17.35  0  0  3.0  3.0
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]

## matrix method
m <- qM(mtcars)
fmedian(m)
#>     mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear 
#>  19.200   6.000 196.300 123.000   3.695   3.325  17.710   0.000   0.000   4.000 
#>    carb 
#>   2.000 
head(fmedian(m, TRA = "-"))
#>                    mpg cyl  disp  hp   drat     wt  qsec vs am gear carb
#> Mazda RX4          1.8   0 -36.3 -13  0.205 -0.705 -1.25  0  1    0    2
#> Mazda RX4 Wag      1.8   0 -36.3 -13  0.205 -0.450 -0.69  0  1    0    2
#> Datsun 710         3.6  -2 -88.3 -30  0.155 -1.005  0.90  1  1    0   -1
#> Hornet 4 Drive     2.2   0  61.7 -13 -0.615 -0.110  1.73  1  0   -1   -1
#> Hornet Sportabout -0.5   2 163.7  52 -0.545  0.115 -0.69  0  0   -1    0
#> Valiant           -1.1   0  28.7 -18 -0.935  0.135  2.51  1  0   -1   -1
fmedian(m, g) # etc..
#>         mpg cyl  disp    hp  drat    wt  qsec vs am gear carb
#> 4.0.1 26.00   4 120.3  91.0 4.430 2.140 16.70  0  1  5.0  2.0
#> 4.1.0 22.80   4 140.8  95.0 3.700 3.150 20.01  1  0  4.0  2.0
#> 4.1.1 30.40   4  79.0  66.0 4.080 1.935 18.61  1  1  4.0  1.0
#> 6.0.1 21.00   6 160.0 110.0 3.900 2.770 16.46  0  1  4.0  4.0
#> 6.1.0 18.65   6 196.3 116.5 3.500 3.440 19.17  1  0  3.5  2.5
#> 8.0.0 15.20   8 355.0 180.0 3.075 3.810 17.35  0  0  3.0  3.0
#>  [ reached getOption("max.print") -- omitted 1 row ]
 
library(dplyr)
# grouped_df method
mtcars %>% group_by(cyl,vs,am) %>% fmedian()
#> # A tibble: 7 × 11
#>     cyl    vs    am   mpg  disp    hp  drat    wt  qsec  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     0     1  26    120.   91   4.43  2.14  16.7   5     2  
#> 2     4     1     0  22.8  141.   95   3.7   3.15  20.0   4     2  
#> 3     4     1     1  30.4   79    66   4.08  1.94  18.6   4     1  
#> 4     6     0     1  21    160   110   3.9   2.77  16.5   4     4  
#> 5     6     1     0  18.6  196.  116.  3.5   3.44  19.2   3.5   2.5
#> 6     8     0     0  15.2  355   180   3.08  3.81  17.4   3     3  
#> 7     8     0     1  15.4  326   300.  3.88  3.37  14.6   5     6  
mtcars %>% group_by(cyl,vs,am) %>% fmedian(hp)             # Weighted
#> # A tibble: 7 × 11
#>     cyl    vs    am sum.hp   mpg  disp  drat    wt  qsec  gear  carb
#>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     0     1     91  26   120.   4.43  2.14  16.7     5     2
#> 2     4     1     0    254  22.8 141.   3.7   3.15  20.0     4     2
#> 3     4     1     1    564  30.4  95.1  4.08  1.94  18.6     4     1
#> 4     6     0     1    395  21   160    3.9   2.77  16.5     4     4
#> 5     6     1     0    461  19.2 168.   3.92  3.44  18.9     4     4
#> 6     8     0     0   2330  15.2 360    3.08  3.84  17.4     3     3
#> 7     8     0     1    599  15   301    3.54  3.57  14.6     5     8
mtcars %>% fgroup_by(cyl,vs,am) %>% fmedian()              # Faster grouping!
#>   cyl vs am   mpg  disp    hp  drat    wt  qsec gear carb
#> 1   4  0  1 26.00 120.3  91.0 4.430 2.140 16.70  5.0  2.0
#> 2   4  1  0 22.80 140.8  95.0 3.700 3.150 20.01  4.0  2.0
#> 3   4  1  1 30.40  79.0  66.0 4.080 1.935 18.61  4.0  1.0
#> 4   6  0  1 21.00 160.0 110.0 3.900 2.770 16.46  4.0  4.0
#> 5   6  1  0 18.65 196.3 116.5 3.500 3.440 19.17  3.5  2.5
#> 6   8  0  0 15.20 355.0 180.0 3.075 3.810 17.35  3.0  3.0
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
mtcars %>% fgroup_by(cyl,vs,am) %>% fmedian(TRA = "-")     # De-median
#>                   cyl vs am   mpg disp    hp   drat     wt  qsec gear carb
#> Mazda RX4           6  0  1  0.00  0.0   0.0  0.000 -0.150  0.00  0.0  0.0
#> Mazda RX4 Wag       6  0  1  0.00  0.0   0.0  0.000  0.105  0.56  0.0  0.0
#> Datsun 710          4  1  1 -7.60 29.0  27.0 -0.230  0.385  0.00  0.0  0.0
#> Hornet 4 Drive      6  1  0  2.75 61.7  -6.5 -0.420 -0.225  0.27 -0.5 -1.5
#> Hornet Sportabout   8  0  0  3.50  5.0  -5.0  0.075 -0.370 -0.33  0.0 -1.0
#> Valiant             6  1  0 -0.55 28.7 -11.5 -0.740  0.020  1.05 -0.5 -1.5
#>  [ reached 'max' / getOption("max.print") -- omitted 26 rows ]
#> 
#> Grouped by:  cyl, vs, am  [7 | 5 (3.8) 1-12] 
mtcars %>% fgroup_by(cyl,vs,am) %>% fselect(mpg, hp) %>%    # Faster selecting
      fmedian(hp, "-")  # Weighted de-median mpg, using hp as weights
#>                      hp  mpg
#> Mazda RX4           110  0.0
#> Mazda RX4 Wag       110  0.0
#> Datsun 710           93 -7.6
#> Hornet 4 Drive      110  2.2
#> Hornet Sportabout   175  3.5
#> Valiant             105 -1.1
#> Duster 360          245 -0.9
#> Merc 240D            62  1.6
#> Merc 230             95  0.0
#> Merc 280            123  0.0
#> Merc 280C           123 -1.4
#> Merc 450SE          180  1.2
#> Merc 450SL          180  2.1
#> Merc 450SLC         180  0.0
#> Cadillac Fleetwood  205 -4.8
#> Lincoln Continental 215 -4.8
#> Chrysler Imperial   230 -0.5
#> Fiat 128             66  2.0
#> Honda Civic          52  0.0
#> Toyota Corolla       65  3.5
#> Toyota Corona        97 -1.3
#> Dodge Challenger    150  0.3
#> AMC Javelin         150  0.0
#> Camaro Z28          245 -1.9
#> Pontiac Firebird    175  4.0
#> Fiat X1-9            66 -3.1
#> Porsche 914-2        91  0.0
#> Lotus Europa        113  0.0
#> Ford Pantera L      264  0.8
#> Ferrari Dino        175 -1.3
#> Maserati Bora       335  0.0
#> Volvo 142E          109 -9.0
#> 
#> Grouped by:  cyl, vs, am  [7 | 5 (3.8) 1-12]