fmedian is a generic function that computes the (column-wise) median value of all values in x, (optionally) grouped by g and/or weighted by w. The TRA argument can further be used to transform x using its (grouped, weighted) median value.

fmedian(x, ...)

# S3 method for default
fmedian(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
        use.g.names = TRUE, ...)

# S3 method for matrix
fmedian(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
        use.g.names = TRUE, drop = TRUE, ...)

# S3 method for data.frame
fmedian(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
        use.g.names = TRUE, drop = TRUE, ...)

# S3 method for grouped_df
fmedian(x, w = NULL, TRA = NULL, na.rm = TRUE,
        use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE, ...)

Arguments

x

a numeric vector, matrix, data frame or grouped data frame (class 'grouped_df').

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

w

a numeric vector of (non-negative) weights. It may contain missing values, but only where the corresponding value of x is also missing.

TRA

an integer or quoted operator indicating the transformation to perform: 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. Skip missing values in x. Defaults to TRUE and implemented at very little computational cost. If na.rm = FALSE, NA is returned whenever a missing value is encountered.

use.g.names

logical. Generate group names and add them to the result as names (default method) or row names (matrix and data frame methods). No row names are generated for data.tables.

drop

matrix and data.frame method: Logical. TRUE drops dimensions and returns an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

keep.w

grouped_df method: Logical. Retain summed weighting variable after computation (if contained in grouped_df).

...

arguments to be passed to or from other methods.

Details

Median value estimation is done using std::nth_element in C++, which is an efficient partial sorting algorithm. A downside of this is that vectors need to be copied first and then partially sorted, thus fmedian currently requires additional memory equal to the size of the vector (x or a column of x).
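The partial-sorting idea can be illustrated in base R. This is only a sketch and not the C++ code used by fmedian: the helper name median_partial is hypothetical, and it relies on sort(x, partial = ...), which returns a copy in which only the requested positions are guaranteed to hold their fully sorted values.

```r
# Base-R sketch of a median computed via partial sorting.
# median_partial is a hypothetical helper, not part of collapse.
median_partial <- function(x, na.rm = TRUE) {
  if (anyNA(x)) {
    if (!na.rm) return(NA_real_)   # mirror na.rm = FALSE: NA when encountered
    x <- x[!is.na(x)]
  }
  n <- length(x)
  if (n == 0L) return(NA_real_)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # Odd length: only the middle position must be in its sorted place
    sort(x, partial = half)[half]
  } else {
    # Even length: the two middle positions are needed, then averaged
    mean(sort(x, partial = half + 0:1)[half + 0:1])
  }
}

median_partial(mtcars$mpg)
```

Because sort() returns a new vector, the sketch also shows why a copy of the data is needed before partial sorting.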

Grouped computations are currently performed by mapping the data to a sparse-array and then partially sorting each row (group) of that array. Because of compiler optimizations this requires less memory than a full deep copy done with no groups.

The weighted median is defined as the element k from a set of sorted elements such that the sum of the weights of all elements larger than k and the sum of the weights of all elements smaller than k are each <= sum(w)/2. If the half-sum of weights (sum(w)/2) is reached exactly for some element k, then (summing from the lower end) both k and k+1 qualify as the weighted median (and any additional elements with zero weights following k also qualify). fmedian resolves these ties by taking a simple arithmetic mean of all elements qualifying as the weighted median.
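The definition above can be written out directly in base R. The following is a minimal sketch under stated assumptions, not fmedian's implementation: wmedian_sketch is a hypothetical name, "smaller" and "larger" are taken by position in the sorted order (which simplifies the treatment of duplicated x values), and the comparisons with sum(w)/2 are exact, so non-integer weights may be affected by floating-point rounding.

```r
# Base-R sketch of the weighted median definition.
# wmedian_sketch is a hypothetical helper, not part of collapse.
wmedian_sketch <- function(x, w) {
  keep <- !is.na(x)                 # skip missing x (and their weights)
  x <- x[keep]; w <- w[keep]
  o <- order(x)                     # full ordering of the elements
  x <- x[o]; w <- w[o]
  half  <- sum(w) / 2
  below <- cumsum(w) - w            # total weight before each sorted element
  above <- sum(w) - cumsum(w)       # total weight after each sorted element
  qualifies <- below <= half & above <= half
  mean(x[qualifies])                # ties: average all qualifying elements
}

wmedian_sketch(mtcars$mpg, mtcars$hp)   # 16.4
wmedian_sketch(1:4, rep(1, 4))          # 2.5 (half-sum reached exactly at k = 2)
```

In the second call the half-sum of weights is reached exactly at the second element, so the second and third elements both qualify and their mean is returned.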

The weighted median is computed using radixorder to first obtain an ordering of all elements, so it is considerably more computationally expensive than the unweighted version. With groups, the entire vector is also ordered, and the weighted median is computed in a single ordered pass through the data (after group-summing the weights, skipping weights for which x is missing).
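A grouped version of the same idea can be sketched in base R with one ordering of the whole vector. This is an illustration only: it uses order() rather than radixorder, replaces the single ordered pass with vectorised running sums, and grouped_wmedian_sketch is a hypothetical name.

```r
# Base-R sketch of a grouped weighted median using one ordering of all data.
# grouped_wmedian_sketch is a hypothetical helper, not part of collapse.
grouped_wmedian_sketch <- function(x, g, w) {
  keep <- !is.na(x)                        # skip weights for which x is missing
  x <- x[keep]; w <- w[keep]
  g <- factor(g[keep])                     # also drops unused levels
  o <- order(g, x)                         # order by group, then by value
  x <- x[o]; g <- g[o]; w <- w[o]
  total <- ave(w, g, FUN = sum)            # group sums of the weights
  cumw  <- ave(w, g, FUN = cumsum)         # running weight within each group
  half  <- total / 2
  qualifies <- (cumw - w) <= half & (total - cumw) <= half
  out <- tapply(x[qualifies], g[qualifies], mean)
  structure(as.vector(out), names = names(out))
}

grp <- interaction(mtcars$cyl, mtcars$vs, mtcars$am, drop = TRUE)
grouped_wmedian_sketch(mtcars$mpg, grp, mtcars$hp)
```

The groups come out in the level order of interaction(), which differs from the order used by GRP, so results are best compared by name.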

If x is a matrix or data frame, these computations are performed independently for each column. When applied to data frames with groups or drop = FALSE, fmedian preserves all column attributes (such as variable labels) but does not distinguish between classed and unclassed objects. The attributes of the data frame itself are also preserved.

Value

The (w weighted) median value of x, grouped by g, or (if TRA is used) x transformed by its median, grouped by g.
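To make the transformed return value concrete, the effect of TRA = "-" can be approximated in base R. This is a sketch for intuition only: demedian_sketch is a hypothetical name, and it covers just the unweighted "-" operation.

```r
# Base-R sketch of what TRA = "-" returns: x minus its (grouped) median.
# demedian_sketch is a hypothetical helper, not part of collapse.
demedian_sketch <- function(x, g = NULL) {
  if (is.null(g)) return(x - median(x, na.rm = TRUE))
  # ave() returns the group median aligned with each element of x
  x - ave(x, g, FUN = function(v) median(v, na.rm = TRUE))
}

mpg <- mtcars$mpg
head(demedian_sketch(mpg))                 # compare: fmedian(mpg, TRA = "-")
grp <- interaction(mtcars$cyl, mtcars$vs, mtcars$am, drop = TRUE)
head(demedian_sketch(mpg, grp))            # compare: fmedian(mpg, g, TRA = "-")
```

The result has the same length as x, in contrast to the one-value-per-group result returned when TRA is NULL.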

Examples

## default vector method
mpg <- mtcars$mpg
fmedian(mpg) # Simple median value
#> [1] 19.2
fmedian(mpg, w = mtcars$hp) # Weighted median: Weighted by hp
#> [1] 16.4
fmedian(mpg, TRA = "-") # Simple transformation: Subtract median value
#>  [1]  1.8  1.8  3.6  2.2 -0.5 -1.1 -4.9  5.2  3.6  0.0 -1.4 -2.8 -1.9 -4.0 -8.8
#> [16] -8.8 -4.5 13.2 11.2 14.7  2.3 -3.7 -4.0 -5.9  0.0  8.1  6.8 11.2 -3.4  0.5
#> [31] -4.2  2.2
fmedian(mpg, mtcars$cyl) # Grouped median value
#>    4    6    8
#> 26.0 19.7 15.2
fmedian(mpg, mtcars[c(2,8:9)]) # More groups..
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1
#> 26.00 22.80 30.40 21.00 18.65 15.20 15.40
g <- GRP(mtcars, ~ cyl + vs + am) # Precomputing groups gives more speed!
fmedian(mpg, g)
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1
#> 26.00 22.80 30.40 21.00 18.65 15.20 15.40
fmedian(mpg, g, mtcars$hp) # Grouped weighted median
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1
#>  26.0  22.8  30.4  21.0  19.2  15.2  15.0
fmedian(mpg, g, TRA = "-") # Groupwise subtract median value
#>  [1]  0.00  0.00 -7.60  2.75  3.50 -0.55 -0.90  1.60  0.00  0.55 -0.85  1.20
#> [13]  2.10  0.00 -4.80 -4.80 -0.50  2.00  0.00  3.50 -1.30  0.30  0.00 -1.90
#> [25]  4.00 -3.10  0.00  0.00  0.40 -1.30 -0.40 -9.00
fmedian(mpg, g, mtcars$hp, "-") # Groupwise subtract weighted median value
#>  [1]  0.0  0.0 -7.6  2.2  3.5 -1.1 -0.9  1.6  0.0  0.0 -1.4  1.2  2.1  0.0 -4.8
#> [16] -4.8 -0.5  2.0  0.0  3.5 -1.3  0.3  0.0 -1.9  4.0 -3.1  0.0  0.0  0.8 -1.3
#> [31]  0.0 -9.0
## data.frame method
fmedian(mtcars)
#>     mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear
#>  19.200   6.000 196.300 123.000   3.695   3.325  17.710   0.000   0.000   4.000
#>    carb
#>   2.000
head(fmedian(mtcars, TRA = "-"))
#>                    mpg cyl  disp  hp   drat     wt  qsec vs am gear carb
#> Mazda RX4          1.8   0 -36.3 -13  0.205 -0.705 -1.25  0  1    0    2
#> Mazda RX4 Wag      1.8   0 -36.3 -13  0.205 -0.450 -0.69  0  1    0    2
#> Datsun 710         3.6  -2 -88.3 -30  0.155 -1.005  0.90  1  1    0   -1
#> Hornet 4 Drive     2.2   0  61.7 -13 -0.615 -0.110  1.73  1  0   -1   -1
#> Hornet Sportabout -0.5   2 163.7  52 -0.545  0.115 -0.69  0  0   -1    0
#> Valiant           -1.1   0  28.7 -18 -0.935  0.135  2.51  1  0   -1   -1
fmedian(mtcars, g)
#>         mpg cyl  disp    hp  drat    wt  qsec vs am gear carb
#> 4.0.1 26.00   4 120.3  91.0 4.430 2.140 16.70  0  1  5.0  2.0
#> 4.1.0 22.80   4 140.8  95.0 3.700 3.150 20.01  1  0  4.0  2.0
#> 4.1.1 30.40   4  79.0  66.0 4.080 1.935 18.61  1  1  4.0  1.0
#> 6.0.1 21.00   6 160.0 110.0 3.900 2.770 16.46  0  1  4.0  4.0
#> 6.1.0 18.65   6 196.3 116.5 3.500 3.440 19.17  1  0  3.5  2.5
#> 8.0.0 15.20   8 355.0 180.0 3.075 3.810 17.35  0  0  3.0  3.0
#> 8.0.1 15.40   8 326.0 299.5 3.880 3.370 14.55  0  1  5.0  6.0
fmedian(fgroup_by(mtcars, cyl, vs, am)) # Another way of doing it..
#>   cyl vs am   mpg  disp    hp  drat    wt  qsec gear carb
#> 1   4  0  1 26.00 120.3  91.0 4.430 2.140 16.70  5.0  2.0
#> 2   4  1  0 22.80 140.8  95.0 3.700 3.150 20.01  4.0  2.0
#> 3   4  1  1 30.40  79.0  66.0 4.080 1.935 18.61  4.0  1.0
#> 4   6  0  1 21.00 160.0 110.0 3.900 2.770 16.46  4.0  4.0
#> 5   6  1  0 18.65 196.3 116.5 3.500 3.440 19.17  3.5  2.5
#> 6   8  0  0 15.20 355.0 180.0 3.075 3.810 17.35  3.0  3.0
#> 7   8  0  1 15.40 326.0 299.5 3.880 3.370 14.55  5.0  6.0
fmedian(mtcars, g, use.g.names = FALSE) # No row-names generated
#>     mpg cyl  disp    hp  drat    wt  qsec vs am gear carb
#> 1 26.00   4 120.3  91.0 4.430 2.140 16.70  0  1  5.0  2.0
#> 2 22.80   4 140.8  95.0 3.700 3.150 20.01  1  0  4.0  2.0
#> 3 30.40   4  79.0  66.0 4.080 1.935 18.61  1  1  4.0  1.0
#> 4 21.00   6 160.0 110.0 3.900 2.770 16.46  0  1  4.0  4.0
#> 5 18.65   6 196.3 116.5 3.500 3.440 19.17  1  0  3.5  2.5
#> 6 15.20   8 355.0 180.0 3.075 3.810 17.35  0  0  3.0  3.0
#> 7 15.40   8 326.0 299.5 3.880 3.370 14.55  0  1  5.0  6.0
## matrix method
m <- qM(mtcars)
fmedian(m)
#>     mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear
#>  19.200   6.000 196.300 123.000   3.695   3.325  17.710   0.000   0.000   4.000
#>    carb
#>   2.000
head(fmedian(m, TRA = "-"))
#>                    mpg cyl  disp  hp   drat     wt  qsec vs am gear carb
#> Mazda RX4          1.8   0 -36.3 -13  0.205 -0.705 -1.25  0  1    0    2
#> Mazda RX4 Wag      1.8   0 -36.3 -13  0.205 -0.450 -0.69  0  1    0    2
#> Datsun 710         3.6  -2 -88.3 -30  0.155 -1.005  0.90  1  1    0   -1
#> Hornet 4 Drive     2.2   0  61.7 -13 -0.615 -0.110  1.73  1  0   -1   -1
#> Hornet Sportabout -0.5   2 163.7  52 -0.545  0.115 -0.69  0  0   -1    0
#> Valiant           -1.1   0  28.7 -18 -0.935  0.135  2.51  1  0   -1   -1
fmedian(m, g) # etc..
#>         mpg cyl  disp    hp  drat    wt  qsec vs am gear carb
#> 4.0.1 26.00   4 120.3  91.0 4.430 2.140 16.70  0  1  5.0  2.0
#> 4.1.0 22.80   4 140.8  95.0 3.700 3.150 20.01  1  0  4.0  2.0
#> 4.1.1 30.40   4  79.0  66.0 4.080 1.935 18.61  1  1  4.0  1.0
#> 6.0.1 21.00   6 160.0 110.0 3.900 2.770 16.46  0  1  4.0  4.0
#> 6.1.0 18.65   6 196.3 116.5 3.500 3.440 19.17  1  0  3.5  2.5
#> 8.0.0 15.20   8 355.0 180.0 3.075 3.810 17.35  0  0  3.0  3.0
#> 8.0.1 15.40   8 326.0 299.5 3.880 3.370 14.55  0  1  5.0  6.0
library(dplyr)
# grouped_df method
mtcars %>% group_by(cyl,vs,am) %>% fmedian
#> # A tibble: 7 x 11
#>     cyl    vs    am   mpg  disp    hp  drat    wt  qsec  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     0     1  26    120.   91   4.43  2.14  16.7   5     2
#> 2     4     1     0  22.8  141.   95   3.7   3.15  20.0   4     2
#> 3     4     1     1  30.4   79    66   4.08  1.94  18.6   4     1
#> 4     6     0     1  21    160   110   3.9   2.77  16.5   4     4
#> 5     6     1     0  18.6  196.  116.  3.5   3.44  19.2   3.5   2.5
#> 6     8     0     0  15.2  355   180   3.08  3.81  17.4   3     3
#> 7     8     0     1  15.4  326   300.  3.88  3.37  14.6   5     6
mtcars %>% group_by(cyl,vs,am) %>% fmedian(hp) # Weighted
#> # A tibble: 7 x 11
#>     cyl    vs    am sum.hp   mpg  disp  drat    wt  qsec  gear  carb
#>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     0     1     91  26   120.   4.43  2.14  16.7     5     2
#> 2     4     1     0    254  22.8 141.   3.7   3.15  20.0     4     2
#> 3     4     1     1    564  30.4  95.1  4.08  1.94  18.6     4     1
#> 4     6     0     1    395  21   160    3.9   2.77  16.5     4     4
#> 5     6     1     0    461  19.2 168.   3.92  3.44  18.9     4     4
#> 6     8     0     0   2330  15.2 360    3.08  3.84  17.4     3     3
#> 7     8     0     1    599  15   301    3.54  3.57  14.6     5     8
mtcars %>% fgroup_by(cyl,vs,am) %>% fmedian # Faster grouping!
#>   cyl vs am   mpg  disp    hp  drat    wt  qsec gear carb
#> 1   4  0  1 26.00 120.3  91.0 4.430 2.140 16.70  5.0  2.0
#> 2   4  1  0 22.80 140.8  95.0 3.700 3.150 20.01  4.0  2.0
#> 3   4  1  1 30.40  79.0  66.0 4.080 1.935 18.61  4.0  1.0
#> 4   6  0  1 21.00 160.0 110.0 3.900 2.770 16.46  4.0  4.0
#> 5   6  1  0 18.65 196.3 116.5 3.500 3.440 19.17  3.5  2.5
#> 6   8  0  0 15.20 355.0 180.0 3.075 3.810 17.35  3.0  3.0
#> 7   8  0  1 15.40 326.0 299.5 3.880 3.370 14.55  5.0  6.0
mtcars %>% fgroup_by(cyl,vs,am) %>% fmedian(TRA = "-") # De-median
#>                     cyl vs am   mpg  disp    hp   drat     wt  qsec gear carb
#> Mazda RX4             6  0  1  0.00   0.0   0.0  0.000 -0.150  0.00  0.0  0.0
#> Mazda RX4 Wag         6  0  1  0.00   0.0   0.0  0.000  0.105  0.56  0.0  0.0
#> Datsun 710            4  1  1 -7.60  29.0  27.0 -0.230  0.385  0.00  0.0  0.0
#> Hornet 4 Drive        6  1  0  2.75  61.7  -6.5 -0.420 -0.225  0.27 -0.5 -1.5
#> Hornet Sportabout     8  0  0  3.50   5.0  -5.0  0.075 -0.370 -0.33  0.0 -1.0
#> Valiant               6  1  0 -0.55  28.7 -11.5 -0.740  0.020  1.05 -0.5 -1.5
#> Duster 360            8  0  0 -0.90   5.0  65.0  0.135 -0.240 -1.51  0.0  1.0
#> Merc 240D             4  1  0  1.60   5.9 -33.0 -0.010  0.040 -0.01  0.0  0.0
#> Merc 230              4  1  0  0.00   0.0   0.0  0.220  0.000  2.89  0.0  0.0
#> Merc 280              6  1  0  0.55 -28.7   6.5  0.420  0.000 -0.87  0.5  1.5
#> Merc 280C             6  1  0 -0.85 -28.7   6.5  0.420  0.000 -0.27  0.5  1.5
#> Merc 450SE            8  0  0  1.20 -79.2   0.0 -0.005  0.260  0.05  0.0  0.0
#> Merc 450SL            8  0  0  2.10 -79.2   0.0 -0.005 -0.080  0.25  0.0  0.0
#> Merc 450SLC           8  0  0  0.00 -79.2   0.0 -0.005 -0.030  0.65  0.0  0.0
#> Cadillac Fleetwood    8  0  0 -4.80 117.0  25.0 -0.145  1.440  0.63  0.0  1.0
#> Lincoln Continental   8  0  0 -4.80 105.0  35.0 -0.075  1.614  0.47  0.0  1.0
#> Chrysler Imperial     8  0  0 -0.50  85.0  50.0  0.155  1.535  0.07  0.0  1.0
#> Fiat 128              4  1  1  2.00  -0.3   0.0  0.000  0.265  0.86  0.0  0.0
#> Honda Civic           4  1  1  0.00  -3.3 -14.0  0.850 -0.320 -0.09  0.0  1.0
#> Toyota Corolla        4  1  1  3.50  -7.9  -1.0  0.140 -0.100  1.29  0.0  0.0
#> Toyota Corona         4  1  0 -1.30 -20.7   2.0  0.000 -0.685  0.00 -1.0 -1.0
#> Dodge Challenger      8  0  0  0.30 -37.0 -30.0 -0.315 -0.290 -0.48  0.0 -1.0
#> AMC Javelin           8  0  0  0.00 -51.0 -30.0  0.075 -0.375 -0.05  0.0 -1.0
#> Camaro Z28            8  0  0 -1.90  -5.0  65.0  0.655  0.030 -1.94  0.0  1.0
#> Pontiac Firebird      8  0  0  4.00  45.0  -5.0  0.005  0.035 -0.30  0.0 -1.0
#> Fiat X1-9             4  1  1 -3.10   0.0   0.0  0.000  0.000  0.29  0.0  0.0
#> Porsche 914-2         4  0  1  0.00   0.0   0.0  0.000  0.000  0.00  0.0  0.0
#> Lotus Europa          4  1  1  0.00  16.1  47.0 -0.310 -0.422 -1.71  1.0  1.0
#> Ford Pantera L        8  0  1  0.40  25.0 -35.5  0.340 -0.200 -0.05  0.0 -2.0
#> Ferrari Dino          6  0  1 -1.30 -15.0  65.0 -0.280  0.000 -0.96  1.0  2.0
#> Maserati Bora         8  0  1 -0.40 -25.0  35.5 -0.340  0.200  0.05  0.0  2.0
#> Volvo 142E            4  1  1 -9.00  42.0  43.0  0.030  0.845 -0.01  0.0  1.0
#>
#> Grouped by:  cyl, vs, am  [7 | 5 (3.8)]
mtcars %>% fgroup_by(cyl,vs,am) %>% fselect(mpg, hp) %>% # Faster selecting
  fmedian(hp, "-") # Weighted de-median mpg, using hp as weights
#>                      hp  mpg
#> Mazda RX4           110  0.0
#> Mazda RX4 Wag       110  0.0
#> Datsun 710           93 -7.6
#> Hornet 4 Drive      110  2.2
#> Hornet Sportabout   175  3.5
#> Valiant             105 -1.1
#> Duster 360          245 -0.9
#> Merc 240D            62  1.6
#> Merc 230             95  0.0
#> Merc 280            123  0.0
#> Merc 280C           123 -1.4
#> Merc 450SE          180  1.2
#> Merc 450SL          180  2.1
#> Merc 450SLC         180  0.0
#> Cadillac Fleetwood  205 -4.8
#> Lincoln Continental 215 -4.8
#> Chrysler Imperial   230 -0.5
#> Fiat 128             66  2.0
#> Honda Civic          52  0.0
#> Toyota Corolla       65  3.5
#> Toyota Corona        97 -1.3
#> Dodge Challenger    150  0.3
#> AMC Javelin         150  0.0
#> Camaro Z28          245 -1.9
#> Pontiac Firebird    175  4.0
#> Fiat X1-9            66 -3.1
#> Porsche 914-2        91  0.0
#> Lotus Europa        113  0.0
#> Ford Pantera L      264  0.8
#> Ferrari Dino        175 -1.3
#> Maserati Bora       335  0.0
#> Volvo 142E          109 -9.0
#>
#> Grouped by:  cyl, vs, am  [7 | 5 (3.8)]