fnth (column-wise) returns the n'th smallest element from a set of unsorted elements x, where n is either an integer index or a probability between 0 and 1. If n is passed as a probability, ties can be resolved using the lower, upper, or (default) average of the possible elements. These are fast, discontinuous methods for estimating a sample quantile.

fnth(x, n = 0.5, ...)

# S3 method for default
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = TRUE, ties = "mean", ...)

# S3 method for matrix
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = TRUE, drop = TRUE, ties = "mean", ...)

# S3 method for data.frame
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = TRUE, drop = TRUE, ties = "mean", ...)

# S3 method for grouped_df
fnth(x, n = 0.5, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE,
     ties = "mean", ...)

Arguments

x

a numeric vector, matrix, data frame or grouped data frame (class 'grouped_df').

n

the element to return using a single integer index such that 1 < n < NROW(x), or a probability 0 < n < 1. See Details.

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

w

a numeric vector of (non-negative) weights, may contain missing values.

TRA

an integer or quoted operator indicating the transformation to perform: 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. Skip missing values in x. Defaults to TRUE and implemented at very little computational cost. If na.rm = FALSE, NA is returned when a missing value is encountered.

use.g.names

logical. Make group-names and add to the result as names (default method) or row-names (matrix and data frame methods). No row-names are generated for data.table's.

ties

an integer or character string specifying the method to resolve ties between adjacent qualifying elements:

Int.  String   Description
1     "mean"   take the arithmetic mean of all qualifying elements.
2     "min"    take the smallest of the elements.
3     "max"    take the largest of the elements.

drop

matrix and data.frame method: Logical. TRUE drops dimensions and returns an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

keep.w

grouped_df method: Logical. Retain sum of weighting variable after computation (if contained in grouped_df).

...

arguments to be passed to or from other methods.

Details

This is an R port of std::nth_element, an efficient partial sorting algorithm in C++. It is also used to calculate the median (in fact the default fnth(x, n = 0.5) is identical to fmedian(x), so see also the details for fmedian).

fnth generalizes the principles of median value calculation to find arbitrary elements. It offers considerable flexibility by providing both simple order statistics and simple discontinuous quantile estimation. Regarding the former, setting n to an index between 1 and NROW(x) will return the n'th smallest element of x, about 2x faster than sort(x, partial = n)[n]. As to the latter, setting n to a probability between 0 and 1 will return the corresponding element of x, and resolve ties between multiple qualifying elements (such as when n = 0.5 and the length of x is even) using the arithmetic average (ties = "mean"), the smallest (ties = "min") or the largest (ties = "max") of those elements.
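As a rough base R illustration of the two uses described above (not fnth's implementation), an order statistic can be obtained with a partial sort, and the tie at n = 0.5 arises on an even-length vector because two middle elements qualify. The vector x and all variable names below are made up for this sketch.

```r
# Hypothetical data for illustration; base R only, not the collapse implementation.
x <- c(7, 1, 9, 3, 5, 11)

# Order statistic: the 2nd smallest element via a partial sort.
second_smallest <- sort(x, partial = 2)[2]

# n = 0.5 on an even-length vector: the two middle elements both qualify.
mid <- length(x) / 2
qualifying <- sort(x, partial = c(mid, mid + 1))[c(mid, mid + 1)]

ties_mean <- mean(qualifying)  # analogue of ties = "mean"
ties_min  <- min(qualifying)   # analogue of ties = "min"
ties_max  <- max(qualifying)   # analogue of ties = "max"
```

Here the "mean" analogue coincides with base R's median(x), which is consistent with the statement that the default fnth(x, n = 0.5) is identical to fmedian(x).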

If n > 1 is used and x contains missing values (and na.rm = TRUE, otherwise NA is returned), n is internally converted to a probability using p = (n-1)/(NROW(x)-1), and that probability is applied to the set of complete elements (of each column if x is a matrix or data frame) to find the as.integer(p*(fNobs(x)-1))+1L'th element (which corresponds to option ties = "min"). Note that it is necessary to subtract and add 1 so that n = 1 corresponds to p = 0 and n = NROW(x) to p = 1.
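The conversion above can be traced by hand in base R. The helper nth_with_na below is hypothetical and only mirrors the two formulas quoted in this paragraph; it is a sketch, not the package's internal code.

```r
# nth_with_na is a hypothetical helper, not part of collapse.
nth_with_na <- function(x, n) {
  p <- (n - 1) / (NROW(x) - 1)                     # index -> probability on the full length
  complete <- x[!is.na(x)]                         # keep complete elements only
  k <- as.integer(p * (length(complete) - 1)) + 1L # position among complete elements
  sort(complete)[k]
}

x <- c(5, NA, 1, 3, NA, 9, 7)   # 7 elements, 5 of them complete

lowest  <- nth_with_na(x, 1)    # p = 0,   k = 1
middle  <- nth_with_na(x, 4)    # p = 0.5, k = 3
highest <- nth_with_na(x, 7)    # p = 1,   k = 5
```

With this data, n = 1 and n = NROW(x) map to the smallest and largest complete elements, which is the reason for subtracting and adding 1.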

When using grouped computations (supplying a vector or list to g subdividing x) and n > 1 is used, it is transformed to a probability p = (n-1)/(NROW(x)/ng-1) (where ng is the number of unique groups in g) and ties = "min" is used to sort out clashes. This could be useful for example to return the n'th smallest element of each group in a balanced panel, but with unequal group sizes it is more intuitive to pass a probability to n.
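A base R sketch of this grouped conversion follows. The helper nth_by_group is hypothetical, and it assumes that the lowest qualifying element within each group sits at position as.integer(p*(N-1))+1 of the sorted group, by analogy with the formula given for missing values; the package may differ in detail.

```r
# nth_by_group is a hypothetical helper for illustration only.
nth_by_group <- function(x, g, n) {
  ng <- length(unique(g))
  p <- (n - 1) / (NROW(x) / ng - 1)           # probability implied by the average group size
  vapply(split(x, g), function(v) {
    v <- sort(v)
    v[as.integer(p * (length(v) - 1)) + 1L]   # lowest qualifying element (assumed rule)
  }, numeric(1))
}

# Balanced panel: 3 groups of 5 observations, so n = 3 is the 3rd smallest of each group.
x <- c(5, 1, 4, 2, 3, 50, 10, 40, 20, 30, 500, 100, 400, 200, 300)
g <- rep(c("a", "b", "c"), each = 5)
balanced <- nth_by_group(x, g, 3)

# Unequal group sizes (3 and 7): n = 3 still implies p = 0.5 for both groups.
x2 <- c(3, 1, 2, 70, 10, 60, 20, 50, 30, 40)
g2 <- rep(c("s", "l"), c(3, 7))
unbalanced <- nth_by_group(x2, g2, 3)
```

In the unbalanced case the result is the 2nd smallest element of the small group and the 4th smallest of the large one, which illustrates why a probability is the more intuitive input there.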

If weights are used, the same principles apply as for weighted median calculation: A target partial sum of weights p*sum(w) is calculated, and the weighted n'th element is the element k such that all elements smaller than k have a sum of weights <= p*sum(w), and all elements larger than k have a sum of weights <= (1 - p)*sum(w). If the partial-sum of weights (p*sum(w)) is reached exactly for some element k, then (summing from the lower end) both k and k+1 would qualify as the weighted n'th element (and some possible additional elements with zero weights following k would also qualify). If n > 1, the lowest of those elements is chosen (congruent with the unweighted behavior), but if 0 < n < 1, the ties option regulates how to resolve such conflicts, yielding lower-weighted, upper-weighted or (default) average weighted n'th elements.
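The qualifying condition stated above can be written out directly in base R. The function weighted_nth is a hypothetical helper that assumes no missing values, positive weights and distinct values in x; it follows the definition in this paragraph rather than the package's C++ code.

```r
# weighted_nth is a hypothetical helper; assumes complete data,
# positive weights and distinct values in x.
weighted_nth <- function(x, w, p, ties = c("mean", "min", "max")) {
  ties <- match.arg(ties)
  o <- order(x)
  x <- x[o]
  w <- w[o]
  total  <- sum(w)
  target <- p * total          # target partial sum of weights
  cw     <- cumsum(w)
  below  <- cw - w             # sum of weights of all smaller elements
  above  <- total - cw         # sum of weights of all larger elements
  qualifying <- x[below <= target & above <= (1 - p) * total]
  switch(ties,
         mean = mean(qualifying),
         min  = min(qualifying),
         max  = max(qualifying))
}

# Target reached exactly at the 2nd element: elements 2 and 3 both qualify.
x <- c(1, 2, 3, 4)
w <- c(1, 1, 1, 1)
avg_elem   <- weighted_nth(x, w, 0.5)          # average of the two
lower_elem <- weighted_nth(x, w, 0.5, "min")   # lower element
upper_elem <- weighted_nth(x, w, 0.5, "max")   # upper element

# Unequal weights: only one element qualifies, so ties play no role.
single <- weighted_nth(c(10, 20, 30, 40), c(1, 2, 3, 4), 0.75)
```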

The weighted n'th element is computed using radixorder to first obtain an ordering of all elements, so it is considerably more computationally expensive than the unweighted version. With groups, the entire vector is also ordered, and the weighted n'th element is computed in a single ordered pass through the data (after calculating partial-group sums of the weights, skipping weights for which x is missing).
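The single ordered pass can be sketched in base R as follows. This is only an outline under stated assumptions: base order() stands in for radixorder, weighted_nth_grouped is a hypothetical helper, and the lowest qualifying element is returned for each group.

```r
# weighted_nth_grouped is a hypothetical helper; order() stands in for radixorder.
weighted_nth_grouped <- function(x, w, g, p) {
  keep <- !is.na(x)                          # skip weights for which x is missing
  x <- x[keep]
  w <- w[keep]
  g <- as.character(g[keep])
  o <- order(g, x)                           # one ordering of the entire vector
  x <- x[o]
  w <- w[o]
  g <- g[o]
  target <- p * vapply(split(w, g), sum, numeric(1))   # per-group target partial sums
  out <- setNames(rep(NA_real_, length(target)), names(target))
  current <- NA_character_
  running <- 0
  found <- FALSE
  for (i in seq_along(x)) {
    if (!identical(g[i], current)) {         # a new group starts: reset the running sum
      current <- g[i]
      running <- 0
      found <- FALSE
    }
    if (found) next
    running <- running + w[i]
    if (running >= target[[current]]) {      # first element reaching the target
      out[[current]] <- x[i]
      found <- TRUE
    }
  }
  out
}

x <- c(3, 1, 2, 10, 30, 20, NA)
w <- c(1, 1, 1, 1, 1, 2, 5)
g <- c("a", "a", "a", "b", "b", "b", "b")
res <- weighted_nth_grouped(x, w, g, 0.5)
```

The weight 5 attached to the missing value does not enter the sum for group "b", so its target is half of 4 rather than half of 9.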

If x is a matrix or data frame, these computations are performed independently for each column. Column-attributes and overall attributes of a data frame are preserved (if g is used or drop = FALSE).

Value

The (w weighted) n'th element of x, grouped by g, or (if TRA is used) x transformed by its n'th element, grouped by g.
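To make the transformed return value concrete, the following base R sketch mimics what TRA = "-" and TRA = "/" amount to for n = 0.5, using ave() and median() as stand-ins. The data and variable names are made up, and this is not how the package computes the result.

```r
# Hypothetical data; ave() expands the group statistic to the length of x.
x <- c(1, 2, 3, 10, 20, 30, 40)
g <- c("a", "a", "a", "b", "b", "b", "b")

stat <- ave(x, g, FUN = median)   # group medians: 2 for "a", 25 for "b"

subtracted <- x - stat            # analogue of TRA = "-"
divided    <- x / stat            # analogue of TRA = "/"
```

The result keeps the length and order of x, in contrast to the aggregated output that has one element per group.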

See also

Examples

## default vector method
mpg <- mtcars$mpg
fnth(mpg) # Simple nth element: Median (same as fmedian(mpg))
#> [1] 19.2
fnth(mpg, 5) # 5th smallest element
#> [1] 14.7
sort(mpg, partial = 5)[5] # Same using base R, fnth is 2x faster.
#> [1] 14.7
fnth(mpg, 0.75) # Third quartile
#> [1] 22.8
fnth(mpg, 0.75, w = mtcars$hp) # Weighted third quartile: Weighted by hp
#> [1] 21
fnth(mpg, 0.75, TRA = "-") # Simple transformation: Subtract third quartile
#>  [1]  -1.8  -1.8   0.0  -1.4  -4.1  -4.7  -8.5   1.6   0.0  -3.6  -5.0  -6.4
#> [13]  -5.5  -7.6 -12.4 -12.4  -8.1   9.6   7.6  11.1  -1.3  -7.3  -7.6  -9.5
#> [25]  -3.6   4.5   3.2   7.6  -7.0  -3.1  -7.8  -1.4
fnth(mpg, 0.75, mtcars$cyl) # Grouped third quartile
#>    4    6    8
#> 30.4 21.0 16.1
fnth(mpg, 0.75, mtcars[c(2,8:9)]) # More groups..
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1
#> 26.00 22.80 30.40 21.00 20.30 16.85 15.40
g <- GRP(mtcars, ~ cyl + vs + am) # Precomputing groups gives more speed!
fnth(mpg, 0.75, g)
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1
#> 26.00 22.80 30.40 21.00 20.30 16.85 15.40
fnth(mpg, 0.75, g, mtcars$hp) # Grouped weighted third quartile
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1
#>  26.0  22.8  30.4  21.0  19.2  16.4  15.8
fnth(mpg, 0.75, g, TRA = "-") # Groupwise subtract third quartile
#>  [1]  0.00  0.00 -7.60  1.10  1.85 -2.20 -2.55  1.60  0.00 -1.10 -2.50 -0.45
#> [13]  0.45 -1.65 -6.45 -6.45 -2.15  2.00  0.00  3.50 -1.30 -1.35 -1.65 -3.55
#> [25]  2.35 -3.10  0.00  0.00  0.40 -1.30 -0.40 -9.00
fnth(mpg, 0.75, g, mtcars$hp, "-") # Groupwise subtract weighted third quartile
#>  [1]  0.0  0.0 -7.6  2.2  2.3 -1.1 -2.1  1.6  0.0  0.0 -1.4  0.0  0.9 -1.2 -6.0
#> [16] -6.0 -1.7  2.0  0.0  3.5 -1.3 -0.9 -1.2 -3.1  2.8 -3.1  0.0  0.0  0.0 -1.3
#> [31] -0.8 -9.0
## data.frame method
fnth(mtcars, 0.75)
#>    mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb
#>  22.80   8.00 334.00 180.00   3.92   3.65  18.90   1.00   1.00   4.00   4.00
head(fnth(mtcars, 0.75, TRA = "-"))
#>                    mpg cyl disp  hp  drat     wt  qsec vs am gear carb
#> Mazda RX4         -1.8  -2 -174 -70 -0.02 -1.030 -2.44 -1  0    0    0
#> Mazda RX4 Wag     -1.8  -2 -174 -70 -0.02 -0.775 -1.88 -1  0    0    0
#> Datsun 710         0.0  -4 -226 -87 -0.07 -1.330 -0.29  0  0    0   -3
#> Hornet 4 Drive    -1.4  -2  -76 -70 -0.84 -0.435  0.54  0 -1   -1   -3
#> Hornet Sportabout -4.1   0   26  -5 -0.77 -0.210 -1.88 -1 -1   -1   -2
#> Valiant           -4.7  -2 -109 -75 -1.16 -0.190  1.32  0 -1   -1   -3
fnth(mtcars, 0.75, g)
#>         mpg cyl  disp    hp drat   wt  qsec vs am gear carb
#> 4.0.1 26.00   4 120.3  91.0 4.43 2.14 16.70  0  1    5    2
#> 4.1.0 22.80   4 140.8  95.0 3.70 3.15 20.01  1  0    4    2
#> 4.1.1 30.40   4  95.1  93.0 4.11 2.20 18.90  1  1    4    2
#> 6.0.1 21.00   6 160.0 110.0 3.90 2.77 16.46  0  1    4    4
#> 6.1.0 20.30   6 241.5 123.0 3.92 3.45 19.83  1  0    4    4
#> 8.0.0 16.85   8 420.0 222.5 3.18 4.66 17.71  0  0    3    4
#> 8.0.1 15.40   8 326.0 299.5 3.88 3.37 14.55  0  1    5    6
fnth(fgroup_by(mtcars, cyl, vs, am), 0.75) # Another way of doing it..
#>   cyl vs am   mpg  disp    hp drat   wt  qsec gear carb
#> 1   4  0  1 26.00 120.3  91.0 4.43 2.14 16.70    5    2
#> 2   4  1  0 22.80 140.8  95.0 3.70 3.15 20.01    4    2
#> 3   4  1  1 30.40  95.1  93.0 4.11 2.20 18.90    4    2
#> 4   6  0  1 21.00 160.0 110.0 3.90 2.77 16.46    4    4
#> 5   6  1  0 20.30 241.5 123.0 3.92 3.45 19.83    4    4
#> 6   8  0  0 16.85 420.0 222.5 3.18 4.66 17.71    3    4
#> 7   8  0  1 15.40 326.0 299.5 3.88 3.37 14.55    5    6
fnth(mtcars, 0.75, g, use.g.names = FALSE) # No row-names generated
#>     mpg cyl  disp    hp drat   wt  qsec vs am gear carb
#> 1 26.00   4 120.3  91.0 4.43 2.14 16.70  0  1    5    2
#> 2 22.80   4 140.8  95.0 3.70 3.15 20.01  1  0    4    2
#> 3 30.40   4  95.1  93.0 4.11 2.20 18.90  1  1    4    2
#> 4 21.00   6 160.0 110.0 3.90 2.77 16.46  0  1    4    4
#> 5 20.30   6 241.5 123.0 3.92 3.45 19.83  1  0    4    4
#> 6 16.85   8 420.0 222.5 3.18 4.66 17.71  0  0    3    4
#> 7 15.40   8 326.0 299.5 3.88 3.37 14.55  0  1    5    6
## matrix method
m <- qM(mtcars)
fnth(m, 0.75)
#>    mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb
#>  22.80   8.00 334.00 180.00   3.92   3.65  18.90   1.00   1.00   4.00   4.00
head(fnth(m, 0.75, TRA = "-"))
#>                    mpg cyl disp  hp  drat     wt  qsec vs am gear carb
#> Mazda RX4         -1.8  -2 -174 -70 -0.02 -1.030 -2.44 -1  0    0    0
#> Mazda RX4 Wag     -1.8  -2 -174 -70 -0.02 -0.775 -1.88 -1  0    0    0
#> Datsun 710         0.0  -4 -226 -87 -0.07 -1.330 -0.29  0  0    0   -3
#> Hornet 4 Drive    -1.4  -2  -76 -70 -0.84 -0.435  0.54  0 -1   -1   -3
#> Hornet Sportabout -4.1   0   26  -5 -0.77 -0.210 -1.88 -1 -1   -1   -2
#> Valiant           -4.7  -2 -109 -75 -1.16 -0.190  1.32  0 -1   -1   -3
fnth(m, 0.75, g) # etc..
#>         mpg cyl  disp    hp drat   wt  qsec vs am gear carb
#> 4.0.1 26.00   4 120.3  91.0 4.43 2.14 16.70  0  1    5    2
#> 4.1.0 22.80   4 140.8  95.0 3.70 3.15 20.01  1  0    4    2
#> 4.1.1 30.40   4  95.1  93.0 4.11 2.20 18.90  1  1    4    2
#> 6.0.1 21.00   6 160.0 110.0 3.90 2.77 16.46  0  1    4    4
#> 6.1.0 20.30   6 241.5 123.0 3.92 3.45 19.83  1  0    4    4
#> 8.0.0 16.85   8 420.0 222.5 3.18 4.66 17.71  0  0    3    4
#> 8.0.1 15.40   8 326.0 299.5 3.88 3.37 14.55  0  1    5    6
library(dplyr)
## grouped_df method
mtcars %>% group_by(cyl,vs,am) %>% fnth(0.75)
#> # A tibble: 7 x 11
#>     cyl    vs    am   mpg  disp    hp  drat    wt  qsec  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     0     1    26  120.    91  4.43  2.14  16.7     5     2
#> 2     4     1     0  22.8  141.    95   3.7  3.15  20.0     4     2
#> 3     4     1     1  30.4  95.1    93  4.11   2.2  18.9     4     2
#> 4     6     0     1    21   160   110   3.9  2.77  16.5     4     4
#> 5     6     1     0  20.3  242.   123  3.92  3.45  19.8     4     4
#> 6     8     0     0  16.8   420  222.  3.18  4.66  17.7     3     4
#> 7     8     0     1  15.4   326  300.  3.88  3.37  14.6     5     6
mtcars %>% group_by(cyl,vs,am) %>% fnth(0.75, hp) # Weighted
#> # A tibble: 7 x 11
#>     cyl    vs    am sum.hp   mpg  disp  drat    wt  qsec  gear  carb
#>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     0     1     91    26  120.  4.43  2.14  16.7     5     2
#> 2     4     1     0    254  22.8  141.  3.92  3.15  22.9     4     2
#> 3     4     1     1    564  30.4   108  4.11  2.32  18.9     4     2
#> 4     6     0     1    395    21   160   3.9  2.88  17.0     5     6
#> 5     6     1     0    461  19.2   225  3.92  3.44  19.4     4     4
#> 6     8     0     0   2330  16.4   440  3.21  5.25  17.8     3     4
#> 7     8     0     1    599  15.8   351  4.22  3.57  14.6     5     8
mtcars %>% fgroup_by(cyl,vs,am) %>% fnth(0.75) # Faster grouping!
#>   cyl vs am   mpg  disp    hp drat   wt  qsec gear carb
#> 1   4  0  1 26.00 120.3  91.0 4.43 2.14 16.70    5    2
#> 2   4  1  0 22.80 140.8  95.0 3.70 3.15 20.01    4    2
#> 3   4  1  1 30.40  95.1  93.0 4.11 2.20 18.90    4    2
#> 4   6  0  1 21.00 160.0 110.0 3.90 2.77 16.46    4    4
#> 5   6  1  0 20.30 241.5 123.0 3.92 3.45 19.83    4    4
#> 6   8  0  0 16.85 420.0 222.5 3.18 4.66 17.71    3    4
#> 7   8  0  1 15.40 326.0 299.5 3.88 3.37 14.55    5    6
mtcars %>% fgroup_by(cyl,vs,am) %>% fnth(0.75, TRA = "/") # Divide by third quartile
#>                     cyl vs am       mpg      disp        hp      drat        wt
#> Mazda RX4             6  0  1 1.0000000 1.0000000 1.0000000 1.0000000 0.9458484
#> Mazda RX4 Wag         6  0  1 1.0000000 1.0000000 1.0000000 1.0000000 1.0379061
#> Datsun 710            4  1  1 0.7500000 1.1356467 1.0000000 0.9367397 1.0545455
#> Hornet 4 Drive        6  1  0 1.0541872 1.0683230 0.8943089 0.7857143 0.9318841
#> Hornet Sportabout     8  0  0 1.1097923 0.8571429 0.7865169 0.9905660 0.7381974
#> Valiant               6  1  0 0.8916256 0.9316770 0.8536585 0.7040816 1.0028986
#> Duster 360            8  0  0 0.8486647 0.8571429 1.1011236 1.0094340 0.7660944
#> Merc 240D             4  1  0 1.0701754 1.0419034 0.6526316 0.9972973 1.0126984
#> Merc 230              4  1  0 1.0000000 1.0000000 1.0000000 1.0594595 1.0000000
#> Merc 280              6  1  0 0.9458128 0.6939959 1.0000000 1.0000000 0.9971014
#> Merc 280C             6  1  0 0.8768473 0.6939959 1.0000000 1.0000000 0.9971014
#> Merc 450SE            8  0  0 0.9732938 0.6566667 0.8089888 0.9654088 0.8733906
#> Merc 450SL            8  0  0 1.0267062 0.6566667 0.8089888 0.9654088 0.8004292
#> Merc 450SLC           8  0  0 0.9020772 0.6566667 0.8089888 0.9654088 0.8111588
#> Cadillac Fleetwood    8  0  0 0.6172107 1.1238095 0.9213483 0.9213836 1.1266094
#> Lincoln Continental   8  0  0 0.6172107 1.0952381 0.9662921 0.9433962 1.1639485
#> Chrysler Imperial     8  0  0 0.8724036 1.0476190 1.0337079 1.0157233 1.1469957
#> Fiat 128              4  1  1 1.0657895 0.8275499 0.7096774 0.9927007 1.0000000
#> Honda Civic           4  1  1 1.0000000 0.7960042 0.5591398 1.1995134 0.7340909
#> Toyota Corolla        4  1  1 1.1151316 0.7476341 0.6989247 1.0267640 0.8340909
#> Toyota Corona         4  1  0 0.9429825 0.8529830 1.0210526 1.0000000 0.7825397
#> Dodge Challenger      8  0  0 0.9198813 0.7571429 0.6741573 0.8679245 0.7553648
#> AMC Javelin           8  0  0 0.9020772 0.7238095 0.6741573 0.9905660 0.7371245
#> Camaro Z28            8  0  0 0.7893175 0.8333333 1.1011236 1.1729560 0.8240343
#> Pontiac Firebird      8  0  0 1.1394659 0.9523810 0.7865169 0.9685535 0.8251073
#> Fiat X1-9             4  1  1 0.8980263 0.8307045 0.7096774 0.9927007 0.8795455
#> Porsche 914-2         4  0  1 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
#> Lotus Europa          4  1  1 1.0000000 1.0000000 1.2150538 0.9172749 0.6877273
#> Ford Pantera L        8  0  1 1.0259740 1.0766871 0.8814691 1.0876289 0.9406528
#> Ferrari Dino          6  0  1 0.9380952 0.9062500 1.5909091 0.9282051 1.0000000
#> Maserati Bora         8  0  1 0.9740260 0.9233129 1.1185309 0.9123711 1.0593472
#> Volvo 142E            4  1  1 0.7039474 1.2723449 1.1720430 1.0000000 1.2636364
#>                          qsec gear      carb
#> Mazda RX4           1.0000000 1.00 1.0000000
#> Mazda RX4 Wag       1.0340219 1.00 1.0000000
#> Datsun 710          0.9846561 1.00 0.5000000
#> Hornet 4 Drive      0.9803328 0.75 0.2500000
#> Hornet Sportabout   0.9610390 1.00 0.5000000
#> Valiant             1.0196672 0.75 0.2500000
#> Duster 360          0.8944099 1.00 1.0000000
#> Merc 240D           0.9995002 1.00 1.0000000
#> Merc 230            1.1444278 1.00 1.0000000
#> Merc 280            0.9228442 1.00 1.0000000
#> Merc 280C           0.9531014 1.00 1.0000000
#> Merc 450SE          0.9824958 1.00 0.7500000
#> Merc 450SL          0.9937888 1.00 0.7500000
#> Merc 450SLC         1.0163749 1.00 0.7500000
#> Cadillac Fleetwood  1.0152456 1.00 1.0000000
#> Lincoln Continental 1.0062112 1.00 1.0000000
#> Chrysler Imperial   0.9836251 1.00 1.0000000
#> Fiat 128            1.0301587 1.00 0.5000000
#> Honda Civic         0.9798942 1.00 1.0000000
#> Toyota Corolla      1.0529101 1.00 0.5000000
#> Toyota Corona       1.0000000 0.75 0.5000000
#> Dodge Challenger    0.9525692 1.00 0.5000000
#> AMC Javelin         0.9768492 1.00 0.5000000
#> Camaro Z28          0.8701299 1.00 1.0000000
#> Pontiac Firebird    0.9627329 1.00 0.5000000
#> Fiat X1-9           1.0000000 1.00 0.5000000
#> Porsche 914-2       1.0000000 1.00 1.0000000
#> Lotus Europa        0.8941799 1.25 1.0000000
#> Ford Pantera L      0.9965636 1.00 0.6666667
#> Ferrari Dino        0.9416768 1.25 1.5000000
#> Maserati Bora       1.0034364 1.00 1.3333333
#> Volvo 142E          0.9841270 1.00 1.0000000
#>
#> Grouped by: cyl, vs, am [7 | 5 (3.8)]
mtcars %>% fgroup_by(cyl,vs,am) %>% fselect(mpg, hp) %>% # Faster selecting
  fnth(0.75, hp, "/") # Divide mpg by its third weighted group-quartile, using hp as weights
#>                      hp       mpg
#> Mazda RX4           110 1.0000000
#> Mazda RX4 Wag       110 1.0000000
#> Datsun 710           93 0.7500000
#> Hornet 4 Drive      110 1.1145833
#> Hornet Sportabout   175 1.1402439
#> Valiant             105 0.9427083
#> Duster 360          245 0.8719512
#> Merc 240D            62 1.0701754
#> Merc 230             95 1.0000000
#> Merc 280            123 1.0000000
#> Merc 280C           123 0.9270833
#> Merc 450SE          180 1.0000000
#> Merc 450SL          180 1.0548780
#> Merc 450SLC         180 0.9268293
#> Cadillac Fleetwood  205 0.6341463
#> Lincoln Continental 215 0.6341463
#> Chrysler Imperial   230 0.8963415
#> Fiat 128             66 1.0657895
#> Honda Civic          52 1.0000000
#> Toyota Corolla       65 1.1151316
#> Toyota Corona        97 0.9429825
#> Dodge Challenger    150 0.9451220
#> AMC Javelin         150 0.9268293
#> Camaro Z28          245 0.8109756
#> Pontiac Firebird    175 1.1707317
#> Fiat X1-9            66 0.8980263
#> Porsche 914-2        91 1.0000000
#> Lotus Europa        113 1.0000000
#> Ford Pantera L      264 1.0000000
#> Ferrari Dino        175 0.9380952
#> Maserati Bora       335 0.9493671
#> Volvo 142E          109 0.7039474
#>
#> Grouped by: cyl, vs, am [7 | 5 (3.8)]