fnth (column-wise) returns the n'th smallest element from a set of unsorted elements x corresponding to an integer index (n), or to a probability between 0 and 1. If n is passed as a probability, ties can be resolved using the lower, upper, or (default) average of the possible elements. These are discontinuous and fast methods to estimate a sample quantile.

fnth(x, n = 0.5, ...)

# S3 method for default
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = TRUE, ties = "mean", nthreads = 1L, ...)

# S3 method for matrix
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = TRUE, drop = TRUE, ties = "mean", nthreads = 1L, ...)

# S3 method for data.frame
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = TRUE, drop = TRUE, ties = "mean", nthreads = 1L, ...)

# S3 method for grouped_df
fnth(x, n = 0.5, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE,
     ties = "mean", nthreads = 1L, ...)

Arguments

x

a numeric vector, matrix, data frame or grouped data frame (class 'grouped_df').

n

the element to return using a single integer index such that 1 < n < NROW(x), or a probability 0 < n < 1. See Details.

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

w

a numeric vector of (non-negative) weights, may contain missing values.

TRA

an integer or quoted operator indicating the transformation to perform: 0 - "replace_NA" | 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. Skip missing values in x. Defaults to TRUE and implemented at very little computational cost. If na.rm = FALSE, NA is returned when a missing value is encountered.

use.g.names

logical. Make group-names and add to the result as names (default method) or row-names (matrix and data frame methods). No row-names are generated for data.tables.

nthreads

integer. The number of threads to utilize. Parallelism is across groups for grouped computations and at the column-level otherwise. No parallelism is available for weighted computations.

ties

an integer or character string specifying the method to resolve ties between adjacent qualifying elements:

Int.   String   Description
1      "mean"   take the arithmetic mean of all qualifying elements.
2      "min"    take the smallest of the elements.
3      "max"    take the largest of the elements.

drop

matrix and data.frame method: Logical. TRUE drops dimensions and returns an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

keep.w

grouped_df method: Logical. Retain sum of weighting variable after computation (if contained in grouped_df).

...

arguments to be passed to or from other methods. If TRA is used, passing set = TRUE will transform data by reference and return the result invisibly.

Details

This is an R port of std::nth_element, an efficient partial sorting algorithm in C++. It is also used to calculate the median (in fact the default fnth(x, n = 0.5) is identical to fmedian(x), so see also the details for fmedian).

fnth generalizes the principles of median value calculation to find arbitrary elements. It offers considerable flexibility by providing both simple order statistics and simple discontinuous quantile estimation. Regarding the former, setting n to an index between 1 and NROW(x) will return the n'th smallest element of x, about 2x faster than sort(x, partial = n)[n]. As to the latter, setting n to a probability between 0 and 1 will return the corresponding element of x, and resolve ties between multiple qualifying elements (such as when n = 0.5 and the length of x is even) using the arithmetic average (ties = "mean"), the smallest (ties = "min") or the largest (ties = "max") of those elements.
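As a rough base R illustration of the two modes described above (this is not the package's internal code, and the variable names are made up for the example), the following sketch picks an order statistic with a partial sort and then resolves the two qualifying middle elements of an even-length vector in the three ways the ties argument names:

```r
# Base R sketch only; fnth itself is not called here.
x   <- c(7, 1, 5, 3, 9, 2)        # even length, so the median has two candidates
len <- length(x)

# Order statistic: the 3rd smallest element via a partial sort
third <- sort(x, partial = 3)[3]

# Median (n = 0.5): the two middle elements both qualify
lo  <- len / 2
hi  <- lo + 1
mid <- sort(x, partial = c(lo, hi))[c(lo, hi)]

tie_mean <- mean(mid)   # analogue of ties = "mean"
tie_min  <- min(mid)    # analogue of ties = "min"
tie_max  <- max(mid)    # analogue of ties = "max"
```

The corresponding calls would be fnth(x, 3) and fnth(x, 0.5, ties = "mean" / "min" / "max").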

If n > 1 is used and x contains missing values (and na.rm = TRUE, otherwise NA is returned), n is internally converted to a probability using p = (n-1)/(NROW(x)-1), and that probability is applied to the set of complete elements (of each column if x is a matrix or data frame) to find the as.integer(p*(fnobs(x)-1))+1L'th element (which corresponds to option ties = "min"). Note that it is necessary to subtract and add 1 so that n = 1 corresponds to p = 0 and n = NROW(x) to p = 1.
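The index conversion above can be traced by hand in base R. This is a minimal sketch that simply transcribes the two formulas from the text; sum(!is.na(x)) stands in for fnobs(x), and the data are invented for the example:

```r
# Worked example of the n -> p -> index conversion described above (base R only).
x <- c(4, NA, 9, 1, NA, 6, 3, 8)    # 8 values, 6 of them non-missing
n <- 4L                             # requested index on the full vector

p    <- (n - 1) / (NROW(x) - 1)     # 3/7: position of n as a probability
nobs <- sum(!is.na(x))              # stand-in for fnobs(x)
k    <- as.integer(p * (nobs - 1)) + 1L   # index among the complete elements

complete <- sort(x)                 # sort() drops NA by default
value    <- complete[k]

# End points: n = 1 maps to p = 0 (first element), n = NROW(x) to p = 1 (last)
k_first <- as.integer(0 * (nobs - 1)) + 1L
k_last  <- as.integer(1 * (nobs - 1)) + 1L
```

Here p * (nobs - 1) is about 2.14, which truncates to 2, so the third of the six complete elements is selected.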

When using grouped computations (supplying a vector or list to g subdividing x) and n > 1 is used, it is transformed to a probability p = (n-1)/(NROW(x)/ng-1) (where ng contains the number of unique groups in g) and ties = "min" is used to sort out clashes. This could be useful for example to return the n'th smallest element of each group in a balanced panel, but with unequal group sizes it is more intuitive to pass a probability to n.
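For a balanced panel the grouped conversion can be sketched in base R as follows. The per-group index as.integer(p * (length(v) - 1)) + 1L is an assumption carried over from the ungrouped case with ties = "min"; the data and names are invented, and tapply is only a stand-in for the grouped computation:

```r
# Base R sketch: n'th smallest element per group in a balanced panel.
x <- c(5, 2, 9, 7, 1,   8, 3, 6, 4, 10,   12, 11, 15, 13, 14)
g <- rep(c("a", "b", "c"), each = 5)      # 3 groups of 5 observations
n <- 3L

ng <- length(unique(g))                   # number of groups
p  <- (n - 1) / (NROW(x) / ng - 1)        # (3 - 1) / (5 - 1) = 0.5

# Assumed per-group rule: take the lower qualifying element
res <- tapply(x, g, function(v) sort(v)[as.integer(p * (length(v) - 1)) + 1L])
```

With equal group sizes this picks the third smallest element in every group. With unequal sizes the same p lands on different ranks in different groups, which is why passing a probability is the clearer choice there.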

If weights are used, the same principles apply as for weighted median calculation: A target partial sum of weights p*sum(w) is calculated, and the weighted n'th element is the element k such that all elements smaller than k have a sum of weights <= p*sum(w), and all elements larger than k have a sum of weights <= (1 - p)*sum(w). If the partial-sum of weights (p*sum(w)) is reached exactly for some element k, then (summing from the lower end) both k and k+1 would qualify as the weighted n'th element (and some possible additional elements with zero weights following k would also qualify). If n > 1, the lowest of those elements is chosen (congruent with the unweighted behavior), but if 0 < n < 1, the ties option regulates how to resolve such conflicts, yielding lower-weighted, upper-weighted or (default) average weighted n'th elements.
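The weighted definition above can be transcribed almost literally into base R. The helper below is hypothetical (weighted_nth_sketch is not part of the package) and assumes distinct, non-missing values in x, non-negative weights, and a plain arithmetic mean for ties = "mean". It sorts the whole vector, so it only illustrates the definition, not the radix-ordering implementation:

```r
# Hypothetical helper: direct transcription of the weighted n'th element definition.
weighted_nth_sketch <- function(x, w, p, ties = c("mean", "min", "max")) {
  ties <- match.arg(ties)
  o <- order(x)
  x <- x[o]
  w <- w[o]
  sumw  <- sum(w)
  below <- cumsum(w) - w          # total weight of all smaller elements
  above <- sumw - cumsum(w)       # total weight of all larger elements
  # An element qualifies if both partial sums respect their targets
  ok <- below <= p * sumw & above <= (1 - p) * sumw
  q  <- x[ok]
  switch(ties, mean = mean(q), min = min(q), max = max(q))
}

x <- c(10, 20, 30, 40)
weighted_nth_sketch(x, w = c(1, 1, 1, 1), p = 0.5)           # target hit exactly: 20 and 30 qualify
weighted_nth_sketch(x, w = c(1, 1, 1, 1), p = 0.5, "min")
weighted_nth_sketch(x, w = c(1, 2, 3, 4), p = 0.5)           # a single element qualifies
```

With equal weights the target p*sum(w) = 2 is reached exactly at 20, so both 20 and 30 qualify and ties decides between them; with weights 1:4 only 30 satisfies both conditions.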

The weighted n'th element is computed using radixorder to first obtain an ordering of all elements, so it is considerably more computationally expensive than the unweighted version. With groups, the entire vector is also ordered, and the weighted n'th element is computed in a single ordered pass through the data (after calculating partial-group sums of the weights, skipping weights for which x is missing).

If x is a matrix or data frame, these computations are performed independently for each column. Column-attributes and overall attributes of a data frame are preserved (if g is used or drop = FALSE).
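Column-wise application can be pictured with a plain base R loop over columns. This is only an analogue of what fnth(mtcars, 5) computes, using the partial-sort idiom from the Details; it does none of the attribute handling described above:

```r
# Base R analogue: 5th smallest element of every column of mtcars.
fifth_by_col <- sapply(mtcars, function(col) sort(col, partial = 5)[5])

fifth_by_col[["mpg"]]   # same value as sort(mtcars$mpg, partial = 5)[5]
```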

Value

The (w weighted) n'th element of x, grouped by g, or (if TRA is used) x transformed by its (grouped, weighted) n'th element.
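To make the transformed return value concrete, here is a base R sketch of what a grouped call with TRA = "-" and the default n = 0.5 amounts to. It relies on the statement above that the default fnth(x) is identical to fmedian(x) and uses base median and ave as stand-ins; the data are invented:

```r
# Base R analogue of subtracting the group median from each observation.
x <- c(1, 3, 5,   2, 4, 6, 8)
g <- c("a", "a", "a", "b", "b", "b", "b")

group_stat <- ave(x, g, FUN = median)   # group median, repeated for each row
centered   <- x - group_stat            # what TRA = "-" would subtract and return
```

The result has the same length and order as x, which is the distinguishing feature of a TRA call compared with the aggregated output of one value per group.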

Examples

## default vector method
mpg <- mtcars$mpg
fnth(mpg)                         # Simple nth element: Median (same as fmedian(mpg))
#> [1] 19.2
fnth(mpg, 5)                      # 5th smallest element
#> [1] 14.7
sort(mpg, partial = 5)[5]         # Same using base R, fnth is 2x faster.
#> [1] 14.7
fnth(mpg, 0.75)                   # Third quartile
#> [1] 22.8
fnth(mpg, 0.75, w = mtcars$hp)    # Weighted third quartile: Weighted by hp
#> [1] 21
fnth(mpg, 0.75, TRA = "-")        # Simple transformation: Subtract third quartile
#>  [1]  -1.8  -1.8   0.0  -1.4  -4.1  -4.7  -8.5   1.6   0.0  -3.6  -5.0  -6.4
#> [13]  -5.5  -7.6 -12.4 -12.4  -8.1   9.6   7.6  11.1  -1.3  -7.3  -7.6  -9.5
#> [25]  -3.6   4.5   3.2   7.6  -7.0  -3.1  -7.8  -1.4
fnth(mpg, 0.75, mtcars$cyl)             # Grouped third quartile
#>    4    6    8 
#> 30.4 21.0 16.1 
fnth(mpg, 0.75, mtcars[c(2,8:9)])       # More groups..
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1 
#> 26.00 22.80 30.40 21.00 20.30 16.85 15.40 
g <- GRP(mtcars, ~ cyl + vs + am)       # Precomputing groups gives more speed !
fnth(mpg, 0.75, g)
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1 
#> 26.00 22.80 30.40 21.00 20.30 16.85 15.40 
fnth(mpg, 0.75, g, mtcars$hp)           # Grouped weighted third quartile
#> 4.0.1 4.1.0 4.1.1 6.0.1 6.1.0 8.0.0 8.0.1 
#>  26.0  22.8  30.4  21.0  19.2  16.4  15.8 
fnth(mpg, 0.75, g, TRA = "-")           # Groupwise subtract third quartile
#>  [1]  0.00  0.00 -7.60  1.10  1.85 -2.20 -2.55  1.60  0.00 -1.10 -2.50 -0.45
#> [13]  0.45 -1.65 -6.45 -6.45 -2.15  2.00  0.00  3.50 -1.30 -1.35 -1.65 -3.55
#> [25]  2.35 -3.10  0.00  0.00  0.40 -1.30 -0.40 -9.00
fnth(mpg, 0.75, g, mtcars$hp, "-")      # Groupwise subtract weighted third quartile
#>  [1]  0.0  0.0 -7.6  2.2  2.3 -1.1 -2.1  1.6  0.0  0.0 -1.4  0.0  0.9 -1.2 -6.0
#> [16] -6.0 -1.7  2.0  0.0  3.5 -1.3 -0.9 -1.2 -3.1  2.8 -3.1  0.0  0.0  0.0 -1.3
#> [31] -0.8 -9.0

## data.frame method
fnth(mtcars, 0.75)
#>    mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb 
#>  22.80   8.00 334.00 180.00   3.92   3.65  18.90   1.00   1.00   4.00   4.00 
head(fnth(mtcars, 0.75, TRA = "-"))
#>                    mpg cyl disp  hp  drat     wt  qsec vs am gear carb
#> Mazda RX4         -1.8  -2 -174 -70 -0.02 -1.030 -2.44 -1  0    0    0
#> Mazda RX4 Wag     -1.8  -2 -174 -70 -0.02 -0.775 -1.88 -1  0    0    0
#> Datsun 710         0.0  -4 -226 -87 -0.07 -1.330 -0.29  0  0    0   -3
#> Hornet 4 Drive    -1.4  -2  -76 -70 -0.84 -0.435  0.54  0 -1   -1   -3
#> Hornet Sportabout -4.1   0   26  -5 -0.77 -0.210 -1.88 -1 -1   -1   -2
#> Valiant           -4.7  -2 -109 -75 -1.16 -0.190  1.32  0 -1   -1   -3
fnth(mtcars, 0.75, g)
#>         mpg cyl  disp    hp drat   wt  qsec vs am gear carb
#> 4.0.1 26.00   4 120.3  91.0 4.43 2.14 16.70  0  1    5    2
#> 4.1.0 22.80   4 140.8  95.0 3.70 3.15 20.01  1  0    4    2
#> 4.1.1 30.40   4  95.1  93.0 4.11 2.20 18.90  1  1    4    2
#> 6.0.1 21.00   6 160.0 110.0 3.90 2.77 16.46  0  1    4    4
#> 6.1.0 20.30   6 241.5 123.0 3.92 3.45 19.83  1  0    4    4
#> 8.0.0 16.85   8 420.0 222.5 3.18 4.66 17.71  0  0    3    4
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
fnth(fgroup_by(mtcars, cyl, vs, am), 0.75)   # Another way of doing it..
#>   cyl vs am   mpg  disp    hp drat   wt  qsec gear carb
#> 1   4  0  1 26.00 120.3  91.0 4.43 2.14 16.70    5    2
#> 2   4  1  0 22.80 140.8  95.0 3.70 3.15 20.01    4    2
#> 3   4  1  1 30.40  95.1  93.0 4.11 2.20 18.90    4    2
#> 4   6  0  1 21.00 160.0 110.0 3.90 2.77 16.46    4    4
#> 5   6  1  0 20.30 241.5 123.0 3.92 3.45 19.83    4    4
#> 6   8  0  0 16.85 420.0 222.5 3.18 4.66 17.71    3    4
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
fnth(mtcars, 0.75, g, use.g.names = FALSE)   # No row-names generated
#>     mpg cyl  disp    hp drat   wt  qsec vs am gear carb
#> 1 26.00   4 120.3  91.0 4.43 2.14 16.70  0  1    5    2
#> 2 22.80   4 140.8  95.0 3.70 3.15 20.01  1  0    4    2
#> 3 30.40   4  95.1  93.0 4.11 2.20 18.90  1  1    4    2
#> 4 21.00   6 160.0 110.0 3.90 2.77 16.46  0  1    4    4
#> 5 20.30   6 241.5 123.0 3.92 3.45 19.83  1  0    4    4
#> 6 16.85   8 420.0 222.5 3.18 4.66 17.71  0  0    3    4
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]

## matrix method
m <- qM(mtcars)
fnth(m, 0.75)
#>    mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb 
#>  22.80   8.00 334.00 180.00   3.92   3.65  18.90   1.00   1.00   4.00   4.00 
head(fnth(m, 0.75, TRA = "-"))
#>                    mpg cyl disp  hp  drat     wt  qsec vs am gear carb
#> Mazda RX4         -1.8  -2 -174 -70 -0.02 -1.030 -2.44 -1  0    0    0
#> Mazda RX4 Wag     -1.8  -2 -174 -70 -0.02 -0.775 -1.88 -1  0    0    0
#> Datsun 710         0.0  -4 -226 -87 -0.07 -1.330 -0.29  0  0    0   -3
#> Hornet 4 Drive    -1.4  -2  -76 -70 -0.84 -0.435  0.54  0 -1   -1   -3
#> Hornet Sportabout -4.1   0   26  -5 -0.77 -0.210 -1.88 -1 -1   -1   -2
#> Valiant           -4.7  -2 -109 -75 -1.16 -0.190  1.32  0 -1   -1   -3
fnth(m, 0.75, g) # etc..
#>         mpg cyl  disp    hp drat   wt  qsec vs am gear carb
#> 4.0.1 26.00   4 120.3  91.0 4.43 2.14 16.70  0  1    5    2
#> 4.1.0 22.80   4 140.8  95.0 3.70 3.15 20.01  1  0    4    2
#> 4.1.1 30.40   4  95.1  93.0 4.11 2.20 18.90  1  1    4    2
#> 6.0.1 21.00   6 160.0 110.0 3.90 2.77 16.46  0  1    4    4
#> 6.1.0 20.30   6 241.5 123.0 3.92 3.45 19.83  1  0    4    4
#> 8.0.0 16.85   8 420.0 222.5 3.18 4.66 17.71  0  0    3    4
#>  [ reached getOption("max.print") -- omitted 1 row ]
 
library(dplyr)
## grouped_df method
mtcars %>% group_by(cyl,vs,am) %>% fnth(0.75)
#> # A tibble: 7 × 11
#>     cyl    vs    am   mpg  disp    hp  drat    wt  qsec  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     0     1  26   120.    91   4.43  2.14  16.7     5     2
#> 2     4     1     0  22.8 141.    95   3.7   3.15  20.0     4     2
#> 3     4     1     1  30.4  95.1   93   4.11  2.2   18.9     4     2
#> 4     6     0     1  21   160    110   3.9   2.77  16.5     4     4
#> 5     6     1     0  20.3 242.   123   3.92  3.45  19.8     4     4
#> 6     8     0     0  16.8 420    222.  3.18  4.66  17.7     3     4
#> 7     8     0     1  15.4 326    300.  3.88  3.37  14.6     5     6
mtcars %>% group_by(cyl,vs,am) %>% fnth(0.75, hp)           # Weighted
#> # A tibble: 7 × 11
#>     cyl    vs    am sum.hp   mpg  disp  drat    wt  qsec  gear  carb
#>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     0     1     91  26    120.  4.43  2.14  16.7     5     2
#> 2     4     1     0    254  22.8  141.  3.92  3.15  22.9     4     2
#> 3     4     1     1    564  30.4  108   4.11  2.32  18.9     4     2
#> 4     6     0     1    395  21    160   3.9   2.88  17.0     5     6
#> 5     6     1     0    461  19.2  225   3.92  3.44  19.4     4     4
#> 6     8     0     0   2330  16.4  440   3.21  5.25  17.8     3     4
#> 7     8     0     1    599  15.8  351   4.22  3.57  14.6     5     8
mtcars %>% fgroup_by(cyl,vs,am) %>% fnth(0.75)              # Faster grouping!
#>   cyl vs am   mpg  disp    hp drat   wt  qsec gear carb
#> 1   4  0  1 26.00 120.3  91.0 4.43 2.14 16.70    5    2
#> 2   4  1  0 22.80 140.8  95.0 3.70 3.15 20.01    4    2
#> 3   4  1  1 30.40  95.1  93.0 4.11 2.20 18.90    4    2
#> 4   6  0  1 21.00 160.0 110.0 3.90 2.77 16.46    4    4
#> 5   6  1  0 20.30 241.5 123.0 3.92 3.45 19.83    4    4
#> 6   8  0  0 16.85 420.0 222.5 3.18 4.66 17.71    3    4
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
mtcars %>% fgroup_by(cyl,vs,am) %>% fnth(0.75, TRA = "/")   # Divide by third quartile
#>                   cyl vs am       mpg      disp        hp      drat        wt
#> Mazda RX4           6  0  1 1.0000000 1.0000000 1.0000000 1.0000000 0.9458484
#> Mazda RX4 Wag       6  0  1 1.0000000 1.0000000 1.0000000 1.0000000 1.0379061
#> Datsun 710          4  1  1 0.7500000 1.1356467 1.0000000 0.9367397 1.0545455
#> Hornet 4 Drive      6  1  0 1.0541872 1.0683230 0.8943089 0.7857143 0.9318841
#> Hornet Sportabout   8  0  0 1.1097923 0.8571429 0.7865169 0.9905660 0.7381974
#> Valiant             6  1  0 0.8916256 0.9316770 0.8536585 0.7040816 1.0028986
#>                        qsec gear carb
#> Mazda RX4         1.0000000 1.00 1.00
#> Mazda RX4 Wag     1.0340219 1.00 1.00
#> Datsun 710        0.9846561 1.00 0.50
#> Hornet 4 Drive    0.9803328 0.75 0.25
#> Hornet Sportabout 0.9610390 1.00 0.50
#> Valiant           1.0196672 0.75 0.25
#>  [ reached 'max' / getOption("max.print") -- omitted 26 rows ]
#> 
#> Grouped by:  cyl, vs, am  [7 | 5 (3.8) 1-12] 
mtcars %>% fgroup_by(cyl,vs,am) %>% fselect(mpg, hp) %>%     # Faster selecting
      fnth(0.75, hp, "/")  # Divide mpg by its third weighted group-quartile, using hp as weights
#>                      hp       mpg
#> Mazda RX4           110 1.0000000
#> Mazda RX4 Wag       110 1.0000000
#> Datsun 710           93 0.7500000
#> Hornet 4 Drive      110 1.1145833
#> Hornet Sportabout   175 1.1402439
#> Valiant             105 0.9427083
#> Duster 360          245 0.8719512
#> Merc 240D            62 1.0701754
#> Merc 230             95 1.0000000
#> Merc 280            123 1.0000000
#> Merc 280C           123 0.9270833
#> Merc 450SE          180 1.0000000
#> Merc 450SL          180 1.0548780
#> Merc 450SLC         180 0.9268293
#> Cadillac Fleetwood  205 0.6341463
#> Lincoln Continental 215 0.6341463
#> Chrysler Imperial   230 0.8963415
#> Fiat 128             66 1.0657895
#> Honda Civic          52 1.0000000
#> Toyota Corolla       65 1.1151316
#> Toyota Corona        97 0.9429825
#> Dodge Challenger    150 0.9451220
#> AMC Javelin         150 0.9268293
#> Camaro Z28          245 0.8109756
#> Pontiac Firebird    175 1.1707317
#> Fiat X1-9            66 0.8980263
#> Porsche 914-2        91 1.0000000
#> Lotus Europa        113 1.0000000
#> Ford Pantera L      264 1.0000000
#> Ferrari Dino        175 0.9380952
#> Maserati Bora       335 0.9493671
#> Volvo 142E          109 0.7039474
#> 
#> Grouped by:  cyl, vs, am  [7 | 5 (3.8) 1-12]