fmode is a generic function and returns the (column-wise) statistical mode, i.e. the most frequent value of x, (optionally) grouped by g and/or weighted by w. The TRA argument can further be used to transform x using its (grouped, weighted) mode. Ties between multiple possible modes can be resolved by taking the minimum, the maximum or (default) the first occurring mode.

fmode(x, ...)

# S3 method for default
fmode(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = TRUE, ties = "first", ...)

# S3 method for matrix
fmode(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = TRUE, drop = TRUE, ties = "first", ...)

# S3 method for data.frame
fmode(x, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = TRUE, drop = TRUE, ties = "first", ...)

# S3 method for grouped_df
fmode(x, w = NULL, TRA = NULL, na.rm = TRUE,
      use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE,
      ties = "first", ...)

Arguments

x

a vector, matrix, data frame or grouped data frame (class 'grouped_df').

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

w

a numeric vector of (non-negative) weights, may contain missing values.

TRA

an integer or quoted operator indicating the transformation to perform: 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. Skip missing values in x. Defaults to TRUE and implemented at very little computational cost. If na.rm = FALSE, NA is treated as any other value.

use.g.names

logical. Make group names and add them to the result as names (default method) or row names (matrix and data frame methods). No row names are generated for data.tables.

ties

an integer or character string specifying the method to resolve ties between multiple possible modes i.e. multiple values with the maximum frequency or sum of weights:

Int.  String   Description
1     "first"  take the first occurring mode.
2     "min"    take the smallest of the possible modes.
3     "max"    take the largest of the possible modes.

drop

matrix and data.frame method: Logical. TRUE drops dimensions and returns an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

keep.w

grouped_df method: Logical. Retain sum of weighting variable after computation (if contained in grouped_df).

...

arguments to be passed to or from other methods.

Details

fmode implements a fast algorithm to find the statistical mode. It uses index hashing, as implemented in the Rcpp::sugar::IndexHash class.
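The counting step can be sketched in base R. This is an illustration, not collapse's C++ code. base::match(), which uses hashing internally, maps each element to the index of its distinct value, and tabulate() counts those indices. The helper name mode_hash() is hypothetical. Ties here are broken by first appearance in the data, which is not necessarily the rule fmode uses (see ties below).

```r
# Hypothetical helper: frequency counting via value -> index mapping
mode_hash <- function(x) {
  ux <- unique(x)                              # distinct values, in order of first appearance
  id <- match(x, ux)                           # index of each element's distinct value
  counts <- tabulate(id, nbins = length(ux))   # frequency of each distinct value
  ux[which.max(counts)]                        # which.max() returns the first maximum
}

mode_hash(c("b", "a", "b", "c", "a", "b"))     # "b" occurs three times
mode_hash(c(5, 7, 7, 9))
```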

If na.rm = FALSE, NA is not removed but treated as any other value (i.e. its frequency is counted). If all values are NA, NA is always returned.
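The na.rm semantics above can be sketched in base R using a hypothetical helper, mode_na(), which is not part of collapse. With na.rm = TRUE, missing values are dropped before counting. With na.rm = FALSE, they are counted like any other value, because match() matches NA to NA. An all-NA input yields NA either way.

```r
# Hypothetical helper illustrating na.rm handling (not collapse's code)
mode_na <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]                  # drop missing values before counting
  if (length(x) == 0L) return(x[NA_integer_])   # nothing left: NA of the same type as x
  ux <- unique(x)                               # NA is kept as a distinct value if present
  counts <- tabulate(match(x, ux), nbins = length(ux))
  ux[which.max(counts)]
}

x <- c(1, 2, 2, NA, NA, NA)
mode_na(x)                       # 2: the three NAs are ignored
mode_na(x, na.rm = FALSE)        # NA: it occurs three times
mode_na(c(NA_real_, NA_real_))   # NA: all values are missing
```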

The weighted mode is computed by summing up the weights for all distinct values and choosing the value with the largest sum. If na.rm = TRUE, missing values will be removed from both x and w i.e. utilizing only x[complete.cases(x,w)] and w[complete.cases(x,w)].
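A minimal base R sketch of this weighted mode follows. It sums the weights for each distinct value and takes the value with the largest sum. With na.rm = TRUE, it keeps only the pairs selected by complete.cases(x, w). weighted_mode() is a hypothetical name, and ties are broken by first appearance rather than by fmode's ties option.

```r
# Hypothetical helper: weighted mode as the value with the largest sum of weights
weighted_mode <- function(x, w, na.rm = TRUE) {
  if (na.rm) {
    cc <- complete.cases(x, w)                 # drop pairs where x or w is missing
    x <- x[cc]
    w <- w[cc]
  }
  if (length(x) == 0L) return(x[NA_integer_])
  ux <- unique(x)
  id <- factor(match(x, ux), levels = seq_along(ux))  # one level per distinct value
  sums <- as.vector(tapply(w, id, sum))               # total weight per distinct value
  ux[which.max(sums)]
}

x <- c("a", "b", "b", "c", NA, "c")
w <- c(5, 1, 1, 3, 10, NA)
weighted_mode(x, w)   # "a": total weight 5, versus 2 for "b" and 3 for "c"
```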

Multiple values may share the maximum frequency or sum of weights, so there can be several possible modes. A typical case is when all values are distinct, so that each occurs exactly once. In such cases, the default option ties = "first" returns the first value in the data to reach the maximum frequency count or sum of weights.

For example, in the sample x = c(1, 3, 2, 2, 4, 4, 1, 7), the values 1, 2 and 4 each occur twice. Since fmode goes through the data from left to right, the first mode is 2. Its count reaches 2 at the fourth position, before the counts of 4 (sixth position) and 1 (seventh position) do. It is also possible to take the minimum or maximum mode, i.e. fmode(x, ties = "min") returns 1, and fmode(x, ties = "max") returns 4.

The options ties = "min" and ties = "max" work well for numeric, integer and factor data as well as date/time variables. They give unintuitive results for character data (no strict alphabetic sorting, similar to using < and > to compare character values in R). These options are also best avoided if missing values are counted (na.rm = FALSE), since no proper logical comparison with missing values is possible:

- With numeric data, any comparison with NA_real_ in C++ evaluates to FALSE. NA_real_ is therefore chosen as the min or max mode only if it is also the first mode, and never otherwise.
- For integer data, NA_integer_ is stored as the smallest integer in C++. When it is among the tied modes, it will therefore always be chosen as the min mode and never as the max mode.
- For character data, NA_character_ is stored as the string "NA" in C++, so the behavior depends on the other character content.
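The tie-breaking rules can be sketched in base R by scanning the data once and keeping running counts. This reproduces the three results quoted above for x = c(1, 3, 2, 2, 4, 4, 1, 7). It is an illustrative reimplementation consistent with the described behavior, not collapse's C++ source. mode_ties() is a hypothetical name, and missing values are dropped as with na.rm = TRUE.

```r
# Hypothetical helper: left-to-right scan with running counts and tie rules
mode_ties <- function(x, ties = c("first", "min", "max")) {
  ties <- match.arg(ties)
  x <- x[!is.na(x)]                     # as with na.rm = TRUE
  if (length(x) == 0L) return(x[NA_integer_])
  id <- match(x, unique(x))             # index of each element's distinct value
  count <- integer(max(id))             # running frequency per distinct value
  best <- x[1L]
  max_count <- 0L
  for (i in seq_along(x)) {
    k <- id[i]
    count[k] <- count[k] + 1L
    if (count[k] > max_count) {         # new strict maximum: always take this value
      max_count <- count[k]
      best <- x[i]
    } else if (count[k] == max_count) { # tie with the current maximum
      if (ties == "min" && x[i] < best) best <- x[i]
      if (ties == "max" && x[i] > best) best <- x[i]
      # ties = "first": keep the value that reached the maximum first
    }
  }
  best
}

x <- c(1, 3, 2, 2, 4, 4, 1, 7)
mode_ties(x)          # 2
mode_ties(x, "min")   # 1
mode_ties(x, "max")   # 4
```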

This all seamlessly generalizes to grouped computations, which are performed by mapping the data to a sparse array and then going group by group.
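A grouped mode can be sketched in base R by splitting x by g and computing the mode of each group. This sketch swaps split() and vapply() in for collapse's sparse-array mapping. grouped_mode() is a hypothetical name, it assumes an atomic non-factor x, and it breaks ties by first appearance within each group.

```r
# Hypothetical helper: per-group mode with missing values removed
grouped_mode <- function(x, g) {
  mode1 <- function(v) {
    v <- v[!is.na(v)]                            # na.rm = TRUE behaviour
    if (length(v) == 0L) return(v[NA_integer_]) # all-missing group: NA
    uv <- unique(v)
    uv[which.max(tabulate(match(v, uv), nbins = length(uv)))]
  }
  # split() groups x by g (sorted group levels); vapply() returns one value
  # per group, named by group, with the same type as x
  vapply(split(x, g), mode1, FUN.VALUE = x[NA_integer_])
}

x <- c(1, 1, 2, 5, 5, 5, 6, NA)
g <- c("a", "a", "a", "b", "b", "b", "b", "c")
grouped_mode(x, g)   # a = 1, b = 5, c = NA
```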

fmode preserves all the attributes of the objects it is applied to, apart from names or row names, which are adjusted as necessary in grouped operations. If a data frame is passed to fmode and drop = TRUE (the default), unlist will be called on the result. This coerces columns of different types to a common type and might therefore not be sensible, depending on the data at hand.
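The drop = TRUE caveat can be illustrated in base R with a toy list standing in for one row of per-column modes (values borrowed from the wlddev example). unlist() coerces mixed types to a common type. Factor and Date columns contribute their underlying numbers rather than their labels. Whether fmode does exactly this internally is an assumption based on the paragraph above.

```r
# One "row" of per-column modes of different types (toy data)
modes <- list(
  country = "Afghanistan",
  region  = factor("Europe & Central Asia",
                   levels = c("East Asia & Pacific", "Europe & Central Asia")),
  date    = as.Date("1961-01-01"),
  year    = 1960,
  OECD    = FALSE
)
flat <- unlist(modes)   # roughly what drop = TRUE does for a data frame
flat
# country "Afghanistan", region "2" (factor code), date "-3287"
# (days since 1970-01-01), year "1960", OECD "FALSE"
```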

Value

The (w weighted) statistical mode of x, grouped by g, or (if TRA is used) x transformed by its mode, grouped by g. See also Details.
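With TRA, the computed (grouped) mode is applied back to x. This base R sketch assumes the semantics documented under TRA: "replace_fill" overwrites all values including missing ones, "replace" keeps missing values in x, and "-" subtracts the statistic. It reproduces these three operations for an unweighted grouped mode. mode1() is a hypothetical helper, and ave() stands in for collapse's internal code.

```r
# Hypothetical helper: mode of one group, missing values removed
mode1 <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0L) return(v[NA_integer_])
  uv <- unique(v)
  uv[which.max(tabulate(match(v, uv), nbins = length(uv)))]
}

x <- c(1, 1, 3, NA, 5, 5, 7)
g <- c("a", "a", "a", "a", "b", "b", "b")

m <- ave(x, g, FUN = mode1)            # group mode repeated for each element
filled <- m                            # like TRA = "replace_fill": the NA is overwritten
replaced <- replace(m, is.na(x), NA)   # like TRA = "replace": NA in x preserved
centered <- x - m                      # like TRA = "-": subtract the group mode
```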


Examples

library(collapse)
x <- c(1, 3, 2, 2, 4, 4, 1, 7, NA, NA, NA)
fmode(x) # Default is ties = "first"
#> [1] 2
fmode(x, ties = "min")
#> [1] 1
fmode(x, ties = "max")
#> [1] 4
fmode(x, na.rm = FALSE) # Here NA is the mode, regardless of ties option
#> [1] NA
fmode(x[-length(x)], na.rm = FALSE) # Not anymore: NA now ties with 1, 2 and 4, and 2 reaches the maximum first
#> [1] 2
## World Development Data
attach(wlddev)
## default vector method
fmode(PCGDP) # Numeric mode
#> [1] 339.6333
#> attr(,"label")
#> [1] "GDP per capita (constant 2010 US$)"
head(fmode(PCGDP, iso3c)) # Grouped numeric mode
#>         ABW         AFG         AGO         ALB         AND         ARE
#>  15669.6160    339.6333   2969.9604   2037.1227  42137.5190 102479.1890
head(fmode(PCGDP, iso3c, LIFEEX)) # Grouped and weighted numeric mode
#>        ABW        AFG        AGO        ALB        AND        ARE
#> 24288.9871   583.0551  3533.8652  4683.5192         NA 41449.6814
fmode(region) # Factor mode
#> [1] Europe & Central Asia
#> attr(,"label")
#> [1] Region
#> 7 Levels: East Asia & Pacific ... Sub-Saharan Africa
fmode(date) # Date mode (defaults to first value since panel is balanced)
#> [1] "1961-01-01"
fmode(country) # Character mode (also defaults to first value)
#> [1] "Afghanistan"
#> attr(,"label")
#> [1] "Country Name"
fmode(OECD) # Logical mode
#> [1] FALSE
#> attr(,"label")
#> [1] "Is OECD Member Country?"
# ...all the above can also be performed grouped and weighted
## matrix method
m <- qM(airquality)
fmode(m)
#>   Ozone Solar.R    Wind    Temp   Month     Day
#>    23.0   259.0    11.5    81.0     5.0     1.0
fmode(m, na.rm = FALSE) # NA frequency is also counted
#>   Ozone Solar.R    Wind    Temp   Month     Day
#>      NA      NA    11.5    81.0     5.0     1.0
fmode(m, airquality$Month) # Groupwise
#>   Ozone Solar.R Wind Temp Month Day
#> 5    11     190  9.7   66     5   1
#> 6    29     250 11.5   76     6   1
#> 7    97     175  7.4   81     7   1
#> 8    44     255 11.5   86     8   1
#> 9    13     238 10.3   71     9   1
fmode(m, w = airquality$Day) # Weighted: later days in the month are given more weight
#>   Ozone Solar.R    Wind    Temp   Month     Day
#>    23.0   223.0    11.5    76.0     5.0    30.0
fmode(m > 50, airquality$Month) # Groupwise logical mode
#>   Ozone Solar.R  Wind Temp Month   Day
#> 5 FALSE    TRUE FALSE TRUE FALSE FALSE
#> 6 FALSE    TRUE FALSE TRUE FALSE FALSE
#> 7  TRUE    TRUE FALSE TRUE FALSE FALSE
#> 8  TRUE    TRUE FALSE TRUE FALSE FALSE
#> 9 FALSE    TRUE FALSE TRUE FALSE FALSE
# etc.
## data.frame method
fmode(wlddev) # Calling unlist -> coerce to character vector
#>            country              iso3c               date               year
#>      "Afghanistan"                "2"            "-3287"             "1960"
#>             decade             region             income               OECD
#>             "1980"                "2"                "1"            "FALSE"
#>              PCGDP             LIFEEX               GINI                ODA
#> "339.633338545264"           "62.869"               "29"            "80000"
fmode(wlddev, drop = FALSE) # Gives one row
#>       country iso3c       date year decade                region      income
#> 1 Afghanistan   AFG 1961-01-01 1960   1980 Europe & Central Asia High income
#>    OECD    PCGDP LIFEEX GINI   ODA
#> 1 FALSE 339.6333 62.869   29 80000
head(fmode(wlddev, iso3c)) # Grouped mode
#>                  country iso3c       date year decade
#> ABW                Aruba   ABW 1961-01-01 1960   1980
#> AFG          Afghanistan   AFG 1961-01-01 1960   1980
#> AGO               Angola   AGO 1961-01-01 1960   1980
#> ALB              Albania   ALB 1961-01-01 1960   1980
#> AND              Andorra   AND 1961-01-01 1960   1980
#> ARE United Arab Emirates   ARE 1961-01-01 1960   1980
#>                         region              income  OECD       PCGDP LIFEEX
#> ABW  Latin America & Caribbean         High income FALSE  15669.6160 65.662
#> AFG                 South Asia          Low income FALSE    339.6333 32.292
#> AGO         Sub-Saharan Africa Lower middle income FALSE   2969.9604 33.251
#> ALB      Europe & Central Asia Upper middle income FALSE   2037.1227 71.860
#> AND      Europe & Central Asia         High income FALSE  42137.5190     NA
#> ARE Middle East & North Africa         High income FALSE 102479.1890 52.265
#>     GINI       ODA
#> ABW   NA  33630000
#> AFG   NA 114440000
#> AGO   52   -380000
#> ALB   27   8550000
#> AND   NA        NA
#> ARE   NA    120000
head(fmode(wlddev, iso3c, LIFEEX)) # Grouped and weighted mode
#>                  country iso3c       date year decade
#> ABW                Aruba   ABW 2017-01-01 2016   2000
#> AFG          Afghanistan   AFG 2017-01-01 2016   2000
#> AGO               Angola   AGO 2017-01-01 2016   2000
#> ALB              Albania   ALB 2017-01-01 2016   2000
#> AND              Andorra   AND 2019-01-01 2018   2020
#> ARE United Arab Emirates   ARE 2017-01-01 2016   2000
#>                         region              income  OECD      PCGDP LIFEEX GINI
#> ABW  Latin America & Caribbean         High income FALSE 24288.9871 75.867   NA
#> AFG                 South Asia          Low income FALSE   583.0551 63.673   NA
#> AGO         Sub-Saharan Africa Lower middle income FALSE  3533.8652 61.547 42.7
#> ALB      Europe & Central Asia Upper middle income FALSE  4683.5192 71.860 29.0
#> AND      Europe & Central Asia         High income    NA         NA     NA   NA
#> ARE Middle East & North Africa         High income FALSE 41449.6814 77.256   NA
#>            ODA
#> ABW  -11660000
#> AFG 4069210000
#> AGO    -380000
#> ALB  168600000
#> AND         NA
#> ARE    4270000
detach(wlddev)