collap is a fast and easy-to-use multi-purpose data aggregation command.

It performs simple aggregations, multi-type data aggregations applying different functions to numeric and categorical data, weighted aggregations (including weighted multi-type aggregations), multi-function aggregations applying multiple functions to each column, and fully customized aggregations where the user passes a list mapping functions to columns.

collap works with collapse's Fast Statistical Functions, providing extremely fast conventional and weighted aggregation. It also works with other functions, but these do not deliver high speeds on large data and do not support weighted aggregation.
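The following is a minimal sketch of the default multi-type behaviour. It assumes collapse is installed, and the toy data frame `df` is hypothetical. Numeric columns go to FUN (fmean by default) and categorical columns go to catFUN (fmode by default).

```r
library(collapse)

# Hypothetical toy data: a grouping id, one numeric and one character column
df <- data.frame(g = c("a", "a", "b", "b", "b"),
                 x = c(1, 3, 2, 4, 6),
                 y = c("u", "u", "v", "w", "w"))

# Defaults: FUN = fmean for the numeric column x,
# catFUN = fmode for the character column y
res <- collap(df, ~ g)
print(res)
# Group a: x = mean(1, 3) = 2, y = "u"; group b: x = mean(2, 4, 6) = 4, y = "w"
```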

# Main function: allows formula and data input to `by` and `w` arguments
collap(X, by, FUN = fmean, catFUN = fmode, cols = NULL, w = NULL, wFUN = fsum,
       custom = NULL, keep.by = TRUE, keep.w = TRUE, keep.col.order = TRUE,
       sort = TRUE, decreasing = FALSE, na.last = TRUE, parallel = FALSE, mc.cores = 2L,
       return = c("wide","list","long","long_dupl"), give.names = "auto", sort.row, ...)

# Programmer function: allows column names and indices input to `by` and `w` arguments
collapv(X, by, FUN = fmean, catFUN = fmode, cols = NULL, w = NULL, wFUN = fsum,
        custom = NULL, keep.by = TRUE, keep.w = TRUE, keep.col.order = TRUE,
        sort = TRUE, decreasing = FALSE, na.last = TRUE, parallel = FALSE, mc.cores = 2L,
        return = c("wide","list","long","long_dupl"), give.names = "auto", sort.row, ...)

# Auxiliary function: for grouped data ('grouped_df') input + non-standard evaluation
collapg(X, FUN = fmean, catFUN = fmode, cols = NULL, w = NULL, wFUN = fsum, custom = NULL,
        keep.group_vars = TRUE, keep.w = TRUE, keep.col.order = TRUE,
        parallel = FALSE, mc.cores = 2L,
        return = c("wide","list","long","long_dupl"), give.names = "auto", sort.row, ...)

Arguments

X

a data frame, or an object coercible to data frame using qDF.

by

for collap: a one- or two-sided formula, e.g. ~ group1 or var1 + var2 ~ group1 + group2, or an atomic vector, list of vectors or GRP object used to group X. For collapv: names or indices of grouping columns, or a logical vector or selector function such as is.categorical selecting grouping columns.
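The sketch below assumes collapse is installed and uses the built-in iris data. It checks that several of the `by` inputs described above select the same grouping.

```r
library(collapse)

# Grouping iris by Species in three equivalent ways
by_formula <- collap(iris, ~ Species)    # one-sided formula (collap)
by_name    <- collapv(iris, "Species")   # column name (collapv)
by_index   <- collapv(iris, 5)           # column index (collapv)

# The three results should agree
isTRUE(all.equal(by_formula, by_name))
isTRUE(all.equal(by_formula, by_index))

# A two-sided formula restricts aggregation to the left-hand-side columns
two_sided <- collap(iris, Sepal.Width + Petal.Width ~ Species)
names(two_sided)
```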

FUN

a function, list of functions (e.g. list(fsum, fmean, fsd) or list(myfun1 = function(x).., sd = sd)), or a character vector of function names. These are automatically applied only to numeric variables.

catFUN

same as FUN, but applied only to categorical (non-numeric) typed columns (is.categorical).

cols

select columns to aggregate using a function, column names, indices or a logical vector. Note: cols is ignored if a two-sided formula is passed to by.

w

weights. Can be passed as a numeric vector or alternatively as a formula, e.g. ~ weightvar, in collap, or as a column name / index etc., e.g. "weightvar", in collapv. collapg supports non-standard evaluation, so weightvar can be indicated without quotes if found in X.
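As a sanity check of what weighted aggregation computes, the sketch below uses Sepal.Length as the weight variable and recomputes one weighted mean with base R. It assumes collapse is installed.

```r
library(collapse)

# Sepal.Length serves as the weight variable (passed as a formula)
res <- collap(iris, ~ Species, w = ~ Sepal.Length)

# With keep.w = TRUE (default) the weights are aggregated by wFUN = fsum
res$Sepal.Length   # group sums of Sepal.Length: 250.3 296.8 329.4

# Numeric columns receive the weighted mean sum(w * x) / sum(w);
# recompute it for Sepal.Width with base R as a cross-check
w <- iris$Sepal.Length
manual <- tapply(iris$Sepal.Width * w, iris$Species, sum) /
  tapply(w, iris$Species, sum)
all.equal(res$Sepal.Width, as.vector(manual))
```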

wFUN

same as FUN: function(s) used to aggregate the weight variable if keep.w = TRUE. By default the sum of the weights is computed in each group.

custom

a named list specifying a fully customized aggregation task. The names of the list are function names, and the elements are the columns to aggregate using each function (same input as cols). For example, custom = list(fmean = 1:6, fsd = 7:9, fmode = 10:11) tells collap to aggregate columns 1-6 of X using the mean, columns 7-9 using the standard deviation, etc. Note: when custom is supplied, collap ignores any inputs passed to FUN, catFUN or cols.
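Here is a small sketch of `custom` on a hypothetical toy data frame. It assumes collapse is installed. Each list name is a Fast Statistical Function, and each element selects the columns that function is applied to.

```r
library(collapse)

# Hypothetical toy data
df <- data.frame(g = c("a", "a", "b", "b", "b"),
                 x = c(1, 3, 2, 4, 6),
                 y = c("u", "u", "v", "w", "w"))

# List names are functions; list elements select the columns for each function
res <- collap(df, ~ g, custom = list(fsum = "x", flast = "y"))
print(res)
# Group a: x = 1 + 3 = 4, y = last of ("u", "u") = "u"
# Group b: x = 2 + 4 + 6 = 12, y = last of ("v", "w", "w") = "w"
```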

keep.by, keep.group_vars

logical. FALSE will omit grouping variables from the output. TRUE keeps the variables, even if passed externally in a list or vector (unlike other collapse functions).

keep.w

logical. FALSE will omit the weight variable from the output, i.e. the weights are not aggregated. TRUE aggregates and adds the weights, even if passed externally as a vector (unlike other collapse functions).
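The sketch below shows the two keep.* switches on hypothetical toy data with a weight column `wt`, assuming collapse is installed. With both switches set to FALSE, only the aggregated data columns remain.

```r
library(collapse)

# Hypothetical toy data with a weight column wt
df <- data.frame(g  = c("a", "a", "b", "b", "b"),
                 x  = c(1, 3, 2, 4, 6),
                 wt = c(1, 1, 2, 1, 1))

full <- collap(df, ~ g, w = ~ wt)   # keeps g and the summed weights
slim <- collap(df, ~ g, w = ~ wt, keep.by = FALSE, keep.w = FALSE)

names(full)
full$wt       # summed weights per group: a = 2, b = 4
names(slim)   # only the aggregated column x remains
slim$x        # weighted means: a = 4 / 2 = 2, b = (2*2 + 4 + 6) / 4 = 3.5
```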

keep.col.order

logical. Retain the original column order after aggregation.

sort, decreasing, na.last

logical. Arguments passed to GRP.default and affecting the row-order in the aggregated data frame.

parallel

logical. Use mclapply instead of lapply to parallelize the computation at the column level. Not available on Windows.

mc.cores

integer. Argument to mclapply setting the number of cores to use, default is 2.

return

character. Controls the output format when aggregating with multiple functions or performing custom aggregation. "wide" (default) returns a wider data frame with added columns for each additional function. "list" returns a list of data frames, one for each function. "long" adds a column "Function" and row-binds the results from different functions using data.table::rbindlist. "long_dupl" is a special option for aggregating multi-type data using multiple FUN but only one catFUN, or vice versa. In that case the format is long, and data aggregated using only one function is duplicated. See Examples.
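The sketch below contrasts the shapes produced by three `return` options for the same two-function aggregation of iris. It assumes collapse is installed. The function list is written inline in each call because collap derives the output names from it.

```r
library(collapse)

# Same two functions, three output formats
wide <- collap(iris, ~ Species, list(fmean, fmedian))
lst  <- collap(iris, ~ Species, list(fmean, fmedian), return = "list")
lng  <- collap(iris, ~ Species, list(fmean, fmedian), return = "long")

dim(wide)      # 3 rows; 4 numeric columns x 2 functions, plus Species
names(lst)     # one data frame per function
dim(lng)       # 3 groups x 2 functions = 6 rows
names(lng)[1]  # the added "Function" column
```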

give.names

logical or "auto". TRUE creates unique names for the aggregated columns by adding the function name as a prefix, giving names of the form 'FUN.var'. "auto" (the default) adds such prefixes only when multiple functions are applied to a column.

sort.row

deprecated, renamed to sort.

...

additional arguments passed to all functions supplied to FUN, catFUN, wFUN or custom. How Fast Statistical Functions handle unused arguments is regulated by options("collapse_unused_arg_action"), which defaults to "warning".

Details

For each function passed to it, collap automatically checks whether it is a Fast Statistical Function, i.e. whether the function name is contained in .FAST_STAT_FUN. If it is, collap only does the grouping and then calls the function to carry out the grouped computations. If the function is not in .FAST_STAT_FUN, BY is called internally to perform the computation. The results from each function are put into a list and recombined to produce the output format controlled by the return argument.
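The dispatch described above can be observed directly. The sketch below assumes collapse is installed. It compares a Fast Statistical Function with its base R counterpart, which collap routes through BY.

```r
library(collapse)

# Dispatch is by name: is the function listed in .FAST_STAT_FUN?
"fmedian" %in% .FAST_STAT_FUN   # TRUE  -> optimized grouped computation
"median"  %in% .FAST_STAT_FUN   # FALSE -> split-apply-combine via BY

fast <- collap(iris, ~ Species, fmedian)
slow <- collap(iris, ~ Species, median)

# Both routes should produce the same group medians (first 4 columns)
all.equal(unlist(fast[1:4], use.names = FALSE),
          unlist(slow[1:4], use.names = FALSE))
```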

When parallel = TRUE is set on a non-Windows computer, aggregations are parallelized at the column level using mclapply with mc.cores cores.

Value

X aggregated. If X is not a data frame it is coerced to one using qDF and then aggregated.

Note

(1) Since BY does not check and split additional arguments passed to it, it is presently not possible to write a weighted function in R and apply it to data by groups with collap. Weighted aggregations only work with Fast Statistical Functions that support weights. User-written weighted functions can be applied using the data.table package.
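The note points to data.table for user-written weighted functions. The sketch below uses plain base R split-apply-combine instead, as a dependency-free alternative. `wgeomean` (a weighted geometric mean) is a hypothetical example function, not part of collapse.

```r
# Hypothetical user-written weighted statistic: the weighted geometric mean
wgeomean <- function(x, w) exp(sum(w * log(x)) / sum(w))

# Split row indices by group so that x and its weights stay aligned
idx <- split(seq_len(nrow(iris)), iris$Species)
res <- vapply(idx,
              function(i) wgeomean(iris$Sepal.Width[i], iris$Sepal.Length[i]),
              numeric(1))
res   # one value per group, named setosa, versicolor, virginica
```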

(2) When the w argument is used, the weights are passed to all Fast Statistical Functions. This may be undesirable in settings like collapse::collap(data, ~ id, custom = list(fsum = ..., fmean = ...), w = ~ weights), where some columns are to be aggregated using the weighted mean and others using a simple sum or another unweighted statistic. Since many Fast Statistical Functions, including fsum, support weights, the above computes a weighted mean and a weighted sum. A couple of workarounds were outlined here, but collapse 1.5.0 incorporates an easy solution into collap: you can now simply append _uw to the name of a Fast Statistical Function to get an unweighted computation. For the above example, we can write collapse::collap(data, ~ id, custom = list(fsum_uw = ..., fmean = ...), w = ~ weights) to get the weighted mean and the simple sum. Note that the _uw functions are not available for use outside collap. You therefore also need to quote them when passing them to the FUN or catFUN arguments. For example, use collap(data, ~ id, fmean, "fmode_uw", w = ~ weights), since collap(data, ~ id, fmean, fmode_uw, w = ~ weights) gives an error stating that fmode_uw was not found. Note also that functions passed to wFUN never need the _uw suffix, as the weights are never used to aggregate themselves.
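The sketch below shows the `_uw` suffix on iris and assumes collapse 1.5.0 or later. Petal.Width is used as the weight, Sepal.Length is summed without weights, and Sepal.Width gets a weighted mean. Both results are cross-checked with base R.

```r
library(collapse)

# Petal.Width serves as the weight variable; keep.w = FALSE drops its sum
res <- collap(iris, ~ Species,
              custom = list(fsum_uw = "Sepal.Length",  # unweighted sum
                            fmean   = "Sepal.Width"),  # weighted mean
              w = ~ Petal.Width, keep.w = FALSE)

# Cross-check both columns with base R
w <- iris$Petal.Width
plain_sum <- as.vector(tapply(iris$Sepal.Length, iris$Species, sum))
w_mean <- as.vector(tapply(iris$Sepal.Width * w, iris$Species, sum) /
                    tapply(w, iris$Species, sum))
all.equal(res$Sepal.Length, plain_sum)
all.equal(res$Sepal.Width, w_mean)
```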

(3) The dispatch between using optimized Fast Statistical Functions performing grouped computations internally or calling BY to perform split-apply-combine computing is done by matching the function name against .FAST_STAT_FUN. Thus code like collapse::collap(data, ~ id, collapse::fmedian) does not yield an optimized computation, as "collapse::fmedian" %!in% .FAST_STAT_FUN. It is sufficient to write collapse::collap(data, ~ id, "fmedian") to get the desired result when the collapse namespace is not attached.
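The sketch below shows the string workaround when collapse is used only via `::`, with no `library(collapse)` call. The expected medians are taken from the fmedian example output further down this page.

```r
# collapse is not attached here; everything is accessed with ::
res <- collapse::collap(iris, ~ Species, "fmedian")  # name matches .FAST_STAT_FUN

# A namespaced function would be matched as "collapse::fmedian",
# which is not in the list, so collap() would fall back to BY
"collapse::fmedian" %in% collapse::.FAST_STAT_FUN

res$Petal.Length   # group medians: 1.50 4.35 5.55
```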


Examples

## A Simple Introduction --------------------------------------
head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
collap(iris, ~ Species) # Default: FUN = fmean for numeric
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1        5.006       3.428        1.462       0.246     setosa
#> 2        5.936       2.770        4.260       1.326 versicolor
#> 3        6.588       2.974        5.552       2.026  virginica
collapv(iris, 5) # Same using collapv
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1        5.006       3.428        1.462       0.246     setosa
#> 2        5.936       2.770        4.260       1.326 versicolor
#> 3        6.588       2.974        5.552       2.026  virginica
collap(iris, ~ Species, fmedian) # Using the median
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1          5.0         3.4         1.50         0.2     setosa
#> 2          5.9         2.8         4.35         1.3 versicolor
#> 3          6.5         3.0         5.55         2.0  virginica
collap(iris, ~ Species, fmedian, keep.col.order = FALSE) # Groups in-front
#>      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1     setosa          5.0         3.4         1.50         0.2
#> 2 versicolor          5.9         2.8         4.35         1.3
#> 3  virginica          6.5         3.0         5.55         2.0
collap(iris, Sepal.Width + Petal.Width ~ Species, fmedian) # Only '.Width' columns
#>   Sepal.Width Petal.Width    Species
#> 1         3.4         0.2     setosa
#> 2         2.8         1.3 versicolor
#> 3         3.0         2.0  virginica
collapv(iris, 5, cols = c(2, 4)) # Same columns using collapv (default FUN = fmean)
#>   Sepal.Width Petal.Width    Species
#> 1       3.428       0.246     setosa
#> 2       2.770       1.326 versicolor
#> 3       2.974       2.026  virginica
collap(iris, ~ Species, list(fmean, fmedian)) # Two functions
#>   fmean.Sepal.Length fmedian.Sepal.Length fmean.Sepal.Width fmedian.Sepal.Width
#> 1              5.006                  5.0             3.428                 3.4
#> 2              5.936                  5.9             2.770                 2.8
#> 3              6.588                  6.5             2.974                 3.0
#>   fmean.Petal.Length fmedian.Petal.Length fmean.Petal.Width fmedian.Petal.Width
#> 1              1.462                 1.50             0.246                 0.2
#> 2              4.260                 4.35             1.326                 1.3
#> 3              5.552                 5.55             2.026                 2.0
#>      Species
#> 1     setosa
#> 2 versicolor
#> 3  virginica
collap(iris, ~ Species, list(fmean, fmedian), return = "long") # Long format
#>   Function Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1    fmean        5.006       3.428        1.462       0.246     setosa
#> 2    fmean        5.936       2.770        4.260       1.326 versicolor
#> 3    fmean        6.588       2.974        5.552       2.026  virginica
#> 4  fmedian        5.000       3.400        1.500       0.200     setosa
#> 5  fmedian        5.900       2.800        4.350       1.300 versicolor
#> 6  fmedian        6.500       3.000        5.550       2.000  virginica
collapv(iris, 5, custom = list(fmean = 1:2, fmedian = 3:4)) # Custom aggregation
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1        5.006       3.428         1.50         0.2     setosa
#> 2        5.936       2.770         4.35         1.3 versicolor
#> 3        6.588       2.974         5.55         2.0  virginica
collapv(iris, 5, custom = list(fmean = 1:2, fmedian = 3:4), # Raw output, no column reordering
        return = "list")
#> [[1]]
#>      Species Sepal.Length Sepal.Width
#> 1     setosa        5.006       3.428
#> 2 versicolor        5.936       2.770
#> 3  virginica        6.588       2.974
#>
#> [[2]]
#>      Species Petal.Length Petal.Width
#> 1     setosa         1.50         0.2
#> 2 versicolor         4.35         1.3
#> 3  virginica         5.55         2.0
#>
collapv(iris, 5, custom = list(fmean = 1:2, fmedian = 3:4), # A strange choice..
        return = "long")
#>   Function Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1        1        5.006       3.428           NA          NA     setosa
#> 2        1        5.936       2.770           NA          NA versicolor
#> 3        1        6.588       2.974           NA          NA  virginica
#> 4        2           NA          NA         1.50         0.2     setosa
#> 5        2           NA          NA         4.35         1.3 versicolor
#> 6        2           NA          NA         5.55         2.0  virginica
collap(iris, ~ Species, w = ~ Sepal.Length) # Using Sepal.Length as weights, ..
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1        250.3    3.447423     1.465202   0.2480224     setosa
#> 2        296.8    2.784063     4.290195   1.3352089 versicolor
#> 3        329.4    2.987948     5.597116   2.0333030  virginica
weights <- abs(rnorm(fnrow(iris)))
collap(iris, ~ Species, w = weights) # Some random weights..
#>    weights Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1 39.05823     4.938751    3.360079     1.439973   0.2210811     setosa
#> 2 39.11259     5.918390    2.769699     4.236550   1.3255695 versicolor
#> 3 39.60369     6.590353    3.012308     5.572586   2.0047445  virginica
collap(iris, iris$Species, w = weights) # Note this behavior..
#>      Species  weights Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1     setosa 39.05823     4.938751    3.360079     1.439973   0.2210811
#> 2 versicolor 39.11259     5.918390    2.769699     4.236550   1.3255695
#> 3  virginica 39.60369     6.590353    3.012308     5.572586   2.0047445
#>      Species
#> 1     setosa
#> 2 versicolor
#> 3  virginica
collap(iris, iris$Species, w = weights, keep.by = FALSE, keep.w = FALSE)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1     4.938751    3.360079     1.439973   0.2210811     setosa
#> 2     5.918390    2.769699     4.236550   1.3255695 versicolor
#> 3     6.590353    3.012308     5.572586   2.0047445  virginica
library(dplyr) # Needed for "%>%"
iris %>% fgroup_by(Species) %>% collapg # dplyr style, but faster
#>      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1     setosa        5.006       3.428        1.462       0.246
#> 2 versicolor        5.936       2.770        4.260       1.326
#> 3  virginica        6.588       2.974        5.552       2.026
## Multi-Type Aggregation --------------------------------------
head(wlddev) # World Development Panel Data
#>       country iso3c       date year decade     region     income  OECD PCGDP
#> 1 Afghanistan   AFG 1961-01-01 1960   1960 South Asia Low income FALSE    NA
#> 2 Afghanistan   AFG 1962-01-01 1961   1960 South Asia Low income FALSE    NA
#> 3 Afghanistan   AFG 1963-01-01 1962   1960 South Asia Low income FALSE    NA
#> 4 Afghanistan   AFG 1964-01-01 1963   1960 South Asia Low income FALSE    NA
#> 5 Afghanistan   AFG 1965-01-01 1964   1960 South Asia Low income FALSE    NA
#> 6 Afghanistan   AFG 1966-01-01 1965   1960 South Asia Low income FALSE    NA
#>   LIFEEX GINI       ODA
#> 1 32.292   NA 114440000
#> 2 32.742   NA 233350000
#> 3 33.185   NA 114880000
#> 4 33.624   NA 236450000
#> 5 34.060   NA 302480000
#> 6 34.495   NA 370250000
head(collap(wlddev, ~ country + decade)) # Aggregate by country and decade
#>       country iso3c       date   year decade     region     income  OECD
#> 1 Afghanistan   AFG 1961-01-01 1962.5   1960 South Asia Low income FALSE
#> 2 Afghanistan   AFG 1967-01-01 1970.0   1970 South Asia Low income FALSE
#> 3 Afghanistan   AFG 1976-01-01 1980.0   1980 South Asia Low income FALSE
#> 4 Afghanistan   AFG 1987-01-01 1990.0   1990 South Asia Low income FALSE
#> 5 Afghanistan   AFG 1996-01-01 2000.0   2000 South Asia Low income FALSE
#> 6 Afghanistan   AFG 2007-01-01 2010.0   2010 South Asia Low income FALSE
#>      PCGDP   LIFEEX GINI        ODA
#> 1       NA 33.39967   NA  228641667
#> 2       NA 36.70089   NA  224881111
#> 3       NA 42.03909   NA  132813636
#> 4       NA 49.69089   NA  249072222
#> 5 349.7596 55.61818   NA 1045170000
#> 6 506.9931 61.12978   NA 5298066667
head(collap(wlddev, ~ country + decade, fmedian, ffirst)) # Different functions
#>       country iso3c       date   year decade     region     income  OECD
#> 1 Afghanistan   AFG 1961-01-01 1962.5   1960 South Asia Low income FALSE
#> 2 Afghanistan   AFG 1967-01-01 1970.0   1970 South Asia Low income FALSE
#> 3 Afghanistan   AFG 1976-01-01 1980.0   1980 South Asia Low income FALSE
#> 4 Afghanistan   AFG 1987-01-01 1990.0   1990 South Asia Low income FALSE
#> 5 Afghanistan   AFG 1996-01-01 2000.0   2000 South Asia Low income FALSE
#> 6 Afghanistan   AFG 2007-01-01 2010.0   2010 South Asia Low income FALSE
#>      PCGDP  LIFEEX GINI        ODA
#> 1       NA 33.4045   NA  234900000
#> 2       NA 36.6780   NA  229150000
#> 3       NA 41.8530   NA   72610000
#> 4       NA 49.8560   NA  238500000
#> 5 346.9282 55.4820   NA  306200000
#> 6 536.0125 61.2260   NA 5033950000
head(collap(wlddev, ~ country + decade, cols = is.numeric)) # Aggregate only numeric columns
#>       country   year decade decade    PCGDP   LIFEEX GINI        ODA
#> 1 Afghanistan 1962.5   1960   1960       NA 33.39967   NA  228641667
#> 2 Afghanistan 1970.0   1970   1970       NA 36.70089   NA  224881111
#> 3 Afghanistan 1980.0   1980   1980       NA 42.03909   NA  132813636
#> 4 Afghanistan 1990.0   1990   1990       NA 49.69089   NA  249072222
#> 5 Afghanistan 2000.0   2000   2000 349.7596 55.61818   NA 1045170000
#> 6 Afghanistan 2010.0   2010   2010 506.9931 61.12978   NA 5298066667
head(collap(wlddev, ~ country + decade, cols = 9:12)) # Only the 4 series
#>       country decade    PCGDP   LIFEEX GINI        ODA
#> 1 Afghanistan   1960       NA 33.39967   NA  228641667
#> 2 Afghanistan   1970       NA 36.70089   NA  224881111
#> 3 Afghanistan   1980       NA 42.03909   NA  132813636
#> 4 Afghanistan   1990       NA 49.69089   NA  249072222
#> 5 Afghanistan   2000 349.7596 55.61818   NA 1045170000
#> 6 Afghanistan   2010 506.9931 61.12978   NA 5298066667
head(collap(wlddev, PCGDP + LIFEEX ~ country + decade)) # Only GDP and life expectancy
#>       country decade    PCGDP   LIFEEX
#> 1 Afghanistan   1960       NA 33.39967
#> 2 Afghanistan   1970       NA 36.70089
#> 3 Afghanistan   1980       NA 42.03909
#> 4 Afghanistan   1990       NA 49.69089
#> 5 Afghanistan   2000 349.7596 55.61818
#> 6 Afghanistan   2010 506.9931 61.12978
head(collap(wlddev, PCGDP + LIFEEX ~ country + decade, fsum)) # Using the sum instead
#>       country decade    PCGDP  LIFEEX
#> 1 Afghanistan   1960       NA 200.398
#> 2 Afghanistan   1970       NA 330.308
#> 3 Afghanistan   1980       NA 462.430
#> 4 Afghanistan   1990       NA 447.218
#> 5 Afghanistan   2000 1399.038 611.800
#> 6 Afghanistan   2010 4562.938 550.168
head(collap(wlddev, PCGDP + LIFEEX ~ country + decade, sum, # Same using base::sum -> slower!
            na.rm = TRUE))
#>       country decade    PCGDP  LIFEEX
#> 1 Afghanistan   1960    0.000 200.398
#> 2 Afghanistan   1970    0.000 330.308
#> 3 Afghanistan   1980    0.000 462.430
#> 4 Afghanistan   1990    0.000 447.218
#> 5 Afghanistan   2000 1399.038 611.800
#> 6 Afghanistan   2010 4562.938 550.168
head(collap(wlddev, wlddev[c("country","decade")], fsum, # Same, exploring different inputs
            cols = 9:10))
#>       country decade    PCGDP  LIFEEX
#> 1 Afghanistan   1960       NA 200.398
#> 2 Afghanistan   1970       NA 330.308
#> 3 Afghanistan   1980       NA 462.430
#> 4 Afghanistan   1990       NA 447.218
#> 5 Afghanistan   2000 1399.038 611.800
#> 6 Afghanistan   2010 4562.938 550.168
head(collap(wlddev[9:10], wlddev[c("country","decade")], fsum))
#>       country decade    PCGDP  LIFEEX
#> 1 Afghanistan   1960       NA 200.398
#> 2 Afghanistan   1970       NA 330.308
#> 3 Afghanistan   1980       NA 462.430
#> 4 Afghanistan   1990       NA 447.218
#> 5 Afghanistan   2000 1399.038 611.800
#> 6 Afghanistan   2010 4562.938 550.168
head(collapv(wlddev, c("country","decade"), fsum)) # ..names/indices with collapv
#>       country iso3c       date  year decade     region     income  OECD
#> 1 Afghanistan   AFG 1961-01-01 11775   1960 South Asia Low income FALSE
#> 2 Afghanistan   AFG 1967-01-01 17730   1970 South Asia Low income FALSE
#> 3 Afghanistan   AFG 1976-01-01 21780   1980 South Asia Low income FALSE
#> 4 Afghanistan   AFG 1987-01-01 17910   1990 South Asia Low income FALSE
#> 5 Afghanistan   AFG 1996-01-01 22000   2000 South Asia Low income FALSE
#> 6 Afghanistan   AFG 2007-01-01 18090   2010 South Asia Low income FALSE
#>      PCGDP  LIFEEX GINI         ODA
#> 1       NA 200.398   NA  1371850000
#> 2       NA 330.308   NA  2023930000
#> 3       NA 462.430   NA  1460950000
#> 4       NA 447.218   NA  2241650000
#> 5 1399.038 611.800   NA 11496870000
#> 6 4562.938 550.168   NA 47682600000
head(collapv(wlddev, c(1,5), fsum))
#>       country iso3c       date  year decade     region     income  OECD
#> 1 Afghanistan   AFG 1961-01-01 11775   1960 South Asia Low income FALSE
#> 2 Afghanistan   AFG 1967-01-01 17730   1970 South Asia Low income FALSE
#> 3 Afghanistan   AFG 1976-01-01 21780   1980 South Asia Low income FALSE
#> 4 Afghanistan   AFG 1987-01-01 17910   1990 South Asia Low income FALSE
#> 5 Afghanistan   AFG 1996-01-01 22000   2000 South Asia Low income FALSE
#> 6 Afghanistan   AFG 2007-01-01 18090   2010 South Asia Low income FALSE
#>      PCGDP  LIFEEX GINI         ODA
#> 1       NA 200.398   NA  1371850000
#> 2       NA 330.308   NA  2023930000
#> 3       NA 462.430   NA  1460950000
#> 4       NA 447.218   NA  2241650000
#> 5 1399.038 611.800   NA 11496870000
#> 6 4562.938 550.168   NA 47682600000
g <- GRP(wlddev, ~ country + decade)     # Precomputing the grouping
head(collap(wlddev, g, keep.by = FALSE)) # This is slightly faster now
#>       country iso3c       date   year decade     region     income  OECD
#> 1 Afghanistan   AFG 1961-01-01 1962.5   1960 South Asia Low income FALSE
#> 2 Afghanistan   AFG 1967-01-01 1970.0   1970 South Asia Low income FALSE
#> 3 Afghanistan   AFG 1976-01-01 1980.0   1980 South Asia Low income FALSE
#> 4 Afghanistan   AFG 1987-01-01 1990.0   1990 South Asia Low income FALSE
#> 5 Afghanistan   AFG 1996-01-01 2000.0   2000 South Asia Low income FALSE
#> 6 Afghanistan   AFG 2007-01-01 2010.0   2010 South Asia Low income FALSE
#>      PCGDP   LIFEEX GINI        ODA
#> 1       NA 33.39967   NA  228641667
#> 2       NA 36.70089   NA  224881111
#> 3       NA 42.03909   NA  132813636
#> 4       NA 49.69089   NA  249072222
#> 5 349.7596 55.61818   NA 1045170000
#> 6 506.9931 61.12978   NA 5298066667
# Aggregate categorical data using not the mode but the last element
head(collap(wlddev, ~ country + decade, fmean, flast))
#>       country iso3c       date   year decade     region     income  OECD
#> 1 Afghanistan   AFG 1966-01-01 1962.5   1960 South Asia Low income FALSE
#> 2 Afghanistan   AFG 1975-01-01 1970.0   1970 South Asia Low income FALSE
#> 3 Afghanistan   AFG 1986-01-01 1980.0   1980 South Asia Low income FALSE
#> 4 Afghanistan   AFG 1995-01-01 1990.0   1990 South Asia Low income FALSE
#> 5 Afghanistan   AFG 2006-01-01 2000.0   2000 South Asia Low income FALSE
#> 6 Afghanistan   AFG 2015-01-01 2010.0   2010 South Asia Low income FALSE
#>      PCGDP   LIFEEX GINI        ODA
#> 1       NA 33.39967   NA  228641667
#> 2       NA 36.70089   NA  224881111
#> 3       NA 42.03909   NA  132813636
#> 4       NA 49.69089   NA  249072222
#> 5 349.7596 55.61818   NA 1045170000
#> 6 506.9931 61.12978   NA 5298066667
head(collap(wlddev, ~ country + decade, catFUN = flast, # Aggregate only categorical data
            cols = is.categorical))
#>       country     country iso3c       date decade     region     income  OECD
#> 1 Afghanistan Afghanistan   AFG 1966-01-01   1960 South Asia Low income FALSE
#> 2 Afghanistan Afghanistan   AFG 1975-01-01   1970 South Asia Low income FALSE
#> 3 Afghanistan Afghanistan   AFG 1986-01-01   1980 South Asia Low income FALSE
#> 4 Afghanistan Afghanistan   AFG 1995-01-01   1990 South Asia Low income FALSE
#> 5 Afghanistan Afghanistan   AFG 2006-01-01   2000 South Asia Low income FALSE
#> 6 Afghanistan Afghanistan   AFG 2015-01-01   2010 South Asia Low income FALSE
## Weighted Aggregation ----------------------------------------
weights <- abs(rnorm(fnrow(wlddev)))                  # Random weight vector
head(collap(wlddev, ~ country + decade, w = weights)) # Takes weighted mean for numeric..
#>     weights     country iso3c       date     year decade     region     income
#> 1  5.219775 Afghanistan   AFG 1964-01-01 1962.878   1960 South Asia Low income
#> 2  6.280377 Afghanistan   AFG 1972-01-01 1970.986   1970 South Asia Low income
#> 3 12.357487 Afghanistan   AFG 1979-01-01 1980.104   1980 South Asia Low income
#> 4  9.588870 Afghanistan   AFG 1987-01-01 1989.056   1990 South Asia Low income
#> 5  9.112089 Afghanistan   AFG 2004-01-01 2000.208   2000 South Asia Low income
#> 6  9.513924 Afghanistan   AFG 2009-01-01 2008.957   2010 South Asia Low income
#>    OECD    PCGDP   LIFEEX GINI        ODA
#> 1 FALSE       NA 33.56576   NA  247280527
#> 2 FALSE       NA 37.14266   NA  221160015
#> 3 FALSE       NA 42.06259   NA  123263137
#> 4 FALSE       NA 48.93413   NA  199747378
#> 5 FALSE 354.9449 55.73774   NA 1108491908
#> 6 FALSE 471.7897 60.65860   NA 5290040073
# ..and weighted mode for categorical data. The weight vector is aggregated using fsum
wlddev$weights <- weights # Adding to data
head(collap(wlddev, ~ country + decade, w = ~ weights)) # Keeps column order
#>       country iso3c       date     year decade     region     income  OECD
#> 1 Afghanistan   AFG 1964-01-01 1962.878   1960 South Asia Low income FALSE
#> 2 Afghanistan   AFG 1972-01-01 1970.986   1970 South Asia Low income FALSE
#> 3 Afghanistan   AFG 1979-01-01 1980.104   1980 South Asia Low income FALSE
#> 4 Afghanistan   AFG 1987-01-01 1989.056   1990 South Asia Low income FALSE
#> 5 Afghanistan   AFG 2004-01-01 2000.208   2000 South Asia Low income FALSE
#> 6 Afghanistan   AFG 2009-01-01 2008.957   2010 South Asia Low income FALSE
#>      PCGDP   LIFEEX GINI        ODA   weights
#> 1       NA 33.56576   NA  247280527  5.219775
#> 2       NA 37.14266   NA  221160015  6.280377
#> 3       NA 42.06259   NA  123263137 12.357487
#> 4       NA 48.93413   NA  199747378  9.588870
#> 5 354.9449 55.73774   NA 1108491908  9.112089
#> 6 471.7897 60.65860   NA 5290040073  9.513924
head(collap(wlddev, ~ country + decade, w = ~ weights, # Aggregating weights using sum
            wFUN = list(fsum, fmax)))                  # and max (corresponding to mode)
#>       country iso3c       date     year decade     region     income  OECD
#> 1 Afghanistan   AFG 1964-01-01 1962.878   1960 South Asia Low income FALSE
#> 2 Afghanistan   AFG 1972-01-01 1970.986   1970 South Asia Low income FALSE
#> 3 Afghanistan   AFG 1979-01-01 1980.104   1980 South Asia Low income FALSE
#> 4 Afghanistan   AFG 1987-01-01 1989.056   1990 South Asia Low income FALSE
#> 5 Afghanistan   AFG 2004-01-01 2000.208   2000 South Asia Low income FALSE
#> 6 Afghanistan   AFG 2009-01-01 2008.957   2010 South Asia Low income FALSE
#>      PCGDP   LIFEEX GINI        ODA fsum.weights fmax.weights
#> 1       NA 33.56576   NA  247280527     5.219775     1.790592
#> 2       NA 37.14266   NA  221160015     6.280377     1.270672
#> 3       NA 42.06259   NA  123263137    12.357487     2.211769
#> 4       NA 48.93413   NA  199747378     9.588870     2.564408
#> 5 354.9449 55.73774   NA 1108491908     9.112089     2.446680
#> 6 471.7897 60.65860   NA 5290040073     9.513924     2.648932
wlddev$weights <- NULL
## Multi-Function Aggregation ----------------------------------
head(collap(wlddev, ~ country + decade, list(fmean, fNobs), # Saving mean and Nobs
            cols = 9:12))
#>       country decade fmean.PCGDP fNobs.PCGDP fmean.LIFEEX fNobs.LIFEEX
#> 1 Afghanistan   1960          NA           0     33.39967            6
#> 2 Afghanistan   1970          NA           0     36.70089            9
#> 3 Afghanistan   1980          NA           0     42.03909           11
#> 4 Afghanistan   1990          NA           0     49.69089            9
#> 5 Afghanistan   2000    349.7596           4     55.61818           11
#> 6 Afghanistan   2010    506.9931           9     61.12978            9
#>   fmean.GINI fNobs.GINI  fmean.ODA fNobs.ODA
#> 1         NA          0  228641667         6
#> 2         NA          0  224881111         9
#> 3         NA          0  132813636        11
#> 4         NA          0  249072222         9
#> 5         NA          0 1045170000        11
#> 6         NA          0 5298066667         9
head(collap(wlddev, ~ country + decade,                # Same using base R -> slower
            list(mean = mean,
                 Nobs = function(x, ...) sum(!is.na(x))),
            cols = 9:12, na.rm = TRUE))
#>       country decade mean.PCGDP Nobs.PCGDP mean.LIFEEX Nobs.LIFEEX mean.GINI
#> 1 Afghanistan   1960        NaN          0    33.39967           6       NaN
#> 2 Afghanistan   1970        NaN          0    36.70089           9       NaN
#> 3 Afghanistan   1980        NaN          0    42.03909          11       NaN
#> 4 Afghanistan   1990        NaN          0    49.69089           9       NaN
#> 5 Afghanistan   2000   349.7596          4    55.61818          11       NaN
#> 6 Afghanistan   2010   506.9931          9    61.12978           9       NaN
#>   Nobs.GINI   mean.ODA Nobs.ODA
#> 1         0  228641667        6
#> 2         0  224881111        9
#> 3         0  132813636       11
#> 4         0  249072222        9
#> 5         0 1045170000       11
#> 6         0 5298066667        9
lapply(collap(wlddev, ~ country + decade,              # List output format
              list(fmean, fNobs), cols = 9:12, return = "list"), head)
#> $fmean
#>       country decade    PCGDP   LIFEEX GINI        ODA
#> 1 Afghanistan   1960       NA 33.39967   NA  228641667
#> 2 Afghanistan   1970       NA 36.70089   NA  224881111
#> 3 Afghanistan   1980       NA 42.03909   NA  132813636
#> 4 Afghanistan   1990       NA 49.69089   NA  249072222
#> 5 Afghanistan   2000 349.7596 55.61818   NA 1045170000
#> 6 Afghanistan   2010 506.9931 61.12978   NA 5298066667
#>
#> $fNobs
#>       country decade PCGDP LIFEEX GINI ODA
#> 1 Afghanistan   1960     0      6    0   6
#> 2 Afghanistan   1970     0      9    0   9
#> 3 Afghanistan   1980     0     11    0  11
#> 4 Afghanistan   1990     0      9    0   9
#> 5 Afghanistan   2000     4     11    0  11
#> 6 Afghanistan   2010     9      9    0   9
#>
head(collap(wlddev, ~ country + decade,                # Long output format
            list(fmean, fNobs), cols = 9:12, return = "long"))
#>   Function     country decade    PCGDP   LIFEEX GINI        ODA
#> 1    fmean Afghanistan   1960       NA 33.39967   NA  228641667
#> 2    fmean Afghanistan   1970       NA 36.70089   NA  224881111
#> 3    fmean Afghanistan   1980       NA 42.03909   NA  132813636
#> 4    fmean Afghanistan   1990       NA 49.69089   NA  249072222
#> 5    fmean Afghanistan   2000 349.7596 55.61818   NA 1045170000
#> 6    fmean Afghanistan   2010 506.9931 61.12978   NA 5298066667
head(collap(wlddev, ~ country + decade,                # Also aggregating categorical data,
            list(fmean, fNobs), return = "long_dupl")) # and duplicating it 2 times
#>   Function     country iso3c       date   year decade     region     income
#> 1    fmean Afghanistan   AFG 1961-01-01 1962.5   1960 South Asia Low income
#> 2    fmean Afghanistan   AFG 1967-01-01 1970.0   1970 South Asia Low income
#> 3    fmean Afghanistan   AFG 1976-01-01 1980.0   1980 South Asia Low income
#> 4    fmean Afghanistan   AFG 1987-01-01 1990.0   1990 South Asia Low income
#> 5    fmean Afghanistan   AFG 1996-01-01 2000.0   2000 South Asia Low income
#> 6    fmean Afghanistan   AFG 2007-01-01 2010.0   2010 South Asia Low income
#>    OECD    PCGDP   LIFEEX GINI        ODA
#> 1 FALSE       NA 33.39967   NA  228641667
#> 2 FALSE       NA 36.70089   NA  224881111
#> 3 FALSE       NA 42.03909   NA  132813636
#> 4 FALSE       NA 49.69089   NA  249072222
#> 5 FALSE 349.7596 55.61818   NA 1045170000
#> 6 FALSE 506.9931 61.12978   NA 5298066667
head(collap(wlddev, ~ country + decade,                # Now also using 2 functions on
            list(fmean, fNobs), list(fmode, flast),    # categorical data
            keep.col.order = FALSE))
#>       country decade fmean.year fmean.PCGDP fmean.LIFEEX fmean.GINI  fmean.ODA
#> 1 Afghanistan   1960     1962.5          NA     33.39967         NA  228641667
#> 2 Afghanistan   1970     1970.0          NA     36.70089         NA  224881111
#> 3 Afghanistan   1980     1980.0          NA     42.03909         NA  132813636
#> 4 Afghanistan   1990     1990.0          NA     49.69089         NA  249072222
#> 5 Afghanistan   2000     2000.0    349.7596     55.61818         NA 1045170000
#> 6 Afghanistan   2010     2010.0    506.9931     61.12978         NA 5298066667
#>   fNobs.year fNobs.PCGDP fNobs.LIFEEX fNobs.GINI fNobs.ODA fmode.iso3c
#> 1          6           0            6          0         6         AFG
#> 2          9           0            9          0         9         AFG
#> 3         11           0           11          0        11         AFG
#> 4          9           0            9          0         9         AFG
#> 5         11           4           11          0        11         AFG
#> 6          9           9            9          0         9         AFG
#>   fmode.date fmode.region fmode.income fmode.OECD flast.iso3c flast.date
#> 1 1961-01-01   South Asia   Low income      FALSE         AFG 1966-01-01
#> 2 1967-01-01   South Asia   Low income      FALSE         AFG 1975-01-01
#> 3 1976-01-01   South Asia   Low income      FALSE         AFG 1986-01-01
#> 4 1987-01-01   South Asia   Low income      FALSE         AFG 1995-01-01
#> 5 1996-01-01   South Asia   Low income      FALSE         AFG 2006-01-01
#> 6 2007-01-01   South Asia   Low income      FALSE         AFG 2015-01-01
#>   flast.region flast.income flast.OECD
#> 1   South Asia   Low income      FALSE
#> 2   South Asia   Low income      FALSE
#> 3   South Asia   Low income      FALSE
#> 4   South Asia   Low income      FALSE
#> 5   South Asia   Low income      FALSE
#> 6   South Asia   Low income      FALSE
head(collap(wlddev, ~ country + decade, # More functions, string input, c("fmean","fsum","fNobs","fsd","fvar"), # parallelized execution c("fmode","ffirst","flast","fNdistinct"), # (choose more than 1 cores, parallel = TRUE, mc.cores = 1L, # depending on your machine) keep.col.order = FALSE))
#>       country decade fmean.year fmean.PCGDP fmean.LIFEEX fmean.GINI  fmean.ODA
#> 1 Afghanistan   1960     1962.5          NA     33.39967         NA  228641667
#> 2 Afghanistan   1970     1970.0          NA     36.70089         NA  224881111
#> 3 Afghanistan   1980     1980.0          NA     42.03909         NA  132813636
#> 4 Afghanistan   1990     1990.0          NA     49.69089         NA  249072222
#> 5 Afghanistan   2000     2000.0    349.7596     55.61818         NA 1045170000
#> 6 Afghanistan   2010     2010.0    506.9931     61.12978         NA 5298066667
#>   fsum.year fsum.PCGDP fsum.LIFEEX fsum.GINI    fsum.ODA fNobs.year fNobs.PCGDP
#> 1     11775         NA     200.398        NA  1371850000          6           0
#> 2     17730         NA     330.308        NA  2023930000          9           0
#> 3     21780         NA     462.430        NA  1460950000         11           0
#> 4     17910         NA     447.218        NA  2241650000          9           0
#> 5     22000   1399.038     611.800        NA 11496870000         11           4
#> 6     18090   4562.938     550.168        NA 47682600000          9           9
#>   fNobs.LIFEEX fNobs.GINI fNobs.ODA fsd.year fsd.PCGDP fsd.LIFEEX fsd.GINI
#> 1            6          0         6 1.870829        NA  0.8236083       NA
#> 2            9          0         9 2.738613        NA  1.2329864       NA
#> 3           11          0        11 3.316625        NA  2.1599808       NA
#> 4            9          0         9 2.738613        NA  2.1258392       NA
#> 5           11          0        11 3.316625  11.89378  1.8056120       NA
#> 6            9          0         9 2.738613  86.22723  1.2872844       NA
#>      fsd.ODA fvar.year fvar.PCGDP fvar.LIFEEX fvar.GINI     fvar.ODA
#> 1  101559732       3.5         NA   0.6783307        NA 1.031438e+16
#> 2   58515953       7.5         NA   1.5202554        NA 3.424117e+15
#> 3  108271755      11.0         NA   4.6655169        NA 1.172277e+16
#> 4  203018929       7.5         NA   4.5191921        NA 4.121669e+16
#> 5 1113781886      11.0    141.462   3.2602346        NA 1.240510e+18
#> 6 1102034385       7.5   7435.136   1.6571012        NA 1.214480e+18
#>   fmode.iso3c fmode.date fmode.region fmode.income fmode.OECD ffirst.iso3c
#> 1         AFG 1961-01-01   South Asia   Low income      FALSE          AFG
#> 2         AFG 1967-01-01   South Asia   Low income      FALSE          AFG
#> 3         AFG 1976-01-01   South Asia   Low income      FALSE          AFG
#> 4         AFG 1987-01-01   South Asia   Low income      FALSE          AFG
#> 5         AFG 1996-01-01   South Asia   Low income      FALSE          AFG
#> 6         AFG 2007-01-01   South Asia   Low income      FALSE          AFG
#>   ffirst.date ffirst.region ffirst.income ffirst.OECD flast.iso3c flast.date
#> 1  1961-01-01    South Asia    Low income       FALSE         AFG 1966-01-01
#> 2  1967-01-01    South Asia    Low income       FALSE         AFG 1975-01-01
#> 3  1976-01-01    South Asia    Low income       FALSE         AFG 1986-01-01
#> 4  1987-01-01    South Asia    Low income       FALSE         AFG 1995-01-01
#> 5  1996-01-01    South Asia    Low income       FALSE         AFG 2006-01-01
#> 6  2007-01-01    South Asia    Low income       FALSE         AFG 2015-01-01
#>   flast.region flast.income flast.OECD fNdistinct.iso3c fNdistinct.date
#> 1   South Asia   Low income      FALSE                1               6
#> 2   South Asia   Low income      FALSE                1               9
#> 3   South Asia   Low income      FALSE                1              11
#> 4   South Asia   Low income      FALSE                1               9
#> 5   South Asia   Low income      FALSE                1              11
#> 6   South Asia   Low income      FALSE                1               9
#>   fNdistinct.region fNdistinct.income fNdistinct.OECD
#> 1                 1                 1               1
#> 2                 1                 1               1
#> 3                 1                 1               1
#> 4                 1                 1               1
#> 5                 1                 1               1
#> 6                 1                 1               1
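In the multi-function call above, each numeric column is passed to every function in FUN and each categorical column to every function in catFUN; with keep.col.order = FALSE the results are laid out function by function and named "function.column". The base R sketch below mimics that layout for a single grouping column. The helper multi_agg is hypothetical, is not how collap is implemented, and uses tapply, so it will be far slower on large data.

```r
# Hypothetical helper (base R only, not part of collapse): apply every
# function in 'funs' to every column in 'cols' within the groups of df[[by]],
# naming results "function.column" and ordering them function by function.
multi_agg <- function(df, by, funs, cols) {
  g <- df[[by]]
  out <- data.frame(sort(unique(g)), stringsAsFactors = FALSE)  # one row per group
  names(out) <- by
  for (fn in names(funs)) {
    for (col in cols) {
      # tapply orders its result by the sorted group levels, matching 'out'
      out[[paste(fn, col, sep = ".")]] <- as.vector(tapply(df[[col]], g, funs[[fn]]))
    }
  }
  out
}

df <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 5), y = c(2, 2, 4))
res <- multi_agg(df, "g", list(mean = mean, sum = sum), c("x", "y"))
res
```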
## Custom Aggregation ------------------------------------------
head(collap(wlddev, ~ country + decade,                  # Custom aggregation
            custom = list(fmean = 9:12, fsd = 9:10, fmode = 7:8)))
#>       country decade fmode.income fmode.OECD fmean.PCGDP fsd.PCGDP fmean.LIFEEX
#> 1 Afghanistan   1960   Low income      FALSE          NA        NA     33.39967
#> 2 Afghanistan   1970   Low income      FALSE          NA        NA     36.70089
#> 3 Afghanistan   1980   Low income      FALSE          NA        NA     42.03909
#> 4 Afghanistan   1990   Low income      FALSE          NA        NA     49.69089
#> 5 Afghanistan   2000   Low income      FALSE    349.7596  11.89378     55.61818
#> 6 Afghanistan   2010   Low income      FALSE    506.9931  86.22723     61.12978
#>   fsd.LIFEEX fmean.GINI  fmean.ODA
#> 1  0.8236083         NA  228641667
#> 2  1.2329864         NA  224881111
#> 3  2.1599808         NA  132813636
#> 4  2.1258392         NA  249072222
#> 5  1.8056120         NA 1045170000
#> 6  1.2872844         NA 5298066667
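With custom, each list element names a function and lists the columns it is applied to, by index or by name, so one column can appear under several functions (above, PCGDP and LIFEEX get both fmean and fsd). A minimal base R sketch of that mapping for a single grouping column follows. The helper custom_agg is hypothetical: it resolves function names with match.fun and does not reproduce collap's column ordering or its give.names rules.

```r
# Hypothetical helper (base R only, not part of collapse): 'custom' is a named
# list mapping a function name to the columns (indices or names) it aggregates.
custom_agg <- function(df, by, custom) {
  g <- df[[by]]
  out <- data.frame(sort(unique(g)), stringsAsFactors = FALSE)  # one row per group
  names(out) <- by
  for (fn in names(custom)) {
    f <- match.fun(fn)                               # look up the function by name
    cols <- custom[[fn]]
    if (is.numeric(cols)) cols <- names(df)[cols]    # translate indices to names
    for (col in cols) {
      out[[paste(fn, col, sep = ".")]] <- as.vector(tapply(df[[col]], g, f))
    }
  }
  out
}

df <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 5), y = c(2, 6, 4))
res <- custom_agg(df, "g", list(mean = 2:3, max = "y"))  # 'y' gets mean and max
res
```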
head(collap(wlddev, ~ country + decade,                  # Using column names
            custom = list(fmean = "PCGDP", fsd = c("LIFEEX","GINI"),
                          flast = "date")))
#>       country       date decade    PCGDP    LIFEEX GINI
#> 1 Afghanistan 1966-01-01   1960       NA 0.8236083   NA
#> 2 Afghanistan 1975-01-01   1970       NA 1.2329864   NA
#> 3 Afghanistan 1986-01-01   1980       NA 2.1599808   NA
#> 4 Afghanistan 1995-01-01   1990       NA 2.1258392   NA
#> 5 Afghanistan 2006-01-01   2000 349.7596 1.8056120   NA
#> 6 Afghanistan 2015-01-01   2010 506.9931 1.2872844   NA
head(collap(wlddev, ~ country + decade,                  # Weighted parallelized custom
            custom = list(fmean = 9:12, fsd = 9:10,      # aggregation
                          fmode = 7:8),
            w = weights, wFUN = list(fsum, fmax),
            parallel = TRUE, mc.cores = 1L))
#>   fsum.weights fmax.weights     country decade fmode.income fmode.OECD
#> 1     5.219775     1.790592 Afghanistan   1960   Low income      FALSE
#> 2     6.280377     1.270672 Afghanistan   1970   Low income      FALSE
#> 3    12.357487     2.211769 Afghanistan   1980   Low income      FALSE
#> 4     9.588870     2.564408 Afghanistan   1990   Low income      FALSE
#> 5     9.112089     2.446680 Afghanistan   2000   Low income      FALSE
#> 6     9.513924     2.648932 Afghanistan   2010   Low income      FALSE
#>   fmean.PCGDP fsd.PCGDP fmean.LIFEEX fsd.LIFEEX fmean.GINI  fmean.ODA
#> 1          NA        NA     33.56576  0.8351381         NA  247280527
#> 2          NA        NA     37.14266  1.1330649         NA  221160015
#> 3          NA        NA     42.06259  1.8611270         NA  123263137
#> 4          NA        NA     48.93413  2.2581963         NA  199747378
#> 5    354.9449  8.672255     55.73774  1.8527170         NA 1108491908
#> 6    471.7897 78.761274     60.65860  1.0621405         NA 5290040073
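In the weighted call, the numeric columns are aggregated with weighted statistics, while wFUN = list(fsum, fmax) is applied to the weight vector itself, which is why fsum.weights and fmax.weights appear as the first columns. The base R sketch below spells out both computations for one variable. The helpers wmean_na, group_wmean and group_wfun are hypothetical, and they assume (in line with the na.rm = TRUE default of collapse's Fast Statistical Functions) that observations with a missing value or a missing weight are dropped before computing sum(w * x) / sum(w).

```r
# Hypothetical helpers (base R only, not part of collapse).
# Weighted mean that skips observations where x or w is missing;
# returns NA when a group has no usable observations.
wmean_na <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  if (!any(ok)) return(NA_real_)
  sum(w[ok] * x[ok]) / sum(w[ok])
}

# Weighted mean of x within each group of g (groups in sorted order).
group_wmean <- function(x, w, g) {
  idx <- split(seq_along(x), g)
  vapply(idx, function(i) wmean_na(x[i], w[i]), numeric(1))
}

# Apply each function in 'wfuns' to all weights of each group, analogous
# to the summary columns that wFUN = list(fsum, fmax) adds for the weights.
group_wfun <- function(w, g, wfuns) {
  w <- as.numeric(w)                      # keep vapply's numeric(1) check happy
  sapply(wfuns, function(f) vapply(split(w, g), f, numeric(1)))
}

x <- c(1, NA, 3, 4)
w <- c(1, 5, 3, 2)
g <- c("a", "a", "a", "b")
group_wmean(x, w, g)                      # group a: (1*1 + 3*3) / (1 + 3)
m <- group_wfun(w, g, list(sum = sum, max = max))
m
```

Note that the weight of the row with a missing x still counts toward the weight summary in this sketch, since wFUN only looks at the weights.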
head(collap(wlddev, ~ country + decade,                  # No column reordering
            custom = list(fmean = 9:12, fsd = 9:10, fmode = 7:8),
            w = weights, wFUN = list(fsum, fmax),
            parallel = TRUE, mc.cores = 1L, keep.col.order = FALSE))
#>       country decade fsum.weights fmax.weights fmean.PCGDP fmean.LIFEEX
#> 1 Afghanistan   1960     5.219775     1.790592          NA     33.56576
#> 2 Afghanistan   1970     6.280377     1.270672          NA     37.14266
#> 3 Afghanistan   1980    12.357487     2.211769          NA     42.06259
#> 4 Afghanistan   1990     9.588870     2.564408          NA     48.93413
#> 5 Afghanistan   2000     9.112089     2.446680    354.9449     55.73774
#> 6 Afghanistan   2010     9.513924     2.648932    471.7897     60.65860
#>   fmean.GINI  fmean.ODA fsd.PCGDP fsd.LIFEEX fmode.income fmode.OECD
#> 1         NA  247280527        NA  0.8351381   Low income      FALSE
#> 2         NA  221160015        NA  1.1330649   Low income      FALSE
#> 3         NA  123263137        NA  1.8611270   Low income      FALSE
#> 4         NA  199747378        NA  2.2581963   Low income      FALSE
#> 5         NA 1108491908  8.672255  1.8527170   Low income      FALSE
#> 6         NA 5290040073 78.761274  1.0621405   Low income      FALSE
## Piped Use --------------------------------------------------
wlddev %>% fgroup_by(country, decade) %>% collapg %>% head
#>       country decade iso3c       date   year     region     income  OECD
#> 1 Afghanistan   1960   AFG 1961-01-01 1962.5 South Asia Low income FALSE
#> 2 Afghanistan   1970   AFG 1967-01-01 1970.0 South Asia Low income FALSE
#> 3 Afghanistan   1980   AFG 1976-01-01 1980.0 South Asia Low income FALSE
#> 4 Afghanistan   1990   AFG 1987-01-01 1990.0 South Asia Low income FALSE
#> 5 Afghanistan   2000   AFG 1996-01-01 2000.0 South Asia Low income FALSE
#> 6 Afghanistan   2010   AFG 2007-01-01 2010.0 South Asia Low income FALSE
#>      PCGDP   LIFEEX GINI        ODA
#> 1       NA 33.39967   NA  228641667
#> 2       NA 36.70089   NA  224881111
#> 3       NA 42.03909   NA  132813636
#> 4       NA 49.69089   NA  249072222
#> 5 349.7596 55.61818   NA 1045170000
#> 6 506.9931 61.12978   NA 5298066667
wlddev %>% fgroup_by(country, decade) %>% collapg(w = ODA) %>% head
#>       country decade         ODA iso3c       date     year     region
#> 1 Afghanistan   1960  1371850000   AFG 1966-01-01 1963.086 South Asia
#> 2 Afghanistan   1970  2023930000   AFG 1967-01-01 1969.797 South Asia
#> 3 Afghanistan   1980  1460950000   AFG 1978-01-01 1977.816 South Asia
#> 4 Afghanistan   1990  2241650000   AFG 1992-01-01 1991.213 South Asia
#> 5 Afghanistan   2000 11496870000   AFG 2006-01-01 2002.833 South Asia
#> 6 Afghanistan   2010 47682600000   AFG 2012-01-01 2010.168 South Asia
#>       income  OECD    PCGDP   LIFEEX GINI
#> 1 Low income FALSE       NA 33.65730   NA
#> 2 Low income FALSE       NA 36.61192   NA
#> 3 Low income FALSE       NA 40.63028   NA
#> 4 Low income FALSE       NA 50.67509   NA
#> 5 Low income FALSE 351.3902 57.20476   NA
#> 6 Low income FALSE 515.1409 61.22270   NA
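With w = ODA, the non-numeric columns such as date are also aggregated with weights: a weighted mode returns, per group, the value with the largest total weight rather than the most frequent one. That is consistent with date showing 1966-01-01 for the 1960s here, versus 1961-01-01 in the unweighted call. A base R sketch of that rule follows; wmode is a hypothetical helper, and breaking ties by first appearance is an assumption meant to resemble fmode's default behaviour.

```r
# Hypothetical helper (base R only, not part of collapse): weighted mode.
# Sums the weights of each distinct value and returns the value with the
# largest total; ties go to the value that appears first.
wmode <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  x <- x[ok]
  w <- as.numeric(w[ok])
  u <- unique(x)                                   # distinct values, in order of appearance
  tot <- vapply(seq_along(u), function(j) sum(w[x == u[j]]), numeric(1))
  u[which.max(tot)]                                # which.max returns the first maximum
}

wmode(c("a", "b", "b", "c"), c(5, 1, 1, 3))        # "a": total weight 5 beats b (2) and c (3)
wmode(c("a", "b", "b", "c"), c(1, 1, 1, 1))        # "b": equal weights give the plain mode

# Applied per group with split():
x <- c("a", "b", "b", "c")
w <- c(5, 1, 1, 3)
g <- c(1, 1, 1, 2)
res <- sapply(split(seq_along(x), g), function(i) wmode(x[i], w[i]))
res
```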
wlddev %>% fgroup_by(country, decade) %>% collapg(fmedian, flast) %>% head
#>       country decade iso3c       date   year     region     income  OECD
#> 1 Afghanistan   1960   AFG 1966-01-01 1962.5 South Asia Low income FALSE
#> 2 Afghanistan   1970   AFG 1975-01-01 1970.0 South Asia Low income FALSE
#> 3 Afghanistan   1980   AFG 1986-01-01 1980.0 South Asia Low income FALSE
#> 4 Afghanistan   1990   AFG 1995-01-01 1990.0 South Asia Low income FALSE
#> 5 Afghanistan   2000   AFG 2006-01-01 2000.0 South Asia Low income FALSE
#> 6 Afghanistan   2010   AFG 2015-01-01 2010.0 South Asia Low income FALSE
#>      PCGDP  LIFEEX GINI        ODA
#> 1       NA 33.4045   NA  234900000
#> 2       NA 36.6780   NA  229150000
#> 3       NA 41.8530   NA   72610000
#> 4       NA 49.8560   NA  238500000
#> 5 346.9282 55.4820   NA  306200000
#> 6 536.0125 61.2260   NA 5033950000
wlddev %>% fgroup_by(country, decade) %>% collapg(custom = list(fmean = 9:12, fmode = 5:7, flast = 3)) %>% head
#>       country decade       date decade     region     income    PCGDP   LIFEEX
#> 1 Afghanistan   1960 1966-01-01   1960 South Asia Low income       NA 33.39967
#> 2 Afghanistan   1970 1975-01-01   1970 South Asia Low income       NA 36.70089
#> 3 Afghanistan   1980 1986-01-01   1980 South Asia Low income       NA 42.03909
#> 4 Afghanistan   1990 1995-01-01   1990 South Asia Low income       NA 49.69089
#> 5 Afghanistan   2000 2006-01-01   2000 South Asia Low income 349.7596 55.61818
#> 6 Afghanistan   2010 2015-01-01   2010 South Asia Low income 506.9931 61.12978
#>   GINI        ODA
#> 1   NA  228641667
#> 2   NA  224881111
#> 3   NA  132813636
#> 4   NA  249072222
#> 5   NA 1045170000
#> 6   NA 5298066667