dapply.Rd
dapply
efficiently applies functions to columns or rows of matrix-like objects and by default returns an object of the same type and with the same attributes. Alternatively it is possible to return the result in a plain matrix or data.frame. A simple parallelism is also available.
dapply(X, FUN, ..., MARGIN = 2, parallel = FALSE, mc.cores = 1L,
return = c("same", "matrix", "data.frame"), drop = TRUE)
a matrix, data frame or alike object.
a function, can be scalar- or vector-valued.
further arguments to FUN
.
integer. The margin which FUN
will be applied over. Default 2
indicates columns while 1
indicates rows. See also Details.
logical. TRUE
implements simple parallel execution by internally calling mclapply
instead of lapply
.
integer. Argument to mclapply
indicating the number of cores to use for parallel execution. Can use detectCores()
to select all available cores.
an integer or string indicating the type of object to return. The default 1 - "same"
returns the same object type (i.e. class and other attributes are retained, just the names for the dimensions are adjusted). 2 - "matrix"
always returns the output as matrix and 3 - "data.frame"
always returns a data frame.
logical. If the result has only one row or one column, drop = TRUE
will drop dimensions and return a (named) atomic vector.
dapply
is an efficient command to apply functions to rows or columns of data without loosing information (attributes) about the data or changing the classes or format of the data. It is principally an efficient wrapper around lapply
and works as follows:
Save the attributes of X
.
If MARGIN = 2
(columns), convert matrices to plain lists of columns using mctl
and remove all attributes from data frames.
If MARGIN = 1
(rows), convert matrices to plain lists of rows using mrtl
. For data frames remove all attributes, efficiently convert to matrix using do.call(cbind, X)
and also convert to list of rows using mrtl
.
Call lapply
or mclapply
on these plain lists (which is faster than calling lapply
on an object with attributes).
depending on the requested output type, use matrix
, unlist
or do.call(cbind, ...)
to convert the result back to a matrix or list of columns.
modify the relevant attributes accordingly and efficiently attach to the object again (no further checks).
The performance gain from working with plain lists makes dapply
not much slower than calling lapply
itself on a data frame. Because of the conversions involved, row-operations require some memory, but are still faster than apply
.
X
where FUN
was applied to every row or column.
head(dapply(mtcars, log)) # Take natural log of each variable
#> mpg cyl disp hp drat wt
#> Mazda RX4 3.044522 1.791759 5.075174 4.700480 1.360977 0.9631743
#> Mazda RX4 Wag 3.044522 1.791759 5.075174 4.700480 1.360977 1.0560527
#> Datsun 710 3.126761 1.386294 4.682131 4.532599 1.348073 0.8415672
#> Hornet 4 Drive 3.063391 1.791759 5.552960 4.700480 1.124930 1.1678274
#> Hornet Sportabout 2.928524 2.079442 5.886104 5.164786 1.147402 1.2354715
#> Valiant 2.895912 1.791759 5.416100 4.653960 1.015231 1.2412686
#> qsec vs am gear carb
#> Mazda RX4 2.800933 -Inf 0 1.386294 1.3862944
#> Mazda RX4 Wag 2.834389 -Inf 0 1.386294 1.3862944
#> Datsun 710 2.923699 0 0 1.386294 0.0000000
#> Hornet 4 Drive 2.967333 0 -Inf 1.098612 0.0000000
#> Hornet Sportabout 2.834389 -Inf -Inf 1.098612 0.6931472
#> Valiant 3.006672 0 -Inf 1.098612 0.0000000
head(dapply(mtcars, log, return = "matrix")) # Return as matrix
#> mpg cyl disp hp drat wt
#> Mazda RX4 3.044522 1.791759 5.075174 4.700480 1.360977 0.9631743
#> Mazda RX4 Wag 3.044522 1.791759 5.075174 4.700480 1.360977 1.0560527
#> Datsun 710 3.126761 1.386294 4.682131 4.532599 1.348073 0.8415672
#> Hornet 4 Drive 3.063391 1.791759 5.552960 4.700480 1.124930 1.1678274
#> Hornet Sportabout 2.928524 2.079442 5.886104 5.164786 1.147402 1.2354715
#> Valiant 2.895912 1.791759 5.416100 4.653960 1.015231 1.2412686
#> qsec vs am gear carb
#> Mazda RX4 2.800933 -Inf 0 1.386294 1.3862944
#> Mazda RX4 Wag 2.834389 -Inf 0 1.386294 1.3862944
#> Datsun 710 2.923699 0 0 1.386294 0.0000000
#> Hornet 4 Drive 2.967333 0 -Inf 1.098612 0.0000000
#> Hornet Sportabout 2.834389 -Inf -Inf 1.098612 0.6931472
#> Valiant 3.006672 0 -Inf 1.098612 0.0000000
m <- as.matrix(mtcars)
head(dapply(m, log)) # Same thing
#> mpg cyl disp hp drat wt
#> Mazda RX4 3.044522 1.791759 5.075174 4.700480 1.360977 0.9631743
#> Mazda RX4 Wag 3.044522 1.791759 5.075174 4.700480 1.360977 1.0560527
#> Datsun 710 3.126761 1.386294 4.682131 4.532599 1.348073 0.8415672
#> Hornet 4 Drive 3.063391 1.791759 5.552960 4.700480 1.124930 1.1678274
#> Hornet Sportabout 2.928524 2.079442 5.886104 5.164786 1.147402 1.2354715
#> Valiant 2.895912 1.791759 5.416100 4.653960 1.015231 1.2412686
#> qsec vs am gear carb
#> Mazda RX4 2.800933 -Inf 0 1.386294 1.3862944
#> Mazda RX4 Wag 2.834389 -Inf 0 1.386294 1.3862944
#> Datsun 710 2.923699 0 0 1.386294 0.0000000
#> Hornet 4 Drive 2.967333 0 -Inf 1.098612 0.0000000
#> Hornet Sportabout 2.834389 -Inf -Inf 1.098612 0.6931472
#> Valiant 3.006672 0 -Inf 1.098612 0.0000000
head(dapply(m, log, return = "data.frame")) # Return data frame from matrix
#> mpg cyl disp hp drat wt
#> Mazda RX4 3.044522 1.791759 5.075174 4.700480 1.360977 0.9631743
#> Mazda RX4 Wag 3.044522 1.791759 5.075174 4.700480 1.360977 1.0560527
#> Datsun 710 3.126761 1.386294 4.682131 4.532599 1.348073 0.8415672
#> Hornet 4 Drive 3.063391 1.791759 5.552960 4.700480 1.124930 1.1678274
#> Hornet Sportabout 2.928524 2.079442 5.886104 5.164786 1.147402 1.2354715
#> Valiant 2.895912 1.791759 5.416100 4.653960 1.015231 1.2412686
#> qsec vs am gear carb
#> Mazda RX4 2.800933 -Inf 0 1.386294 1.3862944
#> Mazda RX4 Wag 2.834389 -Inf 0 1.386294 1.3862944
#> Datsun 710 2.923699 0 0 1.386294 0.0000000
#> Hornet 4 Drive 2.967333 0 -Inf 1.098612 0.0000000
#> Hornet Sportabout 2.834389 -Inf -Inf 1.098612 0.6931472
#> Valiant 3.006672 0 -Inf 1.098612 0.0000000
dapply(mtcars, sum); dapply(m, sum) # Computing sum of each column, return as vector
#> mpg cyl disp hp drat wt qsec vs
#> 642.900 198.000 7383.100 4694.000 115.090 102.952 571.160 14.000
#> am gear carb
#> 13.000 118.000 90.000
#> mpg cyl disp hp drat wt qsec vs
#> 642.900 198.000 7383.100 4694.000 115.090 102.952 571.160 14.000
#> am gear carb
#> 13.000 118.000 90.000
dapply(mtcars, sum, drop = FALSE) # This returns a data frame of 1 row
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 642.9 198 7383.1 4694 115.09 102.952 571.16 14 13 118 90
dapply(mtcars, sum, MARGIN = 1) # Compute row-sum of each column, return as vector
#> Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
#> 328.980 329.795 259.580 426.135
#> Hornet Sportabout Valiant Duster 360 Merc 240D
#> 590.310 385.540 656.920 270.980
#> Merc 230 Merc 280 Merc 280C Merc 450SE
#> 299.570 350.460 349.660 510.740
#> Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
#> 511.500 509.850 728.560 726.644
#> Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
#> 725.695 213.850 195.165 206.955
#> Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
#> 273.775 519.650 506.085 646.280
#> Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
#> 631.175 208.215 272.570 273.683
#> Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
#> 670.690 379.590 694.710 288.890
dapply(m, sum, MARGIN = 1) # Same thing for matrices, faster t. apply(m, 1, sum)
#> Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
#> 328.980 329.795 259.580 426.135
#> Hornet Sportabout Valiant Duster 360 Merc 240D
#> 590.310 385.540 656.920 270.980
#> Merc 230 Merc 280 Merc 280C Merc 450SE
#> 299.570 350.460 349.660 510.740
#> Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
#> 511.500 509.850 728.560 726.644
#> Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
#> 725.695 213.850 195.165 206.955
#> Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
#> 273.775 519.650 506.085 646.280
#> Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
#> 631.175 208.215 272.570 273.683
#> Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
#> 670.690 379.590 694.710 288.890
head(dapply(m, sum, MARGIN = 1, drop = FALSE)) # Gives matrix with one column
#> sum
#> Mazda RX4 328.980
#> Mazda RX4 Wag 329.795
#> Datsun 710 259.580
#> Hornet 4 Drive 426.135
#> Hornet Sportabout 590.310
#> Valiant 385.540
head(dapply(m, quantile, MARGIN = 1)) # Compute row-quantiles
#> 0% 25% 50% 75% 100%
#> Mazda RX4 0 3.2600 4.000 18.730 160
#> Mazda RX4 Wag 0 3.3875 4.000 19.010 160
#> Datsun 710 1 1.6600 4.000 20.705 108
#> Hornet 4 Drive 0 2.0000 3.215 20.420 258
#> Hornet Sportabout 0 2.5000 3.440 17.860 360
#> Valiant 0 1.8800 3.460 19.160 225
dapply(m, quantile) # Column-quantiles
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 0% 10.400 4 71.100 52.0 2.760 1.51300 14.5000 0 0 3 1
#> 25% 15.425 4 120.825 96.5 3.080 2.58125 16.8925 0 0 3 2
#> 50% 19.200 6 196.300 123.0 3.695 3.32500 17.7100 0 0 4 2
#> 75% 22.800 8 326.000 180.0 3.920 3.61000 18.9000 1 1 4 4
#> 100% 33.900 8 472.000 335.0 4.930 5.42400 22.9000 1 1 5 8
head(dapply(mtcars, quantile, MARGIN = 1)) # Same for data frames, output is also a data.frame
#> 0% 25% 50% 75% 100%
#> Mazda RX4 0 3.2600 4.000 18.730 160
#> Mazda RX4 Wag 0 3.3875 4.000 19.010 160
#> Datsun 710 1 1.6600 4.000 20.705 108
#> Hornet 4 Drive 0 2.0000 3.215 20.420 258
#> Hornet Sportabout 0 2.5000 3.440 17.860 360
#> Valiant 0 1.8800 3.460 19.160 225
dapply(mtcars, quantile)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 0% 10.400 4 71.100 52.0 2.760 1.51300 14.5000 0 0 3 1
#> 25% 15.425 4 120.825 96.5 3.080 2.58125 16.8925 0 0 3 2
#> 50% 19.200 6 196.300 123.0 3.695 3.32500 17.7100 0 0 4 2
#> 75% 22.800 8 326.000 180.0 3.920 3.61000 18.9000 1 1 4 4
#> 100% 33.900 8 472.000 335.0 4.930 5.42400 22.9000 1 1 5 8
# With classed objects, we have to be a bit careful
if (FALSE) {
dapply(EuStockMarkets, quantile) # This gives an error because the tsp attribute is misspecified
}
dapply(EuStockMarkets, quantile, return = "matrix") # These both work fine..
#> DAX SMI CAC FTSE
#> 0% 1402.340 1587.400 1611.00 2281.000
#> 25% 1744.102 2165.625 1875.15 2843.150
#> 50% 2140.565 2796.350 1992.30 3246.600
#> 75% 2722.367 3812.425 2274.35 3993.575
#> 100% 6186.090 8412.000 4388.50 6179.000
dapply(EuStockMarkets, quantile, return = "data.frame")
#> DAX SMI CAC FTSE
#> 0% 1402.340 1587.400 1611.00 2281.000
#> 25% 1744.102 2165.625 1875.15 2843.150
#> 50% 2140.565 2796.350 1992.30 3246.600
#> 75% 2722.367 3812.425 2274.35 3993.575
#> 100% 6186.090 8412.000 4388.50 6179.000
# Similarly for grouped tibbles and other data frame based classes
library(dplyr)
gmtcars <- group_by(mtcars,cyl,vs,am)
head(dapply(gmtcars, log)) # Still gives a grouped tibble back
#> # A tibble: 6 × 11
#> # Groups: cyl, vs, am [4]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 3.04 1.79 5.08 4.70 1.36 0.963 2.80 -Inf 0 1.39 1.39
#> 2 3.04 1.79 5.08 4.70 1.36 1.06 2.83 -Inf 0 1.39 1.39
#> 3 3.13 1.39 4.68 4.53 1.35 0.842 2.92 0 0 1.39 0
#> 4 3.06 1.79 5.55 4.70 1.12 1.17 2.97 0 -Inf 1.10 0
#> 5 2.93 2.08 5.89 5.16 1.15 1.24 2.83 -Inf -Inf 1.10 0.693
#> 6 2.90 1.79 5.42 4.65 1.02 1.24 3.01 0 -Inf 1.10 0
dapply(gmtcars, quantile, MARGIN = 1) # Here it makes sense to keep the groups attribute
#> # A tibble: 32 × 5
#> # Groups: cyl, vs, am [7]
#> `0%` `25%` `50%` `75%` `100%`
#> * <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 3.26 4 18.7 160
#> 2 0 3.39 4 19.0 160
#> 3 1 1.66 4 20.7 108
#> 4 0 2 3.22 20.4 258
#> 5 0 2.5 3.44 17.9 360
#> 6 0 1.88 3.46 19.2 225
#> 7 0 3.10 4 15.1 360
#> 8 0 2.60 4 22.2 147.
#> 9 0 2.58 4 22.8 141.
#> 10 0 3.68 4 18.8 168.
#> # … with 22 more rows
dapply(gmtcars, quantile) # This does not make much sense, ...
#> # A tibble: 5 × 11
#> # Groups: cyl, vs, am [7]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 10.4 4 71.1 52 2.76 1.51 14.5 0 0 3 1
#> 2 15.4 4 121. 96.5 3.08 2.58 16.9 0 0 3 2
#> 3 19.2 6 196. 123 3.70 3.32 17.7 0 0 4 2
#> 4 22.8 8 326 180 3.92 3.61 18.9 1 1 4 4
#> 5 33.9 8 472 335 4.93 5.42 22.9 1 1 5 8
dapply(gmtcars, quantile, # better convert to plain data.frame:
return = "data.frame")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 0% 10.400 4 71.100 52.0 2.760 1.51300 14.5000 0 0 3 1
#> 25% 15.425 4 120.825 96.5 3.080 2.58125 16.8925 0 0 3 2
#> 50% 19.200 6 196.300 123.0 3.695 3.32500 17.7100 0 0 4 2
#> 75% 22.800 8 326.000 180.0 3.920 3.61000 18.9000 1 1 4 4
#> 100% 33.900 8 472.000 335.0 4.930 5.42400 22.9000 1 1 5 8