TRA is an S3 generic that efficiently transforms data by either (column-wise) replacing data values with supplied statistics or sweeping the statistics out of the data. TRA supports grouped sweeping and replacing operations, and is thus a generalization of sweep.

TRA(x, STATS, FUN = "-", ...)

# S3 method for default
TRA(x, STATS, FUN = "-", g = NULL, ...)

# S3 method for matrix
TRA(x, STATS, FUN = "-", g = NULL, ...)

# S3 method for data.frame
TRA(x, STATS, FUN = "-", g = NULL, ...)

# S3 method for grouped_df
TRA(x, STATS, FUN = "-", keep.group_vars = TRUE, ...)

Arguments

x

an atomic vector, matrix, data frame or grouped data frame (class 'grouped_df').

STATS

a matching set of summary statistics. See Details and Examples.

FUN

an integer or character string indicating the operation to perform. There are 10 supported operations:

Int.  String          Description
1     "replace_fill"  replace and overwrite missing values in x
2     "replace"       replace but preserve missing values in x
3     "-"             subtract (i.e. center)
4     "-+"            subtract group statistics but add the group-frequency-weighted average of group statistics (i.e. center on the overall average statistic)
5     "/"             divide (i.e. scale; for mean-preserving scaling see also fscale)
6     "%"             compute percentages (i.e. divide and multiply by 100)
7     "+"             add
8     "*"             multiply
9     "%%"            modulus (i.e. remainder from division by STATS)
10    "-%%"           subtract modulus (i.e. floor data by STATS)

g

a factor, GRP object, atomic vector (internally converted to a factor) or a list of vectors / factors (internally converted to a GRP object) used to group x. The number of groups must match the number of rows of STATS. See Details.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation. See Details and Examples.

...

arguments to be passed to or from other methods.
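To make the ten FUN operations concrete, here is a minimal base-R sketch for a numeric vector. `tra_sketch` is a hypothetical helper written for illustration only; it is not part of collapse and does not reflect how TRA is implemented internally. It assumes `g` holds integer group ids 1..length(STATS), accepts only the string codes, and assumes the group frequencies used by "-+" are plain group sizes (how TRA weights groups when x contains missing values is not modelled).

```r
# Hypothetical base-R sketch of TRA's ten operations on a numeric vector.
# Assumes g contains integer group ids 1..length(STATS); only string codes are handled.
tra_sketch <- function(x, STATS, FUN = "-", g = rep(1L, length(x))) {
  s <- STATS[g]  # expand: one statistic per observation
  switch(FUN,
    "replace_fill" = s,                          # overwrite all values, NAs included
    "replace"      = replace(s, is.na(x), NA),   # keep the NAs of x
    "-"            = x - s,
    # assumption: group frequencies are plain group sizes
    "-+"           = x - s + sum(STATS * tabulate(g, length(STATS))) / length(x),
    "/"            = x / s,
    "%"            = x / s * 100,
    "+"            = x + s,
    "*"            = x * s,
    "%%"           = x %% s,
    "-%%"          = x - x %% s,                 # floor x to multiples of s
    stop("unsupported FUN: ", FUN)
  )
}

x <- c(1.2, 4.6, 2.5, 9.1, 8.7, 3.3)
g <- c(1, 3, 3, 2, 1, 2)
STATS <- c(4.95, 6.20, 3.55)          # group means of x
tra_sketch(x, STATS, "-", g)          # grouped centering
tra_sketch(c(7, 12), 5, "-%%")        # returns c(5, 10)
```

Because STATS here are group means, "-+" returns data whose mean equals the overall mean of x, which is what "center on the overall average statistic" means for FUN = 4.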

Details

Without groups (g = NULL), TRA is simply a column-based version of sweep, albeit 4 times more efficient on matrices and many times more efficient on data frames. In this case all methods accept an atomic vector of statistics of length NCOL(x) passed to STATS. The matrix and data frame methods also accept a 1-row matrix or a 1-row data frame / list, respectively. TRA always preserves all attributes of x.
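For readers coming from base R, the ungrouped "-" and "replace" operations correspond to the column-wise computations below. This snippet uses only base R (no collapse) and illustrates the semantics that TRA shares with sweep, not TRA's speed; variable names are chosen for illustration.

```r
# Ungrouped sweeping in base R: subtract each column's mean from that column
m <- as.matrix(iris[1:4])      # numeric iris columns as a matrix
stats <- colMeans(m)           # one statistic per column (length NCOL(m))

centered <- sweep(m, 2, stats, "-")

# The same operation written as an explicit loop over columns
manual <- m
for (j in seq_len(ncol(m))) manual[, j] <- m[, j] - stats[j]

# "replace": every row becomes the vector of column statistics
replaced <- matrix(stats, nrow(m), ncol(m), byrow = TRUE, dimnames = dimnames(m))

head(centered, 3)
```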

With groups passed to g, STATS needs to be of the same type as x and of appropriate dimensions [such that NCOL(x) == NCOL(STATS) and NROW(STATS) equals the number of groups (i.e. the number of levels if g is a factor)]. If this condition is satisfied, TRA will assume that the first row of STATS is the set of statistics computed on the first group/level of g, the second row on the second group/level etc. and do groupwise replacing or sweeping out accordingly.

For example, let x = c(1.2, 4.6, 2.5, 9.1, 8.7, 3.3), let g = c(1,3,3,2,1,2) be an integer vector with 3 groups, and let STATS = fmean(x,g) = c(4.95, 6.20, 3.55). Then out = TRA(x,STATS,"-",g) = c(-3.75, 1.05, -1.05, 2.90, 3.75, -2.90) [same as fmean(x, g, TRA = "-")] does the equivalent of the following for-loop: for(i in 1:6) out[i] = x[i] - STATS[g[i]].
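The worked example can be reproduced in plain base R. Here tapply stands in for fmean (an assumption made so the snippet runs without collapse); the loop is the one from the text, and the last line shows the vectorized form, which simply indexes the group statistics by group id.

```r
# Grouped centering from the Details example, written out in base R
x <- c(1.2, 4.6, 2.5, 9.1, 8.7, 3.3)
g <- c(1, 3, 3, 2, 1, 2)                   # integer group ids, 3 groups
STATS <- as.vector(tapply(x, g, mean))     # group means in sorted group order

# The for-loop from the text
out <- numeric(length(x))
for (i in 1:6) out[i] <- x[i] - STATS[g[i]]

# Vectorized equivalent: index the statistics by group id
out_vec <- x - STATS[g]
out
```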

Correct computation requires that the g used in fmean and the g passed to TRA are exactly the same vector. Using g = c(1,3,3,2,1,2) for fmean and g = c(3,1,1,2,3,2) for TRA will not give the right result: both vectors describe the same partition of x, but their labels point to different rows of STATS. The safest way of programming with TRA is thus to use the same factor or GRP object for all grouped computations. Atomic vectors passed to g are converted to factors (see qF) and lists are converted to GRP objects. All Fast Statistical Functions do the same, as does BY by default, so together with these functions TRA can also safely be used with atomic or list groups. Problems may arise if functions from other packages internally group atomic vectors or lists in a non-sorted way. [Note: as.factor conversions are fine, as they also involve sorting.]
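The pitfall can be demonstrated with base R alone (tapply again stands in for fmean). Both grouping vectors below define the same partition, but applying statistics computed with one labelling through the other labelling mismatches groups and rows of STATS. Converting to a factor sorts the levels, so statistics and integer codes computed from the same factor line up again.

```r
# Same partition of x, two different labellings of the groups
x  <- c(1.2, 4.6, 2.5, 9.1, 8.7, 3.3)
g1 <- c(1, 3, 3, 2, 1, 2)
g2 <- c(3, 1, 1, 2, 3, 2)

STATS <- as.vector(tapply(x, g1, mean))  # statistics computed with g1

right <- x - STATS[g1]   # transformation uses the same g1: correct
wrong <- x - STATS[g2]   # relabelled g2 picks up another group's statistic

# Using one factor for both steps is consistent, because its levels are sorted
f2 <- factor(g2)
STATS2 <- as.vector(tapply(x, f2, mean))
fixed <- x - STATS2[as.integer(f2)]
```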

If x is a grouped data frame ('grouped_df'), TRA matches the columns of x and STATS and also checks for grouping columns in x and STATS. TRA.grouped_df will then only transform those columns in x for which matching counterparts were found in STATS (excluding grouping columns) and return x with its columns in the original order. If keep.group_vars = FALSE, the grouping columns are dropped after computation; however, the "groups" attribute is not dropped (it can be removed using fungroup() or dplyr::ungroup()).
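The column-matching behaviour described above can be mimicked in base R. This is a hypothetical sketch, not collapse's implementation: it assumes columns are matched by name, uses aggregate in place of fmean on a grouped data frame, and only subtracts group means from the columns that appear in STATS, leaving all other columns and the column order untouched.

```r
# Hypothetical base-R sketch of the grouped_df column matching (not collapse code)
df <- iris
gid <- as.integer(df$Species)   # group ids follow the sorted factor levels

# Group means for two columns only, plus the grouping column
STATS <- aggregate(df[c("Sepal.Width", "Petal.Width")], by = df["Species"], FUN = mean)

# Columns to transform: present in both, excluding the grouping column
cols <- setdiff(intersect(names(df), names(STATS)), "Species")
for (v in cols) df[[v]] <- df[[v]] - STATS[[v]][gid]

# Mimics keep.group_vars = FALSE: drop the grouping column afterwards
df_nogroup <- df[setdiff(names(df), "Species")]
head(df, 3)
```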

Value

x with columns replaced or swept out using STATS, (optionally) grouped by g.

Note

In most cases there is no need to call TRA() directly, because all Fast Statistical Functions have a TRA argument (which ensures that exactly the same grouping vector is used for computing the statistics and for the subsequent transformation). In addition, the functions fbetween/B, fwithin/W and fscale/STD provide optimized solutions for common averaging, centering and scaling tasks.


Examples

library(collapse)
v <- iris$Sepal.Length   # A numeric vector
f <- iris$Species        # A factor
dat <- num_vars(iris)    # Numeric columns
m <- qM(dat)             # Matrix of numeric data
head(TRA(v, fmean(v)))   # Simple centering [same as fmean(v, TRA = "-") or W(v)]
#> [1] -0.7433333 -0.9433333 -1.1433333 -1.2433333 -0.8433333 -0.4433333
head(TRA(m, fmean(m))) # [same as sweep(m, 2, fmean(m)), fmean(m, TRA = "-") or W(m)]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] -0.7433333 0.44266667 -2.358 -0.9993333
#> [2,] -0.9433333 -0.05733333 -2.358 -0.9993333
#> [3,] -1.1433333 0.14266667 -2.458 -0.9993333
#> [4,] -1.2433333 0.04266667 -2.258 -0.9993333
#> [5,] -0.8433333 0.54266667 -2.358 -0.9993333
#> [6,] -0.4433333 0.84266667 -2.058 -0.7993333
head(TRA(dat, fmean(dat))) # [same as fmean(dat, TRA = "-") or W(dat)]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 -0.7433333 0.44266667 -2.358 -0.9993333
#> 2 -0.9433333 -0.05733333 -2.358 -0.9993333
#> 3 -1.1433333 0.14266667 -2.458 -0.9993333
#> 4 -1.2433333 0.04266667 -2.258 -0.9993333
#> 5 -0.8433333 0.54266667 -2.358 -0.9993333
#> 6 -0.4433333 0.84266667 -2.058 -0.7993333
head(TRA(v, fmean(v), "replace")) # Simple replacing [same as fmean(v, TRA = "replace") or B(v)]
#> [1] 5.843333 5.843333 5.843333 5.843333 5.843333 5.843333
head(TRA(m, fmean(m), "replace")) # [same as fmean(m, TRA = "replace") or B(m)]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] 5.843333 3.057333 3.758 1.199333
#> [2,] 5.843333 3.057333 3.758 1.199333
#> [3,] 5.843333 3.057333 3.758 1.199333
#> [4,] 5.843333 3.057333 3.758 1.199333
#> [5,] 5.843333 3.057333 3.758 1.199333
#> [6,] 5.843333 3.057333 3.758 1.199333
head(TRA(dat, fmean(dat), "replace")) # [same as fmean(dat, TRA = "replace") or B(dat)]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 5.843333 3.057333 3.758 1.199333
#> 2 5.843333 3.057333 3.758 1.199333
#> 3 5.843333 3.057333 3.758 1.199333
#> 4 5.843333 3.057333 3.758 1.199333
#> 5 5.843333 3.057333 3.758 1.199333
#> 6 5.843333 3.057333 3.758 1.199333
head(TRA(m, fsd(m), "/")) # Simple scaling [same as fsd(m, TRA = "/")]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] 6.158928 8.029986 0.7930671 0.2623854
#> [2,] 5.917402 6.882845 0.7930671 0.2623854
#> [3,] 5.675875 7.341701 0.7364195 0.2623854
#> [4,] 5.555112 7.112273 0.8497148 0.2623854
#> [5,] 6.038165 8.259414 0.7930671 0.2623854
#> [6,] 6.521218 8.947698 0.9630101 0.5247707
# Note: All grouped examples also apply to v and dat...
head(TRA(m, fmean(m, f), "-", f))       # Centering [same as fmean(m, f, TRA = "-") or W(m, f)]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] 0.094 0.072 -0.062 -0.046
#> [2,] -0.106 -0.428 -0.062 -0.046
#> [3,] -0.306 -0.228 -0.162 -0.046
#> [4,] -0.406 -0.328 0.038 -0.046
#> [5,] -0.006 0.172 -0.062 -0.046
#> [6,] 0.394 0.472 0.238 0.154
head(TRA(m, fmean(m, f), "replace", f)) # Replacing [same as fmean(m, f, TRA = "replace") or B(m, f)]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] 5.006 3.428 1.462 0.246
#> [2,] 5.006 3.428 1.462 0.246
#> [3,] 5.006 3.428 1.462 0.246
#> [4,] 5.006 3.428 1.462 0.246
#> [5,] 5.006 3.428 1.462 0.246
#> [6,] 5.006 3.428 1.462 0.246
head(TRA(m, fsd(m, f), "/", f)) # Scaling [same as fsd(m, f, TRA = "/")]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] 14.46851 9.233260 8.061544 1.897793
#> [2,] 13.90112 7.914223 8.061544 1.897793
#> [3,] 13.33372 8.441838 7.485720 1.897793
#> [4,] 13.05003 8.178031 8.637369 1.897793
#> [5,] 14.18481 9.497068 8.061544 1.897793
#> [6,] 15.31960 10.288490 9.789018 3.795585
head(TRA(m, fmean(m, f), "-+", f)) # Centering on the overall mean ...
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] 5.937333 3.129333 3.696 1.153333
#> [2,] 5.737333 2.629333 3.696 1.153333
#> [3,] 5.537333 2.829333 3.596 1.153333
#> [4,] 5.437333 2.729333 3.796 1.153333
#> [5,] 5.837333 3.229333 3.696 1.153333
#> [6,] 6.237333 3.529333 3.996 1.353333
# [same as fmean(m, f, TRA = "-+") or
#  W(m, f, mean = "overall.mean")]
head(TRA(TRA(m, fmean(m, f), "-", f), # Also the same thing done manually !!
         fmean(m), "+"))
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] 5.937333 3.129333 3.696 1.153333
#> [2,] 5.737333 2.629333 3.696 1.153333
#> [3,] 5.537333 2.829333 3.596 1.153333
#> [4,] 5.437333 2.729333 3.796 1.153333
#> [5,] 5.837333 3.229333 3.696 1.153333
#> [6,] 6.237333 3.529333 3.996 1.353333
# grouped tibble method
library(dplyr)
iris %>% group_by(Species) %>% TRA(fmean(.))
#> # A tibble: 150 x 5
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 0.0940 0.0720 -0.062 -0.0460 setosa
#> 2 -0.106 -0.428 -0.062 -0.0460 setosa
#> 3 -0.306 -0.228 -0.162 -0.0460 setosa
#> 4 -0.406 -0.328 0.0380 -0.0460 setosa
#> 5 -0.006 0.172 -0.062 -0.0460 setosa
#> 6 0.394 0.472 0.238 0.154 setosa
#> 7 -0.406 -0.028 -0.062 0.054 setosa
#> 8 -0.006 -0.028 0.0380 -0.0460 setosa
#> 9 -0.606 -0.528 -0.062 -0.0460 setosa
#> 10 -0.106 -0.328 0.0380 -0.146 setosa
#> # ... with 140 more rows
iris %>% group_by(Species) %>% fmean(TRA = "-") # Same thing
#> # A tibble: 150 x 5
#> # Groups: Species [3]
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> * <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 0.0940 0.0720 -0.062 -0.0460
#> 2 setosa -0.106 -0.428 -0.062 -0.0460
#> 3 setosa -0.306 -0.228 -0.162 -0.0460
#> 4 setosa -0.406 -0.328 0.0380 -0.0460
#> 5 setosa -0.006 0.172 -0.062 -0.0460
#> 6 setosa 0.394 0.472 0.238 0.154
#> 7 setosa -0.406 -0.028 -0.062 0.054
#> 8 setosa -0.006 -0.028 0.0380 -0.0460
#> 9 setosa -0.606 -0.528 -0.062 -0.0460
#> 10 setosa -0.106 -0.328 0.0380 -0.146
#> # ... with 140 more rows
iris %>% group_by(Species) %>% TRA(fmean(.)[c(2,4)]) # Only transforming 2 columns
#> # A tibble: 150 x 5
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 0.0940 3.5 -0.062 0.2 setosa
#> 2 -0.106 3 -0.062 0.2 setosa
#> 3 -0.306 3.2 -0.162 0.2 setosa
#> 4 -0.406 3.1 0.0380 0.2 setosa
#> 5 -0.006 3.6 -0.062 0.2 setosa
#> 6 0.394 3.9 0.238 0.4 setosa
#> 7 -0.406 3.4 -0.062 0.3 setosa
#> 8 -0.006 3.4 0.0380 0.2 setosa
#> 9 -0.606 2.9 -0.062 0.2 setosa
#> 10 -0.106 3.1 0.0380 0.1 setosa
#> # ... with 140 more rows
iris %>% group_by(Species) %>% TRA(fmean(.)[c(2,4)],   # Dropping species column
                                   keep.group_vars = FALSE)
#> # A tibble: 150 x 4
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> * <dbl> <dbl> <dbl> <dbl>
#> 1 0.0940 3.5 -0.062 0.2
#> 2 -0.106 3 -0.062 0.2
#> 3 -0.306 3.2 -0.162 0.2
#> 4 -0.406 3.1 0.0380 0.2
#> 5 -0.006 3.6 -0.062 0.2
#> 6 0.394 3.9 0.238 0.4
#> 7 -0.406 3.4 -0.062 0.3
#> 8 -0.006 3.4 0.0380 0.2
#> 9 -0.606 2.9 -0.062 0.2
#> 10 -0.106 3.1 0.0380 0.1
#> # ... with 140 more rows
iris %>% fgroup_by(Species) %>% TRA(fmean(.)) # Faster collapse grouping...
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 0.094 0.072 -0.062 -0.046 setosa
#> 2 -0.106 -0.428 -0.062 -0.046 setosa
#> 3 -0.306 -0.228 -0.162 -0.046 setosa
#> 4 -0.406 -0.328 0.038 -0.046 setosa
#> 5 -0.006 0.172 -0.062 -0.046 setosa
#> 6 0.394 0.472 0.238 0.154 setosa
#> 7 -0.406 -0.028 -0.062 0.054 setosa
#> 8 -0.006 -0.028 0.038 -0.046 setosa
#> 9 -0.606 -0.528 -0.062 -0.046 setosa
#> 10 -0.106 -0.328 0.038 -0.146 setosa
#> 11 0.394 0.272 0.038 -0.046 setosa
#> 12 -0.206 -0.028 0.138 -0.046 setosa
#> 13 -0.206 -0.428 -0.062 -0.146 setosa
#> 14 -0.706 -0.428 -0.362 -0.146 setosa
#> 15 0.794 0.572 -0.262 -0.046 setosa
#> 16 0.694 0.972 0.038 0.154 setosa
#> 17 0.394 0.472 -0.162 0.154 setosa
#> 18 0.094 0.072 -0.062 0.054 setosa
#> 19 0.694 0.372 0.238 0.054 setosa
#> 20 0.094 0.372 0.038 0.054 setosa
#> 21 0.394 -0.028 0.238 -0.046 setosa
#> 22 0.094 0.272 0.038 0.154 setosa
#> 23 -0.406 0.172 -0.462 -0.046 setosa
#> 24 0.094 -0.128 0.238 0.254 setosa
#> 25 -0.206 -0.028 0.438 -0.046 setosa
#> 26 -0.006 -0.428 0.138 -0.046 setosa
#> 27 -0.006 -0.028 0.138 0.154 setosa
#> 28 0.194 0.072 0.038 -0.046 setosa
#> 29 0.194 -0.028 -0.062 -0.046 setosa
#> 30 -0.306 -0.228 0.138 -0.046 setosa
#> 31 -0.206 -0.328 0.138 -0.046 setosa
#> 32 0.394 -0.028 0.038 0.154 setosa
#> 33 0.194 0.672 0.038 -0.146 setosa
#> 34 0.494 0.772 -0.062 -0.046 setosa
#> 35 -0.106 -0.328 0.038 -0.046 setosa
#> 36 -0.006 -0.228 -0.262 -0.046 setosa
#> 37 0.494 0.072 -0.162 -0.046 setosa
#> 38 -0.106 0.172 -0.062 -0.146 setosa
#> 39 -0.606 -0.428 -0.162 -0.046 setosa
#> 40 0.094 -0.028 0.038 -0.046 setosa
#> 41 -0.006 0.072 -0.162 0.054 setosa
#> 42 -0.506 -1.128 -0.162 0.054 setosa
#> 43 -0.606 -0.228 -0.162 -0.046 setosa
#> 44 -0.006 0.072 0.138 0.354 setosa
#> 45 0.094 0.372 0.438 0.154 setosa
#> 46 -0.206 -0.428 -0.062 0.054 setosa
#> 47 0.094 0.372 0.138 -0.046 setosa
#> 48 -0.406 -0.228 -0.062 -0.046 setosa
#> 49 0.294 0.272 0.038 -0.046 setosa
#> 50 -0.006 -0.128 -0.062 -0.046 setosa
#> 51 1.064 0.430 0.440 0.074 versicolor
#> 52 0.464 0.430 0.240 0.174 versicolor
#> 53 0.964 0.330 0.640 0.174 versicolor
#> 54 -0.436 -0.470 -0.260 -0.026 versicolor
#> 55 0.564 0.030 0.340 0.174 versicolor
#> 56 -0.236 0.030 0.240 -0.026 versicolor
#> 57 0.364 0.530 0.440 0.274 versicolor
#> 58 -1.036 -0.370 -0.960 -0.326 versicolor
#> 59 0.664 0.130 0.340 -0.026 versicolor
#> 60 -0.736 -0.070 -0.360 0.074 versicolor
#> 61 -0.936 -0.770 -0.760 -0.326 versicolor
#> 62 -0.036 0.230 -0.060 0.174 versicolor
#> 63 0.064 -0.570 -0.260 -0.326 versicolor
#> 64 0.164 0.130 0.440 0.074 versicolor
#> 65 -0.336 0.130 -0.660 -0.026 versicolor
#> 66 0.764 0.330 0.140 0.074 versicolor
#> 67 -0.336 0.230 0.240 0.174 versicolor
#> 68 -0.136 -0.070 -0.160 -0.326 versicolor
#> 69 0.264 -0.570 0.240 0.174 versicolor
#> 70 -0.336 -0.270 -0.360 -0.226 versicolor
#> 71 -0.036 0.430 0.540 0.474 versicolor
#> 72 0.164 0.030 -0.260 -0.026 versicolor
#> 73 0.364 -0.270 0.640 0.174 versicolor
#> 74 0.164 0.030 0.440 -0.126 versicolor
#> 75 0.464 0.130 0.040 -0.026 versicolor
#> 76 0.664 0.230 0.140 0.074 versicolor
#> 77 0.864 0.030 0.540 0.074 versicolor
#> 78 0.764 0.230 0.740 0.374 versicolor
#> 79 0.064 0.130 0.240 0.174 versicolor
#> 80 -0.236 -0.170 -0.760 -0.326 versicolor
#> 81 -0.436 -0.370 -0.460 -0.226 versicolor
#> 82 -0.436 -0.370 -0.560 -0.326 versicolor
#> 83 -0.136 -0.070 -0.360 -0.126 versicolor
#> 84 0.064 -0.070 0.840 0.274 versicolor
#> 85 -0.536 0.230 0.240 0.174 versicolor
#> 86 0.064 0.630 0.240 0.274 versicolor
#> 87 0.764 0.330 0.440 0.174 versicolor
#> 88 0.364 -0.470 0.140 -0.026 versicolor
#> 89 -0.336 0.230 -0.160 -0.026 versicolor
#> 90 -0.436 -0.270 -0.260 -0.026 versicolor
#> 91 -0.436 -0.170 0.140 -0.126 versicolor
#> 92 0.164 0.230 0.340 0.074 versicolor
#> 93 -0.136 -0.170 -0.260 -0.126 versicolor
#> 94 -0.936 -0.470 -0.960 -0.326 versicolor
#> 95 -0.336 -0.070 -0.060 -0.026 versicolor
#> 96 -0.236 0.230 -0.060 -0.126 versicolor
#> 97 -0.236 0.130 -0.060 -0.026 versicolor
#> 98 0.264 0.130 0.040 -0.026 versicolor
#> 99 -0.836 -0.270 -1.260 -0.226 versicolor
#> 100 -0.236 0.030 -0.160 -0.026 versicolor
#> 101 -0.288 0.326 0.448 0.474 virginica
#> 102 -0.788 -0.274 -0.452 -0.126 virginica
#> 103 0.512 0.026 0.348 0.074 virginica
#> 104 -0.288 -0.074 0.048 -0.226 virginica
#> 105 -0.088 0.026 0.248 0.174 virginica
#> 106 1.012 0.026 1.048 0.074 virginica
#> 107 -1.688 -0.474 -1.052 -0.326 virginica
#> 108 0.712 -0.074 0.748 -0.226 virginica
#> 109 0.112 -0.474 0.248 -0.226 virginica
#> 110 0.612 0.626 0.548 0.474 virginica
#> 111 -0.088 0.226 -0.452 -0.026 virginica
#> 112 -0.188 -0.274 -0.252 -0.126 virginica
#> 113 0.212 0.026 -0.052 0.074 virginica
#> 114 -0.888 -0.474 -0.552 -0.026 virginica
#> 115 -0.788 -0.174 -0.452 0.374 virginica
#> 116 -0.188 0.226 -0.252 0.274 virginica
#> 117 -0.088 0.026 -0.052 -0.226 virginica
#> 118 1.112 0.826 1.148 0.174 virginica
#> 119 1.112 -0.374 1.348 0.274 virginica
#> 120 -0.588 -0.774 -0.552 -0.526 virginica
#> 121 0.312 0.226 0.148 0.274 virginica
#> 122 -0.988 -0.174 -0.652 -0.026 virginica
#> 123 1.112 -0.174 1.148 -0.026 virginica
#> 124 -0.288 -0.274 -0.652 -0.226 virginica
#> 125 0.112 0.326 0.148 0.074 virginica
#> 126 0.612 0.226 0.448 -0.226 virginica
#> 127 -0.388 -0.174 -0.752 -0.226 virginica
#> 128 -0.488 0.026 -0.652 -0.226 virginica
#> 129 -0.188 -0.174 0.048 0.074 virginica
#> 130 0.612 0.026 0.248 -0.426 virginica
#> 131 0.812 -0.174 0.548 -0.126 virginica
#> 132 1.312 0.826 0.848 -0.026 virginica
#> 133 -0.188 -0.174 0.048 0.174 virginica
#> 134 -0.288 -0.174 -0.452 -0.526 virginica
#> 135 -0.488 -0.374 0.048 -0.626 virginica
#> 136 1.112 0.026 0.548 0.274 virginica
#> 137 -0.288 0.426 0.048 0.374 virginica
#> 138 -0.188 0.126 -0.052 -0.226 virginica
#> 139 -0.588 0.026 -0.752 -0.226 virginica
#> 140 0.312 0.126 -0.152 0.074 virginica
#> 141 0.112 0.126 0.048 0.374 virginica
#> 142 0.312 0.126 -0.452 0.274 virginica
#> 143 -0.788 -0.274 -0.452 -0.126 virginica
#> 144 0.212 0.226 0.348 0.274 virginica
#> 145 0.112 0.326 0.148 0.474 virginica
#> 146 0.112 0.026 -0.352 0.274 virginica
#> 147 -0.288 -0.474 -0.552 -0.126 virginica
#> 148 -0.088 0.026 -0.352 -0.026 virginica
#> 149 -0.388 0.426 -0.152 0.274 virginica
#> 150 -0.688 0.026 -0.452 -0.226 virginica
#>
#> Grouped by: Species [3 | 50 (0)]