ftransform is a much faster version of transform and dplyr::mutate for data frames. It returns the data frame with new columns computed and/or existing columns modified or deleted. settransform does all of that by reference, i.e. it modifies the data frame in the global environment. fcompute computes new columns from the columns in a data frame and returns only the computed columns.

# Modify and return data frame
ftransform(.data, ...)
ftransformv(.data, vars, FUN, ..., apply = TRUE)
tfm(.data, ...)               # Shortcut for ftransform
tfmv(.data, vars, FUN, ..., apply = TRUE)

# Modify data frame by reference
settransform(.data, ...)
settransformv(.data, vars, FUN, ..., apply = TRUE)
settfm(.data, ...)            # Shortcut for settransform
settfmv(.data, vars, FUN, ..., apply = TRUE)

# Replace/add modified columns in/to a data frame
ftransform(.data) <- value
tfm(.data) <- value           # Shortcut for ftransform<-

# Compute columns, returned as a new data frame
fcompute(.data, ...)

Arguments

.data

a data frame or named list of columns.

...

further arguments of the form column = value. The value can be a combination of other columns, a scalar value, or NULL, which deletes the column. Alternatively, a single list can be placed here, which will be treated like a list of column = value arguments. For ftransformv, ... can be used to pass further arguments to FUN. Note: The ellipsis (...) is always evaluated within the data frame (.data) environment. See Examples.

vars

variables to be transformed by applying FUN to them: select using names, indices, a logical vector or a selector function (e.g. is.numeric).

FUN

a single function yielding a result of length NROW(.data) or 1. See also apply.

apply

logical. TRUE (default) will apply FUN to each column selected in vars; FALSE will apply FUN to the subsetted data frame i.e. FUN(get_vars(.data, vars), ...). The latter is useful for collapse functions with data frame or grouped / panel data frame methods, yielding performance gains and enabling grouped transformations. See Examples.

value

a named list of replacements; it will be treated like an evaluated list of column = value arguments.

Details

The ... arguments to ftransform are tagged vector expressions, which are evaluated in the data frame .data. The tags are matched against names(.data), and for those that match, the values replace the corresponding variable in .data, whereas the others are appended to .data. It is also possible to delete columns by assigning NULL to them, i.e. ftransform(data, colk = NULL) removes colk from the data. Note that names(.data) and the names of the ... arguments are checked for uniqueness beforehand, yielding an error if this is not the case.
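
The matching rules described above can be illustrated with a minimal sketch. The data frame df and its column names are made up for illustration and are not part of the original documentation:

```r
library(collapse)

# Hypothetical toy data, used only to illustrate the matching rules
df <- data.frame(a = 1:3, b = c(2, 4, 6))

# 'a' matches an existing name, so the column is replaced in place;
# 'half_b' matches nothing, so it is appended;
# b = NULL deletes column 'b'.
res <- ftransform(df, a = a * 10, half_b = b / 2, b = NULL)
names(res)

# Duplicated tags raise an error instead of silently overwriting each other
dup <- tryCatch(ftransform(df, x = a, x = b), error = function(e) e)
inherits(dup, "error")
```

The error is captured with tryCatch here only so that the snippet runs to completion.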

Since collapse v1.3.0, it is also possible to pass a single named list to ..., i.e. ftransform(data, newdata). This list will be treated like a list of tagged vector expressions. Note the different behavior: ftransform(data, list(newcol = col1)) is the same as ftransform(data, newcol = col1), whereas ftransform(data, newcol = as.list(col1)) creates a list column. Something like ftransform(data, as.list(col1)) gives an error because the list is not named. See Examples.
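
The three cases in this paragraph can be checked side by side. This is a small sketch on a made-up one-column data frame (df and col1 are illustrative names only):

```r
library(collapse)

df <- data.frame(col1 = 1:3)   # hypothetical toy data

# A single named list is unpacked into tagged expressions ...
a <- ftransform(df, list(newcol = col1))
# ... so it should give the same result as the tagged form
b <- ftransform(df, newcol = col1)
identical(a, b)

# A tagged list is kept as is and becomes a list column
lc <- ftransform(df, newcol = as.list(col1))
is.list(lc$newcol)

# An unnamed list cannot be matched to column names and errors
err <- tryCatch(ftransform(df, as.list(col1)), error = function(e) e)
inherits(err, "error")
```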

The function ftransformv, added in v1.3.2, provides a fast replacement for dplyr::mutate_at and dplyr::mutate_if, facilitating mutations of groups of columns (dplyr::mutate_all is already accounted for by dapply). See Examples.
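
As a rough sketch of how the two dplyr idioms map onto the vars argument, the following uses the built-in iris data. The comparison with dplyr is only by analogy and does not call dplyr itself:

```r
library(collapse)

# Selector function, in the spirit of dplyr::mutate_if
by_fun <- ftransformv(iris, is.numeric, log)

# Explicit column names, in the spirit of dplyr::mutate_at
num_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
by_name <- ftransformv(iris, num_cols, log)

# Both selections address the same four columns
identical(by_fun, by_name)

# The non-numeric column is left untouched
identical(by_fun$Species, iris$Species)
```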

The function settransform does all of that by reference, but uses base R's copy-on-modify semantics, which is equivalent to replacing the data with <- (thus it is still memory efficient, but the data will have a different memory address afterwards).
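
The equivalence with <- can be seen in a short sketch. The data frame df is a made-up example, and the snippet assumes it is run where df is defined (for instance at top level):

```r
library(collapse)

df <- data.frame(x = 1:3, y = 4:6)   # hypothetical toy data

# What an explicit assignment would produce
ref <- ftransform(df, z = x + y, y = NULL)

# settransform updates the object named 'df' itself; no assignment is needed
settransform(df, z = x + y, y = NULL)

identical(df, ref)
```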

The function fcompute works just like ftransform, but returns only the changed / computed columns without modifying or appending the data in .data.
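
To make the difference concrete, here is a minimal sketch contrasting the two functions on the built-in mtcars data. The column name hp_per_wt is invented for illustration:

```r
library(collapse)

# Same expressions passed to both functions
full <- ftransform(mtcars, hp_per_wt = hp / wt, mpg = mpg * 2)
only <- fcompute(mtcars, hp_per_wt = hp / wt, mpg = mpg * 2)

names(only)                                  # just the computed columns
ncol(full) == ncol(mtcars) + 1L              # all columns kept, one appended
identical(only$hp_per_wt, full$hp_per_wt)    # same values either way
```

fcompute is therefore convenient when only the derived columns are needed downstream, since no subsetting step is required afterwards.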

Value

The modified data frame .data, or, for fcompute, a new data frame with the columns computed on .data. All attributes of .data are preserved.
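
A quick way to check the statement about attributes is to attach a custom one and see whether it survives. This is a sketch only; the attribute name "note" and the data are made up:

```r
library(collapse)

df <- data.frame(a = 1:3)          # hypothetical toy data
attr(df, "note") <- "demo"         # hypothetical custom attribute

res <- ftransform(df, b = a * 2)

attr(res, "note")                  # custom attribute carried over
class(res)                         # class is retained as well
```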

Examples

## ftransform modifies and returns a data.frame
head(ftransform(airquality, Ozone = -Ozone))
#>   Ozone Solar.R Wind Temp Month Day
#> 1   -41     190  7.4   67     5   1
#> 2   -36     118  8.0   72     5   2
#> 3   -12     149 12.6   74     5   3
#> 4   -18     313 11.5   62     5   4
#> 5    NA      NA 14.3   56     5   5
#> 6   -28      NA 14.9   66     5   6
head(ftransform(airquality, new = -Ozone, Temp = (Temp-32)/1.8))
#>   Ozone Solar.R Wind     Temp Month Day new
#> 1    41     190  7.4 19.44444     5   1 -41
#> 2    36     118  8.0 22.22222     5   2 -36
#> 3    12     149 12.6 23.33333     5   3 -12
#> 4    18     313 11.5 16.66667     5   4 -18
#> 5    NA      NA 14.3 13.33333     5   5  NA
#> 6    28      NA 14.9 18.88889     5   6 -28
head(ftransform(airquality, new = -Ozone, new2 = 1, Temp = NULL))  # Deleting Temp
#>   Ozone Solar.R Wind Month Day new new2
#> 1    41     190  7.4     5   1 -41    1
#> 2    36     118  8.0     5   2 -36    1
#> 3    12     149 12.6     5   3 -12    1
#> 4    18     313 11.5     5   4 -18    1
#> 5    NA      NA 14.3     5   5  NA    1
#> 6    28      NA 14.9     5   6 -28    1
head(ftransform(airquality, Ozone = NULL, Temp = NULL))            # Deleting columns
#>   Solar.R Wind Month Day
#> 1     190  7.4     5   1
#> 2     118  8.0     5   2
#> 3     149 12.6     5   3
#> 4     313 11.5     5   4
#> 5      NA 14.3     5   5
#> 6      NA 14.9     5   6
# With collapse's grouped and weighted functions, complex operations are done on the fly
head(ftransform(airquality, # Grouped operations by month:
                Ozone_Month_median = fmedian(Ozone, Month, TRA = "replace_fill"),
                Ozone_Month_sd = fsd(Ozone, Month, TRA = "replace"),
                Ozone_Month_centered = fwithin(Ozone, Month)))
#>   Ozone Solar.R Wind Temp Month Day Ozone_Month_median Ozone_Month_sd
#> 1    41     190  7.4   67     5   1                 18       22.22445
#> 2    36     118  8.0   72     5   2                 18       22.22445
#> 3    12     149 12.6   74     5   3                 18       22.22445
#> 4    18     313 11.5   62     5   4                 18       22.22445
#> 5    NA      NA 14.3   56     5   5                 18             NA
#> 6    28      NA 14.9   66     5   6                 18       22.22445
#>   Ozone_Month_centered
#> 1            17.384615
#> 2            12.384615
#> 3           -11.615385
#> 4            -5.615385
#> 5                   NA
#> 6             4.384615

# Grouping by month and above/below average temperature in each month
head(ftransform(airquality, Ozone_Month_high_median =
                  fmedian(Ozone, list(Month, Temp > fbetween(Temp, Month)), TRA = "replace_fill")))
#>   Ozone Solar.R Wind Temp Month Day Ozone_Month_high_median
#> 1    41     190  7.4   67     5   1                      28
#> 2    36     118  8.0   72     5   2                      28
#> 3    12     149 12.6   74     5   3                      28
#> 4    18     313 11.5   62     5   4                      14
#> 5    NA      NA 14.3   56     5   5                      14
#> 6    28      NA 14.9   66     5   6                      28
## ftransformv can be used to modify multiple columns using a function
head(ftransformv(airquality, 1:3, log))
#>      Ozone  Solar.R     Wind Temp Month Day
#> 1 3.713572 5.247024 2.001480   67     5   1
#> 2 3.583519 4.770685 2.079442   72     5   2
#> 3 2.484907 5.003946 2.533697   74     5   3
#> 4 2.890372 5.746203 2.442347   62     5   4
#> 5       NA       NA 2.660260   56     5   5
#> 6 3.332205       NA 2.701361   66     5   6
head(`[<-`(airquality, 1:3, value = lapply(airquality[1:3], log))) # Same thing in base R
#>      Ozone  Solar.R     Wind Temp Month Day
#> 1 3.713572 5.247024 2.001480   67     5   1
#> 2 3.583519 4.770685 2.079442   72     5   2
#> 3 2.484907 5.003946 2.533697   74     5   3
#> 4 2.890372 5.746203 2.442347   62     5   4
#> 5       NA       NA 2.660260   56     5   5
#> 6 3.332205       NA 2.701361   66     5   6
head(ftransformv(airquality, 1:3, log, apply = FALSE))
#>      Ozone  Solar.R     Wind Temp Month Day
#> 1 3.713572 5.247024 2.001480   67     5   1
#> 2 3.583519 4.770685 2.079442   72     5   2
#> 3 2.484907 5.003946 2.533697   74     5   3
#> 4 2.890372 5.746203 2.442347   62     5   4
#> 5       NA       NA 2.660260   56     5   5
#> 6 3.332205       NA 2.701361   66     5   6
head(`[<-`(airquality, 1:3, value = log(airquality[1:3])))         # Same thing in base R
#>      Ozone  Solar.R     Wind Temp Month Day
#> 1 3.713572 5.247024 2.001480   67     5   1
#> 2 3.583519 4.770685 2.079442   72     5   2
#> 3 2.484907 5.003946 2.533697   74     5   3
#> 4 2.890372 5.746203 2.442347   62     5   4
#> 5       NA       NA 2.660260   56     5   5
#> 6 3.332205       NA 2.701361   66     5   6
# Using apply = FALSE yields meaningful performance gains with collapse functions
# This calls fwithin.default, and repeats the grouping by month 3 times:
head(ftransformv(airquality, 1:3, fwithin, Month))
#>        Ozone    Solar.R       Wind Temp Month Day
#> 1  17.384615   8.703704 -4.2225806   67     5   1
#> 2  12.384615 -63.296296 -3.6225806   72     5   2
#> 3 -11.615385 -32.296296  0.9774194   74     5   3
#> 4  -5.615385 131.703704 -0.1225806   62     5   4
#> 5         NA         NA  2.6774194   56     5   5
#> 6   4.384615         NA  3.2774194   66     5   6

# This calls fwithin.data.frame, and only groups one time -> 5x faster!
head(ftransformv(airquality, 1:3, fwithin, Month, apply = FALSE))
#>        Ozone    Solar.R       Wind Temp Month Day
#> 1  17.384615   8.703704 -4.2225806   67     5   1
#> 2  12.384615 -63.296296 -3.6225806   72     5   2
#> 3 -11.615385 -32.296296  0.9774194   74     5   3
#> 4  -5.615385 131.703704 -0.1225806   62     5   4
#> 5         NA         NA  2.6774194   56     5   5
#> 6   4.384615         NA  3.2774194   66     5   6

library(magrittr) # Pipe operators
# This also works for grouped and panel data frames (calling fwithin.grouped_df)
airquality %>% fgroup_by(Month) %>%
  ftransformv(1:3, fwithin, apply = FALSE) %>% head
#>        Ozone    Solar.R       Wind Temp Month Day
#> 1  17.384615   8.703704 -4.2225806   67     5   1
#> 2  12.384615 -63.296296 -3.6225806   72     5   2
#> 3 -11.615385 -32.296296  0.9774194   74     5   3
#> 4  -5.615385 131.703704 -0.1225806   62     5   4
#> 5         NA         NA  2.6774194   56     5   5
#> 6   4.384615         NA  3.2774194   66     5   6

# But this gives the WRONG result (calling fwithin.default). Need option apply = FALSE!!
airquality %>% fgroup_by(Month) %>%
  ftransformv(1:3, fwithin) %>% head
#>       Ozone    Solar.R      Wind Temp Month Day
#> 1  -1.12931   4.068493 -2.557516   67     5   1
#> 2  -6.12931 -67.931507 -1.957516   72     5   2
#> 3 -30.12931 -36.931507  2.642484   74     5   3
#> 4 -24.12931 127.068493  1.542484   62     5   4
#> 5        NA         NA  4.342484   56     5   5
#> 6 -14.12931         NA  4.942484   66     5   6
# For grouped modification of single columns in a grouped dataset, we can use GRP():
airquality %>% fgroup_by(Month) %>%
  ftransform(W_Ozone = fwithin(Ozone, GRP(.)),                  # Grouped centering
             sd_Ozone_m = fsd(Ozone, GRP(.), TRA = "replace"),  # In-Month standard deviation
             sd_Ozone = fsd(Ozone, TRA = "replace"),            # Overall standard deviation
             sd_Ozone2 = fsd(Ozone, TRA = "replace_fill"),      # Same, overwriting NA's
             sd_Ozone3 = fsd(Ozone)) %>% head                   # Same thing (calling rep())
#>   Ozone Solar.R Wind Temp Month Day    W_Ozone sd_Ozone_m sd_Ozone sd_Ozone2
#> 1    41     190  7.4   67     5   1  17.384615   22.22445 32.98788  32.98788
#> 2    36     118  8.0   72     5   2  12.384615   22.22445 32.98788  32.98788
#> 3    12     149 12.6   74     5   3 -11.615385   22.22445 32.98788  32.98788
#> 4    18     313 11.5   62     5   4  -5.615385   22.22445 32.98788  32.98788
#> 5    NA      NA 14.3   56     5   5         NA         NA       NA  32.98788
#> 6    28      NA 14.9   66     5   6   4.384615   22.22445 32.98788  32.98788
#>   sd_Ozone3
#> 1  32.98788
#> 2  32.98788
#> 3  32.98788
#> 4  32.98788
#> 5  32.98788
#> 6  32.98788
rm(airquality)
#> Warning: object 'airquality' not found
## For more complex mutations we can use ftransform with compound pipes
airquality %>% fgroup_by(Month) %>%
  ftransform(get_vars(., 1:3) %>% fwithin %>% flag(0:2)) %>% head
#> Panel-lag computed without timevar: Assuming ordered data
#>        Ozone    Solar.R       Wind Temp Month Day   L1.Ozone   L2.Ozone
#> 1  17.384615   8.703704 -4.2225806   67     5   1         NA         NA
#> 2  12.384615 -63.296296 -3.6225806   72     5   2  17.384615         NA
#> 3 -11.615385 -32.296296  0.9774194   74     5   3  12.384615  17.384615
#> 4  -5.615385 131.703704 -0.1225806   62     5   4 -11.615385  12.384615
#> 5         NA         NA  2.6774194   56     5   5  -5.615385 -11.615385
#> 6   4.384615         NA  3.2774194   66     5   6         NA  -5.615385
#>   L1.Solar.R L2.Solar.R    L1.Wind    L2.Wind
#> 1         NA         NA         NA         NA
#> 2   8.703704         NA -4.2225806         NA
#> 3 -63.296296   8.703704 -3.6225806 -4.2225806
#> 4 -32.296296 -63.296296  0.9774194 -3.6225806
#> 5 131.703704 -32.296296 -0.1225806  0.9774194
#> 6         NA 131.703704  2.6774194 -0.1225806
airquality %>% ftransform(STD(., cols = 1:3) %>% replace_NA(0)) %>% head
#>   Ozone Solar.R Wind Temp Month Day   STD.Ozone STD.Solar.R   STD.Wind
#> 1    41     190  7.4   67     5   1 -0.03423409  0.04517615 -0.7259482
#> 2    36     118  8.0   72     5   2 -0.18580489 -0.75430487 -0.5556388
#> 3    12     149 12.6   74     5   3 -0.91334473 -0.41008388  0.7500660
#> 4    18     313 11.5   62     5   4 -0.73145977  1.41095624  0.4378323
#> 5    NA      NA 14.3   56     5   5  0.00000000  0.00000000  1.2326091
#> 6    28      NA 14.9   66     5   6 -0.42831817  0.00000000  1.4029185
# The list argument feature also allows flexible operations creating multiple new columns
airquality %>% # The variance of Wind and Ozone, by month, weighted by temperature:
  ftransform(fvar(list(Wind_var = Wind, Ozone_var = Ozone), Month, Temp, "replace")) %>% head
#>   Ozone Solar.R Wind Temp Month Day Wind_var Ozone_var
#> 1    41     190  7.4   67     5   1 12.08975  533.2819
#> 2    36     118  8.0   72     5   2 12.08975  533.2819
#> 3    12     149 12.6   74     5   3 12.08975  533.2819
#> 4    18     313 11.5   62     5   4 12.08975  533.2819
#> 5    NA      NA 14.3   56     5   5 12.08975        NA
#> 6    28      NA 14.9   66     5   6 12.08975  533.2819

# Same as above using a grouped data frame (a bit more complex)
airquality %>% fgroup_by(Month) %>%
  ftransform(fselect(., Wind, Ozone) %>% fvar(Temp, "replace") %>% add_stub("_var", FALSE)) %>%
  fungroup %>% head
#>   Ozone Solar.R Wind Temp Month Day Wind_var Ozone_var
#> 1    41     190  7.4   67     5   1 12.08975  533.2819
#> 2    36     118  8.0   72     5   2 12.08975  533.2819
#> 3    12     149 12.6   74     5   3 12.08975  533.2819
#> 4    18     313 11.5   62     5   4 12.08975  533.2819
#> 5    NA      NA 14.3   56     5   5 12.08975        NA
#> 6    28      NA 14.9   66     5   6 12.08975  533.2819

# This performs 2 different multi-column grouped operations (need c() to make it one list)
ftransform(airquality, c(fmedian(list(Wind_Day_median = Wind,
                                      Ozone_Day_median = Ozone), Day, TRA = "replace"),
                         fsd(list(Wind_Month_sd = Wind,
                                  Ozone_Month_sd = Ozone), Month, TRA = "replace"))) %>% head
#>   Ozone Solar.R Wind Temp Month Day Wind_Day_median Ozone_Day_median
#> 1    41     190  7.4   67     5   1             6.9             68.5
#> 2    36     118  8.0   72     5   2             9.2             42.5
#> 3    12     149 12.6   74     5   3             9.2             24.0
#> 4    18     313 11.5   62     5   4             9.2             78.0
#> 5    NA      NA 14.3   56     5   5             7.4               NA
#> 6    28      NA 14.9   66     5   6            14.3             36.0
#>   Wind_Month_sd Ozone_Month_sd
#> 1       3.53145       22.22445
#> 2       3.53145       22.22445
#> 3       3.53145       22.22445
#> 4       3.53145       22.22445
#> 5       3.53145             NA
#> 6       3.53145       22.22445
## settransform(v) works like ftransform(v) but modifies a data frame in the global environment.
settransform(airquality, Ratio = Ozone / Temp, Ozone = NULL, Temp = NULL)
head(airquality)
#>   Solar.R Wind Month Day     Ratio
#> 1     190  7.4     5   1 0.6119403
#> 2     118  8.0     5   2 0.5000000
#> 3     149 12.6     5   3 0.1621622
#> 4     313 11.5     5   4 0.2903226
#> 5      NA 14.3     5   5        NA
#> 6      NA 14.9     5   6 0.4242424
rm(airquality)

# Grouped and weighted centering
settransformv(airquality, 1:3, fwithin, Month, Temp, apply = FALSE)
head(airquality)
#>       Ozone    Solar.R        Wind Temp Month Day
#> 1  16.22536   3.571669 -4.08917323   67     5   1
#> 2  11.22536 -68.428331 -3.48917323   72     5   2
#> 3 -12.77464 -37.428331  1.11082677   74     5   3
#> 4  -6.77464 126.571669  0.01082677   62     5   4
#> 5        NA         NA  2.81082677   56     5   5
#> 6   3.22536         NA  3.41082677   66     5   6
rm(airquality)

# Suitably lagged first-differences
settransform(airquality, get_vars(airquality, 1:3) %>% fdiff %>% flag(0:2))
head(airquality)
#>   Ozone Solar.R Wind Temp Month Day L1.Ozone L2.Ozone L1.Solar.R L2.Solar.R
#> 1    NA      NA   NA   67     5   1       NA       NA         NA         NA
#> 2    -5     -72  0.6   72     5   2       NA       NA         NA         NA
#> 3   -24      31  4.6   74     5   3       -5       NA        -72         NA
#> 4     6     164 -1.1   62     5   4      -24       -5         31        -72
#> 5    NA      NA  2.8   56     5   5        6      -24        164         31
#> 6    NA      NA  0.6   66     5   6       NA        6         NA        164
#>   L1.Wind L2.Wind
#> 1      NA      NA
#> 2      NA      NA
#> 3     0.6      NA
#> 4     4.6     0.6
#> 5    -1.1     4.6
#> 6     2.8    -1.1
rm(airquality)

# Same as above using magrittr::`%<>%`
airquality %<>% ftransform(get_vars(., 1:3) %>% fdiff %>% flag(0:2))
head(airquality)
#>   Ozone Solar.R Wind Temp Month Day L1.Ozone L2.Ozone L1.Solar.R L2.Solar.R
#> 1    NA      NA   NA   67     5   1       NA       NA         NA         NA
#> 2    -5     -72  0.6   72     5   2       NA       NA         NA         NA
#> 3   -24      31  4.6   74     5   3       -5       NA        -72         NA
#> 4     6     164 -1.1   62     5   4      -24       -5         31        -72
#> 5    NA      NA  2.8   56     5   5        6      -24        164         31
#> 6    NA      NA  0.6   66     5   6       NA        6         NA        164
#>   L1.Wind L2.Wind
#> 1      NA      NA
#> 2      NA      NA
#> 3     0.6      NA
#> 4     4.6     0.6
#> 5    -1.1     4.6
#> 6     2.8    -1.1
rm(airquality)

# It is also possible to achieve the same thing via a replacement method (if needed)
ftransform(airquality) <- get_vars(airquality, 1:3) %>% fdiff %>% flag(0:2)
head(airquality)
#>   Ozone Solar.R Wind Temp Month Day L1.Ozone L2.Ozone L1.Solar.R L2.Solar.R
#> 1    NA      NA   NA   67     5   1       NA       NA         NA         NA
#> 2    -5     -72  0.6   72     5   2       NA       NA         NA         NA
#> 3   -24      31  4.6   74     5   3       -5       NA        -72         NA
#> 4     6     164 -1.1   62     5   4      -24       -5         31        -72
#> 5    NA      NA  2.8   56     5   5        6      -24        164         31
#> 6    NA      NA  0.6   66     5   6       NA        6         NA        164
#>   L1.Wind L2.Wind
#> 1      NA      NA
#> 2      NA      NA
#> 3     0.6      NA
#> 4     4.6     0.6
#> 5    -1.1     4.6
#> 6     2.8    -1.1
rm(airquality)

## fcompute only returns the modified / computed columns, ...
head(fcompute(airquality, Ozone = -Ozone))
#>   Ozone
#> 1   -41
#> 2   -36
#> 3   -12
#> 4   -18
#> 5    NA
#> 6   -28
head(fcompute(airquality, new = -Ozone, Temp = (Temp-32)/1.8))
#>   new     Temp
#> 1 -41 19.44444
#> 2 -36 22.22222
#> 3 -12 23.33333
#> 4 -18 16.66667
#> 5  NA 13.33333
#> 6 -28 18.88889
head(fcompute(airquality, new = -Ozone, new2 = 1))
#>   new new2
#> 1 -41    1
#> 2 -36    1
#> 3 -12    1
#> 4 -18    1
#> 5  NA    1
#> 6 -28    1