fbetween and fwithin are S3 generics to efficiently obtain between-transformed (averaged) or (quasi-)within-transformed (demeaned) data. These operations can be performed groupwise and/or weighted. B and W are wrappers around fbetween and fwithin representing the 'between-operator' and the 'within-operator'.

(B / W provide more flexibility than fbetween / fwithin when applied to data frames (i.e. column subsetting, formula input, auto-renaming and id-variable-preservation capabilities...), but are otherwise identical.)

fbetween(x, ...)
 fwithin(x, ...)
       B(x, ...)
       W(x, ...)

# S3 method for default
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, ...)
# S3 method for default
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, ...)
# S3 method for default
B(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, ...)
# S3 method for default
W(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, ...)

# S3 method for matrix
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, ...)
# S3 method for matrix
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, ...)
# S3 method for matrix
B(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, stub = "B.", ...)
# S3 method for matrix
W(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, stub = "W.", ...)

# S3 method for data.frame
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, ...)
# S3 method for data.frame
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, ...)
# S3 method for data.frame
B(x, by = NULL, w = NULL, cols = is.numeric, na.rm = TRUE,
  fill = FALSE, stub = "B.", keep.by = TRUE, keep.w = TRUE, ...)
# S3 method for data.frame
W(x, by = NULL, w = NULL, cols = is.numeric, na.rm = TRUE,
  mean = 0, theta = 1, stub = "W.", keep.by = TRUE, keep.w = TRUE, ...)

# Methods for compatibility with plm:

# S3 method for pseries
fbetween(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, ...)
# S3 method for pseries
fwithin(x, effect = 1L, w = NULL, na.rm = TRUE, mean = 0, theta = 1, ...)
# S3 method for pseries
B(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, ...)
# S3 method for pseries
W(x, effect = 1L, w = NULL, na.rm = TRUE, mean = 0, theta = 1, ...)

# S3 method for pdata.frame
fbetween(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, ...)
# S3 method for pdata.frame
fwithin(x, effect = 1L, w = NULL, na.rm = TRUE, mean = 0, theta = 1, ...)
# S3 method for pdata.frame
B(x, effect = 1L, w = NULL, cols = is.numeric, na.rm = TRUE,
  fill = FALSE, stub = "B.", keep.ids = TRUE, keep.w = TRUE, ...)
# S3 method for pdata.frame
W(x, effect = 1L, w = NULL, cols = is.numeric, na.rm = TRUE,
  mean = 0, theta = 1, stub = "W.", keep.ids = TRUE, keep.w = TRUE, ...)

# Methods for compatibility with dplyr:

# S3 method for grouped_df
fbetween(x, w = NULL, na.rm = TRUE, fill = FALSE,
         keep.group_vars = TRUE, keep.w = TRUE, ...)
# S3 method for grouped_df
fwithin(x, w = NULL, na.rm = TRUE, mean = 0, theta = 1,
        keep.group_vars = TRUE, keep.w = TRUE, ...)
# S3 method for grouped_df
B(x, w = NULL, na.rm = TRUE, fill = FALSE,
  stub = "B.", keep.group_vars = TRUE, keep.w = TRUE, ...)
# S3 method for grouped_df
W(x, w = NULL, na.rm = TRUE, mean = 0, theta = 1,
  stub = "W.", keep.group_vars = TRUE, keep.w = TRUE, ...)

Arguments

x

a numeric vector, matrix, data frame, panel series (plm::pseries), panel data frame (plm::pdata.frame) or grouped tibble (dplyr::grouped_df).

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

by

B and W data.frame method: Same as g, but also allows one- or two-sided formulas i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.

w

a numeric vector of (non-negative) weights. B/W data frame and pdata.frame methods also allow a one-sided formula i.e. ~ weightcol. The grouped_df (dplyr) method supports lazy-evaluation. See Examples.

cols

data.frame method: Select columns to center/average using a function, column names, indices or a logical vector. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.

na.rm

logical. Skip missing values in x and w when computing averages. If na.rm = FALSE and a NA or NaN is encountered, the average for that group will be NA, and all data points belonging to that group in the output vector will also be NA.

effect

plm methods: Select which panel identifier should be used as grouping variable. 1L takes the first variable in the plm::index, 2L the second etc.. Index variables can also be called by name using a character string. If more than one variable is supplied, the corresponding index-factors are interacted.

stub

a prefix or stub to rename all transformed columns. FALSE will not rename columns.

fill

option to fbetween/B: Logical. TRUE will overwrite missing values in x with the respective average. By default missing values in x are preserved.

mean

option to fwithin/W: The mean to center on, default is 0, but a different mean can be supplied and will be added to the data after the centering is performed. A special option when performing grouped centering is mean = "overall.mean". In that case the overall mean of the data will be added after subtracting out group means.

theta

option to fwithin/W: Double. An optional scalar parameter for quasi-demeaning i.e. x - theta * xi.. This is useful for variance components ('random-effects') estimators. see Details.

keep.by, keep.ids, keep.group_vars

B and W data.frame, pdata.frame and grouped_df methods: Logical. Retain grouping / panel-identifier columns in the output. For data frames this only works if grouping variables were passed in a formula.

keep.w

B and W data.frame, pdata.frame and grouped_df methods: Logical. Retain column containing the weights in the output. Only works if w is passed as formula / lazy-expression.

...

arguments to be passed to or from other methods.

Details

Without groups, fbetween/B replaces all data points in x with their mean or weighted mean (if w is supplied). Similarly fwithin/W subtracts the (weighted) mean from all data points i.e. centers the data on the mean.

With groups supplied to g, the replacement / centering performed by fbetween/B | fwithin/W becomes groupwise. In terms of panel data notation: If x is a vector in such a panel dataset, xit denotes a single data-point belonging to group i in time-period t (t need not be a time-period). Then xi. denotes x, averaged over t. fbetween/B now returns xi. and fwithin/W returns x - xi.. Thus for any data x and any grouping vector g: B(x,g) + W(x,g) = xi. + x - xi. = x. In terms of variance, fbetween/B only retains the variance between group averages, while fwithin/W, by subtracting out group means, only retains the variance within those groups.

The data replacement performed by fbetween/B can keep (default) or overwrite missing values (option fill = TRUE) in x. fwithin/W can center data simply (default), or add back a mean after centering (option mean = value), or add the overall mean in groupwise computations (option mean = "overall.mean"). Let x.. denote the overall mean of x, then fwithin/W with mean = "overall.mean" returns x - xi. + x.. instead of x - xi.. This is useful to get rid of group-differences but preserve the overall level of the data. In regression analysis, centering with mean = "overall.mean" will only change the constant term. See Examples.

If theta != 1, fwithin/W performs quasi-demeaning x - theta * xi.. If mean = "overall.mean", x - theta * xi. + theta * x.. is returned, so that the mean of the partially demeaned data is still equal to the overall data mean x... A numeric value passed to mean will simply be added back to the quasi-demeaned data i.e. x - theta * xi. + mean.

Now in the case of a linear panel model \(y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}\) with \(u_{it} = \alpha_i + \epsilon_{it}\). If \(\alpha_i \neq \alpha = const.\) (there exists individual heterogeneity), then pooled OLS is at least inefficient and inference on \(\beta_1\) is invalid. If \(E[\alpha_i|X_{it}] = 0\) (mean independence of individual heterogeneity \(\alpha_i\)), the variance components or 'random-effects' estimator provides an asymptotically efficient FGLS solution by estimating a transformed model \(y_{it}-\theta y_{i.} = \beta_0 + \beta_1 (X_{it} - \theta X_{i.}) + (u_{it} - \theta u_{i.}\)), where \(\theta = 1 - \frac{\sigma_\alpha}{\sqrt(\sigma^2_\alpha + T \sigma^2_\epsilon)}\). An estimate of \(\theta\) can be obtained from the an estimate of \(\hat{u}_{it}\) (the residuals from the pooled model). If \(E[\alpha_i|X_{it}] \neq 0\), pooled OLS is biased and inconsistent, and taking \(\theta = 1\) gives an unbiased and consistent fixed-effects estimator of \(\beta_1\). See Examples.

Value

fbetween/B returns x with every element replaced by its (groupwise) mean (xi.). Missing values are preserved if fill = FALSE (the default). fwithin/W returns x where every element was subtracted its (groupwise) mean (x - theta * xi. + mean or, if mean = "overall.mean", x - theta * xi. + theta * x..). See Details.

References

Mundlak, Yair. 1978. On the Pooling of Time Series and Cross Section Data. Econometrica 46 (1): 69-85.

See also

Examples

## Simple centering and averaging head(fbetween(mtcars))
#> mpg cyl disp hp drat wt qsec #> Mazda RX4 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Mazda RX4 Wag 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Datsun 710 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Hornet 4 Drive 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Hornet Sportabout 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Valiant 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> vs am gear carb #> Mazda RX4 0.4375 0.40625 3.6875 2.8125 #> Mazda RX4 Wag 0.4375 0.40625 3.6875 2.8125 #> Datsun 710 0.4375 0.40625 3.6875 2.8125 #> Hornet 4 Drive 0.4375 0.40625 3.6875 2.8125 #> Hornet Sportabout 0.4375 0.40625 3.6875 2.8125 #> Valiant 0.4375 0.40625 3.6875 2.8125
head(B(mtcars))
#> B.mpg B.cyl B.disp B.hp B.drat B.wt B.qsec #> Mazda RX4 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Mazda RX4 Wag 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Datsun 710 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Hornet 4 Drive 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Hornet Sportabout 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> Valiant 20.09062 6.1875 230.7219 146.6875 3.596562 3.21725 17.84875 #> B.vs B.am B.gear B.carb #> Mazda RX4 0.4375 0.40625 3.6875 2.8125 #> Mazda RX4 Wag 0.4375 0.40625 3.6875 2.8125 #> Datsun 710 0.4375 0.40625 3.6875 2.8125 #> Hornet 4 Drive 0.4375 0.40625 3.6875 2.8125 #> Hornet Sportabout 0.4375 0.40625 3.6875 2.8125 #> Valiant 0.4375 0.40625 3.6875 2.8125
head(fwithin(mtcars))
#> mpg cyl disp hp drat wt #> Mazda RX4 0.909375 -0.1875 -70.721875 -36.6875 0.3034375 -0.59725 #> Mazda RX4 Wag 0.909375 -0.1875 -70.721875 -36.6875 0.3034375 -0.34225 #> Datsun 710 2.709375 -2.1875 -122.721875 -53.6875 0.2534375 -0.89725 #> Hornet 4 Drive 1.309375 -0.1875 27.278125 -36.6875 -0.5165625 -0.00225 #> Hornet Sportabout -1.390625 1.8125 129.278125 28.3125 -0.4465625 0.22275 #> Valiant -1.990625 -0.1875 -5.721875 -41.6875 -0.8365625 0.24275 #> qsec vs am gear carb #> Mazda RX4 -1.38875 -0.4375 0.59375 0.3125 1.1875 #> Mazda RX4 Wag -0.82875 -0.4375 0.59375 0.3125 1.1875 #> Datsun 710 0.76125 0.5625 0.59375 0.3125 -1.8125 #> Hornet 4 Drive 1.59125 0.5625 -0.40625 -0.6875 -1.8125 #> Hornet Sportabout -0.82875 -0.4375 -0.40625 -0.6875 -0.8125 #> Valiant 2.37125 0.5625 -0.40625 -0.6875 -1.8125
head(W(mtcars))
#> W.mpg W.cyl W.disp W.hp W.drat W.wt #> Mazda RX4 0.909375 -0.1875 -70.721875 -36.6875 0.3034375 -0.59725 #> Mazda RX4 Wag 0.909375 -0.1875 -70.721875 -36.6875 0.3034375 -0.34225 #> Datsun 710 2.709375 -2.1875 -122.721875 -53.6875 0.2534375 -0.89725 #> Hornet 4 Drive 1.309375 -0.1875 27.278125 -36.6875 -0.5165625 -0.00225 #> Hornet Sportabout -1.390625 1.8125 129.278125 28.3125 -0.4465625 0.22275 #> Valiant -1.990625 -0.1875 -5.721875 -41.6875 -0.8365625 0.24275 #> W.qsec W.vs W.am W.gear W.carb #> Mazda RX4 -1.38875 -0.4375 0.59375 0.3125 1.1875 #> Mazda RX4 Wag -0.82875 -0.4375 0.59375 0.3125 1.1875 #> Datsun 710 0.76125 0.5625 0.59375 0.3125 -1.8125 #> Hornet 4 Drive 1.59125 0.5625 -0.40625 -0.6875 -1.8125 #> Hornet Sportabout -0.82875 -0.4375 -0.40625 -0.6875 -0.8125 #> Valiant 2.37125 0.5625 -0.40625 -0.6875 -1.8125
all.equal(fbetween(mtcars) + fwithin(mtcars), mtcars)
#> [1] TRUE
## Groupwise centering and averaging head(fbetween(mtcars, mtcars$cyl))
#> mpg cyl disp hp drat wt qsec #> Mazda RX4 19.74286 6 183.3143 122.28571 3.585714 3.117143 17.97714 #> Mazda RX4 Wag 19.74286 6 183.3143 122.28571 3.585714 3.117143 17.97714 #> Datsun 710 26.66364 4 105.1364 82.63636 4.070909 2.285727 19.13727 #> Hornet 4 Drive 19.74286 6 183.3143 122.28571 3.585714 3.117143 17.97714 #> Hornet Sportabout 15.10000 8 353.1000 209.21429 3.229286 3.999214 16.77214 #> Valiant 19.74286 6 183.3143 122.28571 3.585714 3.117143 17.97714 #> vs am gear carb #> Mazda RX4 0.5714286 0.4285714 3.857143 3.428571 #> Mazda RX4 Wag 0.5714286 0.4285714 3.857143 3.428571 #> Datsun 710 0.9090909 0.7272727 4.090909 1.545455 #> Hornet 4 Drive 0.5714286 0.4285714 3.857143 3.428571 #> Hornet Sportabout 0.0000000 0.1428571 3.285714 3.500000 #> Valiant 0.5714286 0.4285714 3.857143 3.428571
head(fwithin(mtcars, mtcars$cyl))
#> mpg cyl disp hp drat wt #> Mazda RX4 1.257143 0 -23.314286 -12.28571 0.31428571 -0.49714286 #> Mazda RX4 Wag 1.257143 0 -23.314286 -12.28571 0.31428571 -0.24214286 #> Datsun 710 -3.863636 0 2.863636 10.36364 -0.22090909 0.03427273 #> Hornet 4 Drive 1.657143 0 74.685714 -12.28571 -0.50571429 0.09785714 #> Hornet Sportabout 3.600000 0 6.900000 -34.21429 -0.07928571 -0.55921429 #> Valiant -1.642857 0 41.685714 -17.28571 -0.82571429 0.34285714 #> qsec vs am gear carb #> Mazda RX4 -1.5171429 -0.57142857 0.5714286 0.14285714 0.5714286 #> Mazda RX4 Wag -0.9571429 -0.57142857 0.5714286 0.14285714 0.5714286 #> Datsun 710 -0.5272727 0.09090909 0.2727273 -0.09090909 -0.5454545 #> Hornet 4 Drive 1.4628571 0.42857143 -0.4285714 -0.85714286 -2.4285714 #> Hornet Sportabout 0.2478571 0.00000000 -0.1428571 -0.28571429 -1.5000000 #> Valiant 2.2428571 0.42857143 -0.4285714 -0.85714286 -2.4285714
all.equal(fbetween(mtcars, mtcars$cyl) + fwithin(mtcars, mtcars$cyl), mtcars)
#> [1] TRUE
head(W(wlddev, ~ iso3c, cols = 9:12)) # Center the 4 series in this dataset by country
#> iso3c W.PCGDP W.LIFEEX W.GINI W.ODA #> 1 AFG NA -15.59016 NA -1236633448 #> 2 AFG NA -15.14016 NA -1117723448 #> 3 AFG NA -14.69716 NA -1236193448 #> 4 AFG NA -14.25816 NA -1114623448 #> 5 AFG NA -13.82216 NA -1048593448 #> 6 AFG NA -13.38716 NA -980823448
head(cbind(get_vars(wlddev,"iso3c"), # Same thing done manually using fwithin.. add_stub(fwithin(get_vars(wlddev,9:12), wlddev$iso3c), "W.")))
#> iso3c W.PCGDP W.LIFEEX W.GINI W.ODA #> 1 AFG NA -15.59016 NA -1236633448 #> 2 AFG NA -15.14016 NA -1117723448 #> 3 AFG NA -14.69716 NA -1236193448 #> 4 AFG NA -14.25816 NA -1114623448 #> 5 AFG NA -13.82216 NA -1048593448 #> 6 AFG NA -13.38716 NA -980823448
## Using B() and W() for fixed-effects regressions: # Several ways of running the same regression with cyl-fixed effects lm(W(mpg,cyl) ~ W(carb,cyl), data = mtcars) # Centering each individually
#> #> Call: #> lm(formula = W(mpg, cyl) ~ W(carb, cyl), data = mtcars) #> #> Coefficients: #> (Intercept) W(carb, cyl) #> -2.592e-16 -4.655e-01 #>
lm(mpg ~ carb, data = W(mtcars, ~ cyl, stub = FALSE)) # Centering the entire data
#> #> Call: #> lm(formula = mpg ~ carb, data = W(mtcars, ~cyl, stub = FALSE)) #> #> Coefficients: #> (Intercept) carb #> -2.592e-16 -4.655e-01 #>
lm(mpg ~ carb, data = W(mtcars, ~ cyl, stub = FALSE, # Here only the intercept changes mean = "overall.mean"))
#> #> Call: #> lm(formula = mpg ~ carb, data = W(mtcars, ~cyl, stub = FALSE, #> mean = "overall.mean")) #> #> Coefficients: #> (Intercept) carb #> 21.3999 -0.4655 #>
lm(mpg ~ carb + B(carb,cyl), data = mtcars) # Procedure suggested by
#> #> Call: #> lm(formula = mpg ~ carb + B(carb, cyl), data = mtcars) #> #> Coefficients: #> (Intercept) carb B(carb, cyl) #> 34.8297 -0.4655 -4.7750 #>
# ..Mundlak (1978) - partialling out group averages amounts to the same as demeaning the data plm::plm(mpg ~ carb, mtcars, index = "cyl", model = "within") # Proof..
#> #> Model Formula: mpg ~ carb #> <environment: 0x0000004d31c24620> #> #> Coefficients: #> carb #> -0.46551 #>
# This takes the interaction of cyl, vs and am as fixed effects lm(W(mpg,list(cyl,vs,am)) ~ W(carb,list(cyl,vs,am)), data = mtcars)
#> #> Call: #> lm(formula = W(mpg, list(cyl, vs, am)) ~ W(carb, list(cyl, vs, #> am)), data = mtcars) #> #> Coefficients: #> (Intercept) W(carb, list(cyl, vs, am)) #> -1.346e-15 -9.413e-01 #>
lm(mpg ~ carb, data = W(mtcars, ~ cyl + vs + am, stub = FALSE))
#> #> Call: #> lm(formula = mpg ~ carb, data = W(mtcars, ~cyl + vs + am, stub = FALSE)) #> #> Coefficients: #> (Intercept) carb #> -1.346e-15 -9.413e-01 #>
lm(mpg ~ carb + B(carb,list(cyl,vs,am)), data = mtcars)
#> #> Call: #> lm(formula = mpg ~ carb + B(carb, list(cyl, vs, am)), data = mtcars) #> #> Coefficients: #> (Intercept) carb #> 27.8168 -0.9413 #> B(carb, list(cyl, vs, am)) #> -1.8057 #>
# Now with cyl fixed effects weighted by hp: lm(W(mpg,cyl,hp) ~ W(carb,cyl,hp), data = mtcars)
#> #> Call: #> lm(formula = W(mpg, cyl, hp) ~ W(carb, cyl, hp), data = mtcars) #> #> Coefficients: #> (Intercept) W(carb, cyl, hp) #> 0.1747 -0.4469 #>
lm(mpg ~ carb, data = W(mtcars, ~ cyl, ~ hp, stub = FALSE))
#> #> Call: #> lm(formula = mpg ~ carb, data = W(mtcars, ~cyl, ~hp, stub = FALSE)) #> #> Coefficients: #> (Intercept) carb #> 0.1747 -0.4469 #>
lm(mpg ~ carb + B(carb,cyl,hp), data = mtcars) # WRONG ! Gives a different coefficient!!
#> #> Call: #> lm(formula = mpg ~ carb + B(carb, cyl, hp), data = mtcars) #> #> Coefficients: #> (Intercept) carb B(carb, cyl, hp) #> 34.1833 -0.4383 -4.2638 #>
## Manual variance components (random-effects) estimation res <- HDW(mtcars, mpg ~ carb)[[1]] # Get residuals from pooled OLS sig2_u <- fvar(res) sig2_e <- fvar(fwithin(res, mtcars$cyl)) T <- length(res) / fNdistinct(mtcars$cyl) sig2_alpha <- sig2_u - sig2_e theta <- 1 - sqrt(sig2_alpha) / sqrt(sig2_alpha + T * sig2_e) lm(mpg ~ carb, data = W(mtcars, ~ cyl, theta = theta, mean = "overall.mean", stub = FALSE))
#> #> Call: #> lm(formula = mpg ~ carb, data = W(mtcars, ~cyl, theta = theta, #> mean = "overall.mean", stub = FALSE)) #> #> Coefficients: #> (Intercept) carb #> 21.8727 -0.6336 #>
# A slightly different method to obtain theta... plm::plm(mpg ~ carb, mtcars, index = "cyl", model = "random")
#> #> Model Formula: mpg ~ carb #> <environment: 0x0000004d31c24620> #> #> Coefficients: #> (Intercept) carb #> 22.40631 -0.68522 #>