fFtest computes an R-squared based F-test for the exclusion of the variables in exc, where the full (unrestricted) model is defined by variables supplied to both exc and X. The test is efficient and designed for cases where both exc and X may contain multiple factors and continuous variables.

fFtest(y, exc, X = NULL, w = NULL, full.df = TRUE, ...)

Arguments

y

a numeric vector: The dependent variable.

exc

a numeric vector, factor, numeric matrix or list / data frame of numeric vectors and/or factors: Variables to test / exclude.

X

a numeric vector, factor, numeric matrix or list / data frame of numeric vectors and/or factors: Covariates to include in both the restricted (without exc) and unrestricted model. If left empty (X = NULL), the test amounts to the F-test of the regression of y on exc.

w

numeric. A vector of (frequency) weights.

full.df

logical. If TRUE (default), the degrees of freedom are calculated as if both restricted and unrestricted models were estimated using lm() (i.e. as if factors were expanded to matrices of dummies). FALSE only uses one degree of freedom per factor.

...

other arguments passed to fHDwithin. Sensible options might be the lm.method argument or further control parameters to fixest::demean, the workhorse function underlying fHDwithin for higher-order centering tasks.

Details

Factors and continuous regressors are efficiently projected out using fHDwithin, and the option full.df regulates whether a degree of freedom is subtracted for each used factor level (equivalent to dummy-variable estimator / expanding factors), or only one degree of freedom per factor (treating factors as variables). The test automatically removes missing values and considers only the complete cases of y, exc and X. Unused factor levels in exc and X are dropped.
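The two ways of counting degrees of freedom can be illustrated outside of fFtest. The following is a minimal base R sketch, not the internal implementation: it assumes the dummy-expanded count equals the number of non-intercept columns that model.matrix() would create, and it uses the mtcars variables cyl and vs (encoded as factors) purely for illustration.

```r
# Base R only; mtcars ships with R.
d <- transform(mtcars, cyl = factor(cyl), vs = factor(vs))

# Dummy-expanded count (the idea behind full.df = TRUE):
# columns of the model matrix, minus the intercept.
df_expanded <- ncol(model.matrix(~ cyl + vs, data = d)) - 1L

# One degree of freedom per factor (the idea behind full.df = FALSE),
# regardless of how many levels each factor has.
df_per_factor <- length(c("cyl", "vs"))

c(expanded = df_expanded, per_factor = df_per_factor)
```

Here cyl has three levels and vs has two, so the expanded count is 3 (two dummies plus one), which matches the DF1 reported for the exclusion restriction in the factor-encoded mtcars example in the Examples section. Counting one degree of freedom per factor gives 2 instead.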

Note that an intercept is always added by fHDwithin, so it is not necessary to include an intercept in data supplied to exc / X.

Value

A 3 x 5 numeric matrix of statistics (3 rows, 5 columns). The columns contain the following statistics:

  1. the R-squared of the model

  2. the numerator degrees of freedom, i.e. the number of variables (k) and used factor levels if full.df = TRUE

  3. the denominator degrees of freedom: N - k - 1.

  4. the F-statistic

  5. the corresponding P-value

The rows show these statistics for:

  1. the Full (unrestricted) Model (y ~ exc + X)

  2. the Restricted Model (y ~ X)

  3. the Exclusion Restriction of exc. The R-squared shown is simply the difference between the full and restricted R-squared values, not the R-squared of the model y ~ exc.

If X = NULL, only a vector of the same 5 statistics testing the model (y ~ exc) is returned.
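The arithmetic linking the columns can be sketched in base R. This is an illustration under stated assumptions, not the code fFtest runs: lm() stands in for the projection done by fHDwithin, the textbook R-squared form of the F-statistic is assumed, and the variable names (full, rest, f_excl, and so on) are chosen here for the example.

```r
# Base R only; lm() replaces the fHDwithin projection for illustration.
full <- lm(mpg ~ cyl + vs + hp + carb, data = mtcars)  # y ~ exc + X
rest <- lm(mpg ~ hp + carb, data = mtcars)             # y ~ X

r2_full <- summary(full)$r.squared
r2_rest <- summary(rest)$r.squared

n      <- nrow(mtcars)
k_full <- length(coef(full)) - 1L  # regressors, excluding the intercept
k_rest <- length(coef(rest)) - 1L
df1    <- k_full - k_rest          # number of restrictions tested
df2    <- n - k_full - 1L          # residual df of the full model

# Exclusion F-statistic from the difference in R-squared.
f_excl <- ((r2_full - r2_rest) / df1) / ((1 - r2_full) / df2)
p_excl <- pf(f_excl, df1, df2, lower.tail = FALSE)

# The same statistic from the residual-sum-of-squares comparison.
f_anova <- anova(rest, full)$F[2]

round(c(R2.diff = r2_full - r2_rest, DF1 = df1, DF2 = df2,
        F = f_excl, P = p_excl), 3)
```

Both routes are algebraically equivalent, so f_excl and f_anova should agree, and the rounded values should reproduce the "Exclusion Rest." row of the continuous-variable mtcars example in the Examples section.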

See also

Examples

## We could use fFtest as a seasonality test:
fFtest(AirPassengers, qF(cycle(AirPassengers)))         # Testing for level-seasonality
#>   R-Sq.     DF1     DF2 F-Stat. P-value
#>   0.106      11     132   1.424   0.169
fFtest(AirPassengers, qF(cycle(AirPassengers)),         # Seasonality test around a cubic trend
       poly(seq_along(AirPassengers), 3))
#>                   R-Sq. DF1 DF2 F-Stat. P-Value
#> Full Model        0.965  14 129 250.585   0.000
#> Restricted Model  0.862   3 140 291.593   0.000
#> Exclusion Rest.   0.102  11 129  33.890   0.000
fFtest(fdiff(AirPassengers), qF(cycle(AirPassengers)))  # Seasonality in first-difference
#>   R-Sq.     DF1     DF2 F-Stat. P-value
#>   0.749      11     131  35.487   0.000
## A more classical example with only continuous variables
fFtest(mtcars$mpg, mtcars[c("cyl","vs")], mtcars[c("hp","carb")])
#>                   R-Sq. DF1 DF2 F-Stat. P-Value
#> Full Model        0.750   4  27  20.261   0.000
#> Restricted Model  0.605   2  29  22.175   0.000
#> Exclusion Rest.   0.145   2  27   7.858   0.002
## Now encoding cyl and vs as factors
fFtest(mtcars$mpg, dapply(mtcars[c("cyl","vs")], qF), mtcars[c("hp","carb")])
#>                   R-Sq. DF1 DF2 F-Stat. P-Value
#> Full Model        0.756   5  26  16.140   0.000
#> Restricted Model  0.605   2  29  22.175   0.000
#> Exclusion Rest.   0.152   3  26   5.395   0.005
## Using iris data: A factor and a continuous variable excluded
fFtest(iris$Sepal.Length, iris[4:5], iris[2:3])
#>                   R-Sq. DF1 DF2 F-Stat. P-Value
#> Full Model        0.867   5 144 188.251   0.000
#> Restricted Model  0.840   2 147 386.386   0.000
#> Exclusion Rest.   0.027   3 144   9.816   0.000
## Testing the significance of country-FE in regression of GDP on life expectancy
fFtest(wlddev$PCGDP, wlddev$iso3c, wlddev$LIFEEX)
#>                   R-Sq.  DF1  DF2  F-Stat. P-Value
#> Full Model        0.877  197 8200  296.420   0.000
#> Restricted Model  0.325    1 8396 4040.681   0.000
#> Exclusion Rest.   0.552  196 8200  187.541   0.000
## Ok, country-FE are significant, what about adding time-FE
fFtest(wlddev$PCGDP, qF(wlddev$year), wlddev[c("iso3c","LIFEEX")])
#>                   R-Sq.  DF1  DF2 F-Stat. P-Value
#> Full Model        0.896  253 8144 277.440   0.000
#> Restricted Model  0.877  197 8200 296.420   0.000
#> Exclusion Rest.   0.019   56 8144  26.817   0.000
# Same test done using lm:
data <- na_omit(get_vars(wlddev, c("iso3c","year","PCGDP","LIFEEX")))
full <- lm(PCGDP ~ LIFEEX + iso3c + qF(year), data)
rest <- lm(PCGDP ~ LIFEEX + iso3c, data)
anova(rest, full)
#> Analysis of Variance Table
#>
#> Model 1: PCGDP ~ LIFEEX + iso3c
#> Model 2: PCGDP ~ LIFEEX + iso3c + qF(year)
#>   Res.Df        RSS Df  Sum of Sq      F    Pr(>F)
#> 1   8200 2.5036e+11
#> 2   8144 2.1138e+11 56 3.8979e+10 26.817 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1