flag is an S3 generic to compute (sequences of) lags and leads. L and F are wrappers around flag representing the lag- and lead-operators, such that L(x,-1) = F(x,1) = F(x) and L(x,-3:3) = F(x,3:-3). L and F provide more flexibility than flag when applied to data frames (i.e. column subsetting, formula input and id-variable-preservation capabilities...), but are otherwise identical.

(flag is more of a programmer's function in the style of the Fast Statistical Functions, while L and F are more practical to use in regression formulas or for computations on data frames.)
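As a minimal sketch of the operator relationships described above (assuming the collapse package is installed; the vector x is made up for illustration), the following compares flag and L on a short numeric vector. Only flag and L are called, on the assumption that F may not be exported in every collapse release because the name clashes with base::F.

```r
library(collapse)

x <- c(10, 20, 30, 40, 50)   # illustrative data, not from the package

lag1  <- flag(x)       # one lag:  NA 10 20 30 40
lead1 <- flag(x, -1)   # one lead: 20 30 40 50 NA (negative n gives leads)
lag2  <- L(x, 2)       # two lags via the lag operator: NA NA 10 20 30

# L is a wrapper around flag, so both calls should agree
same <- isTRUE(all.equal(L(x, 2), flag(x, 2)))

print(lag1); print(lead1); print(lag2); print(same)
```

A lead is simply a lag with the sign of n flipped, which is why a single function with a signed n is enough for programming use.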

flag(x, n = 1, ...)
   L(x, n = 1, ...)
   F(x, n = 1, ...)

# S3 method for default
flag(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)
# S3 method for default
L(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)
# S3 method for default
F(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)

# S3 method for matrix
flag(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = length(n) > 1L, ...)
# S3 method for matrix
L(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)
# S3 method for matrix
F(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)

# S3 method for data.frame
flag(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = length(n) > 1L, ...)
# S3 method for data.frame
L(x, n = 1, by = NULL, t = NULL, cols = is.numeric,
  fill = NA, stubs = TRUE, keep.ids = TRUE, ...)
# S3 method for data.frame
F(x, n = 1, by = NULL, t = NULL, cols = is.numeric,
  fill = NA, stubs = TRUE, keep.ids = TRUE, ...)

# Methods for compatibility with plm:

# S3 method for pseries
flag(x, n = 1, fill = NA, stubs = TRUE, ...)
# S3 method for pseries
L(x, n = 1, fill = NA, stubs = TRUE, ...)
# S3 method for pseries
F(x, n = 1, fill = NA, stubs = TRUE, ...)

# S3 method for pdata.frame
flag(x, n = 1, fill = NA, stubs = length(n) > 1L, ...)
# S3 method for pdata.frame
L(x, n = 1, cols = is.numeric, fill = NA, stubs = TRUE,
  keep.ids = TRUE, ...)
# S3 method for pdata.frame
F(x, n = 1, cols = is.numeric, fill = NA, stubs = TRUE,
  keep.ids = TRUE, ...)

# Methods for grouped data frame / compatibility with dplyr:

# S3 method for grouped_df
flag(x, n = 1, t = NULL, fill = NA, stubs = length(n) > 1L, keep.ids = TRUE, ...)
# S3 method for grouped_df
L(x, n = 1, t = NULL, fill = NA, stubs = TRUE, keep.ids = TRUE, ...)
# S3 method for grouped_df
F(x, n = 1, t = NULL, fill = NA, stubs = TRUE, keep.ids = TRUE, ...)

Arguments

x

a vector / time series, (time series) matrix, data frame, panel series (plm::pseries), panel data frame (plm::pdata.frame) or grouped data frame (class 'grouped_df'). The data need not be numeric, i.e. you can also lag a date variable, character data, etc.

n

integer. A vector indicating the lags / leads to compute (passing negative integers to flag or L computes leads, passing negative integers to F computes lags).

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

by

data.frame method: Same as g, but also allows one- or two-sided formulas i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.

t

same input as g/by, to indicate the time-variable(s). For safe computation of lags / leads on unordered time series and panels. The data frame method also allows a one-sided formula, i.e. ~time. The grouped_df method supports lazy evaluation, i.e. time (no quotes).

cols

data.frame method: Select columns to lag / lead using a function, column names, indices or a logical vector. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.

fill

value to insert when vectors are shifted. Default is NA.

stubs

logical. TRUE will rename all lagged / leaded columns by adding a stub or prefix "Ln." / "Fn.".

keep.ids

data.frame / pdata.frame / grouped_df methods: Logical. If FALSE, all panel-identifiers are dropped from the output (which includes all variables passed to by or t). Note: For grouped / panel data frames the identifier columns are dropped, but the 'groups' / 'index' attributes are kept.

...

arguments to be passed to or from other methods.
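The following sketch illustrates the x and fill arguments on made-up inputs, assuming collapse is installed. It shows that non-numeric data can be lagged in the same way as numeric data, and that fill replaces the default NA padding.

```r
library(collapse)

# Character data: the first element has no predecessor, so it becomes NA
chr_lag <- flag(c("a", "b", "c"))     # NA "a" "b"

# Dates keep their class when lagged
d <- as.Date("2020-01-01") + 0:2
date_lag <- flag(d)                   # NA "2020-01-01" "2020-01-02"

# fill: pad shifted positions with 0 instead of NA
int_lag <- flag(1:5, fill = 0L)       # 0 1 2 3 4

print(chr_lag); print(date_lag); print(int_lag)
```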

Details

If a single integer is passed to n, and g/by and t are left empty, flag/L/F just returns x with all columns lagged / leaded by n. If length(n)>1, and x is an atomic vector (time series), flag/L/F returns a (time series) matrix with lags / leads computed in the same order as passed to n. If instead x is a matrix / data frame, a matrix / data frame with ncol(x)*length(n) columns is returned where columns are sorted first by variable and then by lag (so all lags computed on a variable are grouped together). x can be of any standard data type.
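A small sketch of the output shapes described above, assuming collapse is installed and using made-up inputs. The column names in the comments follow the naming visible in the Examples output ("--" for the unshifted series of a vector, the plain variable name for lag 0 of a matrix column).

```r
library(collapse)

x <- c(10, 20, 30, 40)

# Vector input with several lags: one column per element of n, in the order given
m1 <- flag(x, 0:2)
dim(m1)        # 4 rows, 3 columns
colnames(m1)   # "--" "L1" "L2"

# Matrix input: ncol(x) * length(n) columns, all lags of a variable kept together
m  <- cbind(a = 1:4, b = 5:8)
m2 <- flag(m, 0:1)
dim(m2)        # 4 rows, 2 * 2 = 4 columns
colnames(m2)   # "a" "L1.a" "b" "L1.b"
```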

With groups/panel-identifiers supplied to g/by, flag/L/F efficiently computes a panel-lag/lead by shifting the entire vector(s) but inserting fill elements in the right places. If t is left empty, the data needs to be ordered such that all values belonging to a group are consecutive and in the right order. It is not necessary that the groups themselves occur in the right order. If a time-variable is supplied to t (or a list of time-variables uniquely identifying the time-dimension), the panel is fully identified and lags / leads can be securely computed even if the data is unordered.
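To make the ordering requirement concrete, here is a hedged sketch on a made-up two-group panel (the names id, time and x are illustrative; collapse is assumed to be installed). On ordered data the grouping alone suffices; after shuffling the rows, supplying t should recover the same lags.

```r
library(collapse)

id   <- c("a", "a", "a", "b", "b", "b")
time <- c(1L, 2L, 3L, 1L, 2L, 3L)
x    <- c(1, 2, 3, 10, 20, 30)

# Ordered data: grouping alone is enough; the first row of each group gets fill
lag_ordered <- flag(x, 1, g = id)      # NA 1 2 NA 10 20

# Shuffle the rows, then identify the panel fully through g and t
ord <- c(5, 1, 6, 3, 2, 4)
lag_shuffled <- flag(x[ord], 1, g = id[ord], t = time[ord])

# Each shuffled row should receive the lag it had in the ordered data
agree <- isTRUE(all.equal(as.numeric(lag_shuffled), as.numeric(lag_ordered)[ord]))
print(lag_ordered); print(lag_shuffled); print(agree)
```

Depending on the installed version, the call without t may print a message noting that ordered data is assumed.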

It is also possible to lag unordered or irregular time series utilizing only the t argument to identify the temporal dimension of the data.

Since v1.5.0 flag/L/F provide full built-in support for irregular time series and unbalanced panels. The suggested workaround using the seqid function is therefore no longer necessary.
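The next sketch shows what this means for a single unordered, irregular series, assuming collapse v1.5.0 or later is installed. The data are made up: the year 2004 is missing, so the observation for 2005 has no predecessor one period back.

```r
library(collapse)

year <- c(2003L, 2001L, 2002L, 2005L)   # unordered, and 2004 is missing
x    <- c(30, 10, 20, 50)

# Lag by one period along `year`, not by one row
lag_t <- flag(x, 1, t = year)
print(lag_t)   # 20 NA 10 NA: 2001 is the first period, 2005 follows a gap
```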

Computationally, if both g/by and t are supplied, flag/L/F uses two initial passes to create an ordering through which the data are accessed. First pass: calculate the minimum and maximum time-value for each individual. Second pass: generate the ordering by placing the current element index into the vector slot obtained by adding the cumulative group size to the current time-value minus its individual minimum. This method of computation is faster than any sort-based method and delivers optimal performance if the panel-id supplied to g/by is already a factor variable, and if t is either an integer or factor variable. If g/by is not a factor, or t is not a factor or integer, qG or GRP will be called to group the respective identifier, which can be expensive, so for optimal performance prepare the data accordingly (or use plm classes).
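The two passes can be sketched in base R as follows. This is an illustration of the idea only, not the package's compiled implementation: panel_lag_sketch is a hypothetical helper, "group size" is interpreted here as the time span of each group, g is assumed to hold integer group ids 1..ng, and t is assumed to hold integer time values that are unique within each group.

```r
# Hypothetical helper, not part of collapse: a base-R sketch of a sort-free
# panel lag following the two passes described above.
panel_lag_sketch <- function(x, g, t, n = 1L, fill = NA) {
  ng <- max(g)

  # First pass: minimum and maximum time value per group
  tmin <- rep(.Machine$integer.max, ng)
  tmax <- rep(-.Machine$integer.max, ng)
  for (i in seq_along(x)) {
    if (t[i] < tmin[g[i]]) tmin[g[i]] <- t[i]
    if (t[i] > tmax[g[i]]) tmax[g[i]] <- t[i]
  }
  span   <- tmax - tmin + 1L            # slots reserved for each group
  offset <- c(0L, cumsum(span)[-ng])    # cumulative size of preceding groups

  # Second pass: slot = cumulative offset + (time - group minimum) + 1
  slot <- offset[g] + (t - tmin[g]) + 1L
  ord  <- integer(sum(span))            # 0 marks a period with no observation
  ord[slot] <- seq_along(x)

  # Look n slots back, staying inside the group's own block of slots
  src    <- slot - as.integer(n)
  inside <- src > offset[g] & src <= offset[g] + span[g]
  src[!inside] <- NA
  from  <- ord[src]                     # NA outside the block, 0 at a gap
  found <- !is.na(from) & from > 0L

  out <- x
  out[] <- fill
  out[found] <- x[from[found]]
  out
}

# Unordered two-group panel (made-up data)
g <- c(2L, 1L, 2L, 1L, 1L, 2L)
t <- c(2L, 1L, 3L, 3L, 2L, 1L)
x <- c(20, 1, 30, 3, 2, 10)
res_panel <- panel_lag_sketch(x, g, t)                            # 10 NA 20 2 1 NA

# Irregular series: period 3 is missing, so the lag at period 4 is NA
res_gap <- panel_lag_sketch(c(1, 2, 3), c(1L, 1L, 1L), c(1L, 2L, 4L))   # NA 1 NA
```

No sorting is needed because each observation's slot is computed directly from its group offset and time value, and empty slots take care of gaps in the time dimension.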

The methods applying to plm objects (panel series and panel data frames) automatically utilize the factor panel-identifiers attached to these objects and thus securely and efficiently compute fully identified panel-lags. If these objects have > 2 panel-identifiers attached to them, the last identifier is assumed to be the time-variable, and the others are taken as grouping-variables and interacted. Note that flag/L/F is significantly faster than plm::lag/plm::lead since the latter is written in R and based on a Split-Apply-Combine logic.

Value

x lagged / leaded n-times, grouped by g/by, ordered by t. See Details and Examples.

See also

Examples

## Simple Time Series: AirPassengers
L(AirPassengers)  # 1 lag
#>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1949  NA 112 118 132 129 121 135 148 148 136 119 104
#> 1950 118 115 126 141 135 125 149 170 170 158 133 114
#> 1951 140 145 150 178 163 172 178 199 199 184 162 146
#> 1952 166 171 180 193 181 183 218 230 242 209 191 172
#> 1953 194 196 196 236 235 229 243 264 272 237 211 180
#> 1954 201 204 188 235 227 234 264 302 293 259 229 203
#> 1955 229 242 233 267 269 270 315 364 347 312 274 237
#> 1956 278 284 277 317 313 318 374 413 405 355 306 271
#> 1957 306 315 301 356 348 355 422 465 467 404 347 305
#> 1958 336 340 318 362 348 363 435 491 505 404 359 310
#> 1959 337 360 342 406 396 420 472 548 559 463 407 362
#> 1960 405 417 391 419 461 472 535 622 606 508 461 390
F(AirPassengers)  # 1 lead
#>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1949 118 132 129 121 135 148 148 136 119 104 118 115
#> 1950 126 141 135 125 149 170 170 158 133 114 140 145
#> 1951 150 178 163 172 178 199 199 184 162 146 166 171
#> 1952 180 193 181 183 218 230 242 209 191 172 194 196
#> 1953 196 236 235 229 243 264 272 237 211 180 201 204
#> 1954 188 235 227 234 264 302 293 259 229 203 229 242
#> 1955 233 267 269 270 315 364 347 312 274 237 278 284
#> 1956 277 317 313 318 374 413 405 355 306 271 306 315
#> 1957 301 356 348 355 422 465 467 404 347 305 336 340
#> 1958 318 362 348 363 435 491 505 404 359 310 337 360
#> 1959 342 406 396 420 472 548 559 463 407 362 405 417
#> 1960 391 419 461 472 535 622 606 508 461 390 432  NA
all_identical(L(AirPassengers),  # 3 identical ways of computing 1 lag
              flag(AirPassengers),
              F(AirPassengers, -1))
#> [1] TRUE
head(L(AirPassengers, -1:3))  # 1 lead and 3 lags - output as matrix
#>       F1  --  L1  L2  L3
#> [1,] 118 112  NA  NA  NA
#> [2,] 132 118 112  NA  NA
#> [3,] 129 132 118 112  NA
#> [4,] 121 129 132 118 112
#> [5,] 135 121 129 132 118
#> [6,] 148 135 121 129 132

## Time Series Matrix of 4 EU Stock Market Indicators, 1991-1998
tsp(EuStockMarkets)  # Data is recorded on 260 days per year
#> [1] 1991.496 1998.646 260.000
freq <- frequency(EuStockMarkets)
plot(stl(EuStockMarkets[,"DAX"], freq))  # There is some obvious seasonality
head(L(EuStockMarkets, -1:3 * freq))  # 1 annual lead and 3 annual lags
#>      F260.DAX     DAX L260.DAX L520.DAX L780.DAX F260.SMI    SMI L260.SMI
#> [1,]  1755.98 1628.75       NA       NA       NA   1846.6 1678.1       NA
#> [2,]  1754.95 1613.63       NA       NA       NA   1854.8 1688.5       NA
#> [3,]  1759.90 1606.51       NA       NA       NA   1845.3 1678.6       NA
#> [4,]  1759.84 1621.04       NA       NA       NA   1854.5 1684.1       NA
#> [5,]  1776.50 1618.16       NA       NA       NA   1870.5 1686.6       NA
#> [6,]  1769.98 1610.61       NA       NA       NA   1862.6 1671.6       NA
#>      L520.SMI L780.SMI F260.CAC    CAC L260.CAC L520.CAC L780.CAC F260.FTSE
#> [1,]       NA       NA   1907.3 1772.8       NA       NA       NA    2515.8
#> [2,]       NA       NA   1900.6 1750.5       NA       NA       NA    2521.2
#> [3,]       NA       NA   1880.9 1718.0       NA       NA       NA    2493.9
#> [4,]       NA       NA   1873.5 1708.1       NA       NA       NA    2476.1
#> [5,]       NA       NA   1883.6 1723.1       NA       NA       NA    2497.1
#> [6,]       NA       NA   1868.5 1714.3       NA       NA       NA    2469.0
#>        FTSE L260.FTSE L520.FTSE L780.FTSE
#> [1,] 2443.6        NA        NA        NA
#> [2,] 2460.2        NA        NA        NA
#> [3,] 2448.2        NA        NA        NA
#> [4,] 2470.4        NA        NA        NA
#> [5,] 2484.7        NA        NA        NA
#> [6,] 2466.8        NA        NA        NA
summary(lm(DAX ~., data = L(EuStockMarkets,-1:3*freq)))  # DAX regressed on its own annual lead,
                                                         # lags and the lead/lags of the other series
#>
#> Call:
#> lm(formula = DAX ~ ., data = L(EuStockMarkets, -1:3 * freq))
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -158.092  -30.174    1.355   28.741  211.844
#>
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1.030e+03  1.016e+02 -10.141  < 2e-16 ***
#> F260.DAX     1.037e-01  2.621e-02   3.957 8.25e-05 ***
#> L260.DAX    -3.544e-01  4.394e-02  -8.066 2.65e-15 ***
#> L520.DAX    -2.232e-01  4.116e-02  -5.423 7.75e-08 ***
#> L780.DAX     9.451e-02  4.484e-02   2.107 0.035391 *
#> F260.SMI     4.968e-02  1.554e-02   3.198 0.001441 **
#> SMI          2.616e-01  2.301e-02  11.366  < 2e-16 ***
#> L260.SMI     6.138e-02  2.740e-02   2.240 0.025342 *
#> L520.SMI    -2.153e-01  2.707e-02  -7.954 6.15e-15 ***
#> L780.SMI    -2.208e-01  3.091e-02  -7.144 2.04e-12 ***
#> F260.CAC    -1.392e-01  3.583e-02  -3.884 0.000111 ***
#> CAC          7.165e-01  3.189e-02  22.470  < 2e-16 ***
#> L260.CAC    -5.482e-02  3.874e-02  -1.415 0.157455
#> L520.CAC     2.326e-01  4.570e-02   5.090 4.46e-07 ***
#> L780.CAC    -7.636e-02  3.966e-02  -1.925 0.054583 .
#> F260.FTSE   -9.505e-02  2.174e-02  -4.372 1.39e-05 ***
#> FTSE         3.158e-01  3.103e-02  10.176  < 2e-16 ***
#> L260.FTSE    1.745e-01  3.104e-02   5.621 2.63e-08 ***
#> L520.FTSE    2.809e-01  3.381e-02   8.308 4.14e-16 ***
#> L780.FTSE    1.535e-01  3.014e-02   5.092 4.42e-07 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 49.64 on 800 degrees of freedom
#>   (1040 observations deleted due to missingness)
#> Multiple R-squared: 0.9926, Adjusted R-squared: 0.9925
#> F-statistic: 5668 on 19 and 800 DF, p-value: < 2.2e-16
#>

## World Development Panel Data
head(flag(wlddev, 1, wlddev$iso3c, wlddev$year))  # This lags all variables,
#>       country iso3c       date year decade     region     income  OECD PCGDP
#> 1        <NA>  <NA>       <NA>   NA     NA       <NA>       <NA>    NA    NA
#> 2 Afghanistan   AFG 1961-01-01 1960   1960 South Asia Low income FALSE    NA
#> 3 Afghanistan   AFG 1962-01-01 1961   1960 South Asia Low income FALSE    NA
#> 4 Afghanistan   AFG 1963-01-01 1962   1960 South Asia Low income FALSE    NA
#> 5 Afghanistan   AFG 1964-01-01 1963   1960 South Asia Low income FALSE    NA
#> 6 Afghanistan   AFG 1965-01-01 1964   1960 South Asia Low income FALSE    NA
#>   LIFEEX GINI       ODA
#> 1     NA   NA        NA
#> 2 32.292   NA 114440000
#> 3 32.742   NA 233350000
#> 4 33.185   NA 114880000
#> 5 33.624   NA 236450000
#> 6 34.060   NA 302480000
head(L(wlddev, 1, ~iso3c, ~year))  # This lags all numeric variables
#>   iso3c year L1.decade L1.PCGDP L1.LIFEEX L1.GINI    L1.ODA
#> 1   AFG 1960        NA       NA        NA      NA        NA
#> 2   AFG 1961      1960       NA    32.292      NA 114440000
#> 3   AFG 1962      1960       NA    32.742      NA 233350000
#> 4   AFG 1963      1960       NA    33.185      NA 114880000
#> 5   AFG 1964      1960       NA    33.624      NA 236450000
#> 6   AFG 1965      1960       NA    34.060      NA 302480000
head(L(wlddev, 1, ~iso3c))  # Without t: Works because data is ordered
#> Panel-lag computed without timevar: Assuming ordered data
#>   iso3c L1.year L1.decade L1.PCGDP L1.LIFEEX L1.GINI    L1.ODA
#> 1   AFG      NA        NA       NA        NA      NA        NA
#> 2   AFG    1960      1960       NA    32.292      NA 114440000
#> 3   AFG    1961      1960       NA    32.742      NA 233350000
#> 4   AFG    1962      1960       NA    33.185      NA 114880000
#> 5   AFG    1963      1960       NA    33.624      NA 236450000
#> 6   AFG    1964      1960       NA    34.060      NA 302480000
head(L(wlddev, 1, PCGDP + LIFEEX ~ iso3c, ~year))  # This lags GDP per Capita & Life Expectancy
#>   iso3c year L1.PCGDP L1.LIFEEX
#> 1   AFG 1960       NA        NA
#> 2   AFG 1961       NA    32.292
#> 3   AFG 1962       NA    32.742
#> 4   AFG 1963       NA    33.185
#> 5   AFG 1964       NA    33.624
#> 6   AFG 1965       NA    34.060
head(L(wlddev, 0:2, ~ iso3c, ~year, cols = 9:10))  # Same, also retaining original series
#>   iso3c year PCGDP L1.PCGDP L2.PCGDP LIFEEX L1.LIFEEX L2.LIFEEX
#> 1   AFG 1960    NA       NA       NA 32.292        NA        NA
#> 2   AFG 1961    NA       NA       NA 32.742    32.292        NA
#> 3   AFG 1962    NA       NA       NA 33.185    32.742    32.292
#> 4   AFG 1963    NA       NA       NA 33.624    33.185    32.742
#> 5   AFG 1964    NA       NA       NA 34.060    33.624    33.185
#> 6   AFG 1965    NA       NA       NA 34.495    34.060    33.624
head(L(wlddev, 1:2, PCGDP + LIFEEX ~ iso3c, ~year,  # Two lags, dropping id columns
       keep.ids = FALSE))
#>   L1.PCGDP L2.PCGDP L1.LIFEEX L2.LIFEEX
#> 1       NA       NA        NA        NA
#> 2       NA       NA    32.292        NA
#> 3       NA       NA    32.742    32.292
#> 4       NA       NA    33.185    32.742
#> 5       NA       NA    33.624    33.185
#> 6       NA       NA    34.060    33.624

# Different ways of regressing GDP on its lags and life expectancy and its lags
summary(lm(PCGDP ~ ., L(wlddev, 0:2, ~iso3c, ~year, 9:10, keep.ids = FALSE)))  # 1 - Precomputing
#>
#> Call:
#> lm(formula = PCGDP ~ ., data = L(wlddev, 0:2, ~iso3c, ~year,
#>     9:10, keep.ids = FALSE))
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -321.51378   63.37246  -5.073    4e-07 ***
#> L1.PCGDP       1.31801    0.01061 124.173   <2e-16 ***
#> L2.PCGDP      -0.31550    0.01070 -29.483   <2e-16 ***
#> LIFEEX        -1.93638   38.24878  -0.051    0.960
#> L1.LIFEEX     10.01163   71.20359   0.141    0.888
#> L2.LIFEEX     -1.66669   37.70885  -0.044    0.965
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared: 0.9974, Adjusted R-squared: 0.9974
#> F-statistic: 6.166e+05 on 5 and 7988 DF, p-value: < 2.2e-16
#>
summary(lm(PCGDP ~ L(PCGDP,1:2,iso3c,year) + L(LIFEEX,0:2,iso3c,year), wlddev))  # 2 - Ad-hoc
#>
#> Call:
#> lm(formula = PCGDP ~ L(PCGDP, 1:2, iso3c, year) + L(LIFEEX, 0:2,
#>     iso3c, year), data = wlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients:
#>                                 Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                   -321.51378   63.37246  -5.073    4e-07 ***
#> L(PCGDP, 1:2, iso3c, year)L1     1.31801    0.01061 124.173   <2e-16 ***
#> L(PCGDP, 1:2, iso3c, year)L2    -0.31550    0.01070 -29.483   <2e-16 ***
#> L(LIFEEX, 0:2, iso3c, year)--   -1.93638   38.24878  -0.051    0.960
#> L(LIFEEX, 0:2, iso3c, year)L1   10.01163   71.20359   0.141    0.888
#> L(LIFEEX, 0:2, iso3c, year)L2   -1.66669   37.70885  -0.044    0.965
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared: 0.9974, Adjusted R-squared: 0.9974
#> F-statistic: 6.166e+05 on 5 and 7988 DF, p-value: < 2.2e-16
#>
summary(lm(PCGDP ~ L(PCGDP,1:2,iso3c) + L(LIFEEX,0:2,iso3c), wlddev))  # 3 - same no year
#> Panel-lag computed without timevar: Assuming ordered data
#> Panel-lag computed without timevar: Assuming ordered data
#>
#> Call:
#> lm(formula = PCGDP ~ L(PCGDP, 1:2, iso3c) + L(LIFEEX, 0:2, iso3c),
#>     data = wlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients:
#>                           Estimate Std. Error t value Pr(>|t|)
#> (Intercept)             -321.51378   63.37246  -5.073    4e-07 ***
#> L(PCGDP, 1:2, iso3c)L1     1.31801    0.01061 124.173   <2e-16 ***
#> L(PCGDP, 1:2, iso3c)L2    -0.31550    0.01070 -29.483   <2e-16 ***
#> L(LIFEEX, 0:2, iso3c)--   -1.93638   38.24878  -0.051    0.960
#> L(LIFEEX, 0:2, iso3c)L1   10.01163   71.20359   0.141    0.888
#> L(LIFEEX, 0:2, iso3c)L2   -1.66669   37.70885  -0.044    0.965
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared: 0.9974, Adjusted R-squared: 0.9974
#> F-statistic: 6.166e+05 on 5 and 7988 DF, p-value: < 2.2e-16
#>
g = qF(wlddev$iso3c); t = qF(wlddev$year)  # 4 - Precomputing
summary(lm(PCGDP ~ L(PCGDP,1:2,g,t) + L(LIFEEX,0:2,g,t), wlddev))  # panel-id's
#>
#> Call:
#> lm(formula = PCGDP ~ L(PCGDP, 1:2, g, t) + L(LIFEEX, 0:2, g,
#>     t), data = wlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients:
#>                          Estimate Std. Error t value Pr(>|t|)
#> (Intercept)            -321.51378   63.37246  -5.073    4e-07 ***
#> L(PCGDP, 1:2, g, t)L1     1.31801    0.01061 124.173   <2e-16 ***
#> L(PCGDP, 1:2, g, t)L2    -0.31550    0.01070 -29.483   <2e-16 ***
#> L(LIFEEX, 0:2, g, t)--   -1.93638   38.24878  -0.051    0.960
#> L(LIFEEX, 0:2, g, t)L1   10.01163   71.20359   0.141    0.888
#> L(LIFEEX, 0:2, g, t)L2   -1.66669   37.70885  -0.044    0.965
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared: 0.9974, Adjusted R-squared: 0.9974
#> F-statistic: 6.166e+05 on 5 and 7988 DF, p-value: < 2.2e-16
#>

## Using plm:
pwlddev <- plm::pdata.frame(wlddev, index = c("iso3c","year"))
head(L(pwlddev, 0:2, 9:10))  # Again 2 lags of GDP and LIFEEX
#>          iso3c year PCGDP L1.PCGDP L2.PCGDP LIFEEX L1.LIFEEX L2.LIFEEX
#> ABW-1960   ABW 1960    NA       NA       NA 65.662        NA        NA
#> ABW-1961   ABW 1961    NA       NA       NA 66.074    65.662        NA
#> ABW-1962   ABW 1962    NA       NA       NA 66.444    66.074    65.662
#> ABW-1963   ABW 1963    NA       NA       NA 66.787    66.444    66.074
#> ABW-1964   ABW 1964    NA       NA       NA 67.113    66.787    66.444
#> ABW-1965   ABW 1965    NA       NA       NA 67.435    67.113    66.787
PCGDP <- pwlddev$PCGDP  # A panel-Series of GDP per Capita
head(L(PCGDP))  # Lagging the panel series
#> ABW-1960 ABW-1961 ABW-1962 ABW-1963 ABW-1964 ABW-1965
#>       NA       NA       NA       NA       NA       NA
summary(lm(PCGDP ~ ., L(pwlddev, 0:2, 9:10, keep.ids = FALSE)))  # Running the lm again
#>
#> Call:
#> lm(formula = PCGDP ~ ., data = L(pwlddev, 0:2, 9:10, keep.ids = FALSE))
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -321.51378   63.37246  -5.073    4e-07 ***
#> L1.PCGDP       1.31801    0.01061 124.173   <2e-16 ***
#> L2.PCGDP      -0.31550    0.01070 -29.483   <2e-16 ***
#> LIFEEX        -1.93638   38.24878  -0.051    0.960
#> L1.LIFEEX     10.01163   71.20359   0.141    0.888
#> L2.LIFEEX     -1.66669   37.70885  -0.044    0.965
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared: 0.9974, Adjusted R-squared: 0.9974
#> F-statistic: 6.166e+05 on 5 and 7988 DF, p-value: < 2.2e-16
#>
# THIS DOES NOT WORK: -> a pseries is only created when subsetting the pdata.frame using $ or [[
summary(lm(PCGDP ~ L(PCGDP,1:2) + L(LIFEEX,0:2), pwlddev))  # ..so L.default is used here..
#>
#> Call:
#> lm(formula = PCGDP ~ L(PCGDP, 1:2) + L(LIFEEX, 0:2), data = pwlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients:
#>                    Estimate Std. Error t value Pr(>|t|)
#> (Intercept)      -321.51378   63.37246  -5.073    4e-07 ***
#> L(PCGDP, 1:2)L1     1.31801    0.01061 124.173   <2e-16 ***
#> L(PCGDP, 1:2)L2    -0.31550    0.01070 -29.483   <2e-16 ***
#> L(LIFEEX, 0:2)--   -1.93638   38.24878  -0.051    0.960
#> L(LIFEEX, 0:2)L1   10.01163   71.20359   0.141    0.888
#> L(LIFEEX, 0:2)L2   -1.66669   37.70885  -0.044    0.965
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared: 0.9974, Adjusted R-squared: 0.9974
#> F-statistic: 6.166e+05 on 5 and 7988 DF, p-value: < 2.2e-16
#>
LIFEEX <- pwlddev$LIFEEX  # To make it work, create pseries
summary(lm(PCGDP ~ L(PCGDP,1:2) + L(LIFEEX,0:2)))  # THIS WORKS !
#>
#> Call:
#> lm(formula = PCGDP ~ L(PCGDP, 1:2) + L(LIFEEX, 0:2))
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients:
#>                    Estimate Std. Error t value Pr(>|t|)
#> (Intercept)      -321.51378   63.37246  -5.073    4e-07 ***
#> L(PCGDP, 1:2)L1     1.31801    0.01061 124.173   <2e-16 ***
#> L(PCGDP, 1:2)L2    -0.31550    0.01070 -29.483   <2e-16 ***
#> L(LIFEEX, 0:2)--   -1.93638   38.24878  -0.051    0.960
#> L(LIFEEX, 0:2)L1   10.01163   71.20359   0.141    0.888
#> L(LIFEEX, 0:2)L2   -1.66669   37.70885  -0.044    0.965
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared: 0.9974, Adjusted R-squared: 0.9974
#> F-statistic: 6.166e+05 on 5 and 7988 DF, p-value: < 2.2e-16
#>

## Using dplyr:
library(dplyr)
wlddev %>% group_by(iso3c) %>% select(PCGDP,LIFEEX) %>% L(0:2)
#> Adding missing grouping variables: `iso3c`
#> Panel-lag computed without timevar: Assuming ordered data
#> # A tibble: 12,744 x 7
#> # Groups: iso3c [216]
#>    iso3c PCGDP L1.PCGDP L2.PCGDP LIFEEX L1.LIFEEX L2.LIFEEX
#>  * <fct> <dbl>    <dbl>    <dbl>  <dbl>     <dbl>     <dbl>
#>  1 AFG      NA       NA       NA   32.3      NA        NA
#>  2 AFG      NA       NA       NA   32.7      32.3      NA
#>  3 AFG      NA       NA       NA   33.2      32.7      32.3
#>  4 AFG      NA       NA       NA   33.6      33.2      32.7
#>  5 AFG      NA       NA       NA   34.1      33.6      33.2
#>  6 AFG      NA       NA       NA   34.5      34.1      33.6
#>  7 AFG      NA       NA       NA   34.9      34.5      34.1
#>  8 AFG      NA       NA       NA   35.4      34.9      34.5
#>  9 AFG      NA       NA       NA   35.8      35.4      34.9
#> 10 AFG      NA       NA       NA   36.2      35.8      35.4
#> # ... with 12,734 more rows
wlddev %>% group_by(iso3c) %>% select(year,PCGDP,LIFEEX) %>% L(0:2,year)  # Also using t (safer)
#> Adding missing grouping variables: `iso3c`
#> # A tibble: 12,744 x 8
#> # Groups: iso3c [216]
#>    iso3c  year PCGDP L1.PCGDP L2.PCGDP LIFEEX L1.LIFEEX L2.LIFEEX
#>  * <fct> <int> <dbl>    <dbl>    <dbl>  <dbl>     <dbl>     <dbl>
#>  1 AFG    1960    NA       NA       NA   32.3      NA        NA
#>  2 AFG    1961    NA       NA       NA   32.7      32.3      NA
#>  3 AFG    1962    NA       NA       NA   33.2      32.7      32.3
#>  4 AFG    1963    NA       NA       NA   33.6      33.2      32.7
#>  5 AFG    1964    NA       NA       NA   34.1      33.6      33.2
#>  6 AFG    1965    NA       NA       NA   34.5      34.1      33.6
#>  7 AFG    1966    NA       NA       NA   34.9      34.5      34.1
#>  8 AFG    1967    NA       NA       NA   35.4      34.9      34.5
#>  9 AFG    1968    NA       NA       NA   35.8      35.4      34.9
#> 10 AFG    1969    NA       NA       NA   36.2      35.8      35.4
#> # ... with 12,734 more rows