fdiff is an S3 generic to compute (sequences of) suitably lagged / leaded and iterated differences, quasi-differences, log-differences or quasi-log-differences. The difference and log-difference operators D and Dlog also exist as parsimonious wrappers around fdiff, providing more flexibility than fdiff when applied to data frames.

fdiff(x, n = 1, diff = 1, ...)
      D(x, n = 1, diff = 1, ...)
   Dlog(x, n = 1, diff = 1, ...)

# S3 method for default
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = TRUE, ...)
# S3 method for default
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
  stubs = TRUE, ...)
# S3 method for default
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1, stubs = TRUE, ...)

# S3 method for matrix
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, ...)
# S3 method for matrix
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
  stubs = TRUE, ...)
# S3 method for matrix
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1, stubs = TRUE, ...)

# S3 method for data.frame
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, ...)
# S3 method for data.frame
D(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
  fill = NA, rho = 1, stubs = TRUE, keep.ids = TRUE, ...)
# S3 method for data.frame
Dlog(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
     fill = NA, rho = 1, stubs = TRUE, keep.ids = TRUE, ...)

# Methods for compatibility with plm:

# S3 method for pseries
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1, stubs = TRUE, ...)
# S3 method for pseries
D(x, n = 1, diff = 1, fill = NA, rho = 1, stubs = TRUE, ...)
# S3 method for pseries
Dlog(x, n = 1, diff = 1, fill = NA, rho = 1, stubs = TRUE, ...)

# S3 method for pdata.frame
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, ...)
# S3 method for pdata.frame
D(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1, stubs = TRUE,
  keep.ids = TRUE, ...)
# S3 method for pdata.frame
Dlog(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1, stubs = TRUE,
     keep.ids = TRUE, ...)

# Methods for grouped data frame / compatibility with dplyr:

# S3 method for grouped_df
fdiff(x, n = 1, diff = 1, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, keep.ids = TRUE, ...)
# S3 method for grouped_df
D(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1, stubs = TRUE,
  keep.ids = TRUE, ...)
# S3 method for grouped_df
Dlog(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1, stubs = TRUE,
     keep.ids = TRUE, ...)

Arguments

x

a numeric vector / time series, (time series) matrix, data frame, panel series (plm::pseries), panel data frame (plm::pdata.frame) or grouped data frame (class 'grouped_df').

n

integer. A vector indicating the number of lags or leads.

diff

integer. A vector of integers >= 1 indicating the order of differencing / log-differencing.

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

by

data.frame method: Same as g, but also allows one- or two-sided formulas i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.

t

same input as g/by, to indicate the time-variable(s). For safe computation of differences on unordered time series and panels. The data.frame method also allows a one-sided formula, i.e. ~time. The grouped_df method supports lazy evaluation, i.e. time (no quotes).

cols

data.frame method: Select columns to difference using a function, column names, indices or a logical vector. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.

fill

value to insert when vectors are shifted. Default is NA.

log

logical. TRUE computes log-differences instead. See Details.

rho

double. Autocorrelation parameter. Set to a value between 0 and 1 for quasi-differencing. Any numeric value can be supplied.

stubs

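logical. TRUE will rename all differenced columns by adding prefixes "LnDdiff." / "FnDdiff." for differences, "LnDlogdiff." / "FnDlogdiff." for log-differences, and replacing "D" / "Dlog" with "QD" / "QDlog" for quasi-differences. Here n stands for the lag / lead and diff for the order of differencing; see for instance the column names "D1.PCGDP" and "QD1.DAX" in the Examples.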

keep.ids

data.frame / pdata.frame / grouped_df methods: Logical. If FALSE, all panel-identifiers are dropped from the output (which includes all variables passed to by or t). Note: For grouped / panel data frames identifiers are dropped, but the 'groups' / 'index' attributes are kept.

...

arguments to be passed to or from other methods.

Details

By default, fdiff/D/Dlog return x with all columns differenced / log-differenced. Differences are computed as repeat(diff) x[i] - rho*x[i-n], and log-differences as repeat(diff) log(x[i]) - rho*log(x[i-n]). If rho < 1, this becomes quasi- (or partial) differencing, a technique suggested by Cochrane and Orcutt (1949) to deal with serial correlation in regression models, where rho is typically estimated by regressing the model residuals on the lagged residuals. Setting diff = 2 returns differences of differences, and so on, while setting n = 2 returns simple differences computed by subtracting twice-lagged x from x. Forward differences can be computed by passing negative n values. n also supports arbitrary vectors of integers (lags), and diff supports positive sequences of integers (differences).
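The recursion described above can be read as the following plain-R sketch for a single ordered, ungrouped vector. ref_diff is a hypothetical helper written only for illustration: it is not part of the package and says nothing about how fdiff is implemented internally. It assumes that rho is applied at every iteration, as the formula suggests.

```r
# Hypothetical reference helper (illustration only, not the package's implementation).
# Applies x[i] - rho * x[i - n] `diff` times; negative n gives forward differences.
ref_diff <- function(x, n = 1L, diff = 1L, rho = 1, log = FALSE, fill = NA) {
  if (log) x <- base::log(x)            # log-differences: difference the logs
  len <- length(x)
  for (k in seq_len(diff)) {
    if (n >= 0L) {
      shifted <- c(rep(fill, n), x)[seq_len(len)]        # lag by n
    } else {
      shifted <- c(x, rep(fill, -n))[seq_len(len) - n]   # lead by |n|
    }
    x <- x - rho * shifted
  }
  x
}

x <- c(112, 118, 132, 129, 121, 135)    # first six AirPassengers values
ref_diff(x)                 # first difference
ref_diff(x, diff = 2)       # difference of the first difference
ref_diff(x, n = 2)          # x minus twice-lagged x
ref_diff(x, n = -1)         # forward difference
ref_diff(x, rho = 0.5)      # quasi-difference with an arbitrary rho
ref_diff(x, log = TRUE)     # log-difference
```

On these six values the first four calls agree with the D1, D2, L2D1 and FD1 columns printed for D(AirPassengers, -2:2, 1:3) in the Examples, except for the last forward difference, which is NA here because the toy vector ends after six observations.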

If more than one value is passed to n and/or diff, the data is expanded-wide as follows: If x is an atomic vector or time series, a (time series) matrix is returned with columns ordered first by lag, then by difference. If x is a matrix or data frame, each column is expanded in like manner such that the output has ncol(x)*length(n)*length(diff) columns ordered first by column name, then by lag, then by difference.
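To make the column ordering concrete, here is a small base-R sketch that expands one vector over two lags and two difference orders. The helper lagdiff and the column labels are assumptions made for this illustration; the labels are not meant to reproduce the package's stubs exactly.

```r
# Hypothetical helper: d-times iterated difference at a positive lag n
lagdiff <- function(x, n, d) {
  for (k in seq_len(d)) x <- x - c(rep(NA, n), x)[seq_along(x)]
  x
}

x      <- c(112, 118, 132, 129, 121, 135)   # first six AirPassengers values
lags   <- 1:2
orders <- 1:2

# expand.grid varies its first argument fastest, so the difference order
# changes within each lag: columns come out ordered by lag, then by difference
grid <- expand.grid(order = orders, lag = lags)
out  <- mapply(function(n, d) lagdiff(x, n, d), grid$lag, grid$order)
colnames(out) <- paste0("L", grid$lag, "D", grid$order)   # illustrative labels
out
```

The result has length(lags) * length(orders) = 4 columns for the single input series, in the order L1D1, L1D2, L2D1, L2D2.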

With groups/panel-identifiers supplied to g/by, fdiff/D/Dlog efficiently compute panel-differences. If t is left empty, the data needs to be ordered such that all values belonging to a group are consecutive and in the right order. It is not necessary that the groups themselves occur in the right order. If time-variable(s) are supplied to t, the panel is fully identified and differences can be securely computed even if the data is unordered.
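The role of the time variable can be illustrated with a base-R sketch on made-up data. It sorts by group and time, differences within groups, and maps the result back to the original row order. This only mimics the outcome that supplying t is meant to guarantee; it is an assumption-laden illustration and not a description of the package's internal algorithm.

```r
# Toy panel (made-up values), deliberately not sorted by time within id
d <- data.frame(id   = c("a", "b", "a", "b", "a", "b"),
                time = c(2, 1, 1, 3, 3, 2),
                y    = c(12, 20, 10, 26, 15, 23))

# Sort so that each group's rows are consecutive and in time order
o  <- order(d$id, d$time)
ds <- d[o, ]

# First difference within each group; the first observation of a group is NA
ds$dy <- ave(ds$y, ds$id, FUN = function(v) c(NA, diff(v)))

# Write the differences back in the original (unordered) row order
d$dy    <- NA_real_
d$dy[o] <- ds$dy
d
```

Differencing the unsorted y column group by group without this reordering would, for example, subtract the time-2 value of "a" from its time-1 value, which is the kind of error a time variable protects against.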

fdiff/D/Dlog support balanced panels and unbalanced panels where various individuals are observed for different time-sequences. For computational details and efficiency considerations see the help page for flag.

It is also possible to compute differences on unordered vectors or irregular time series (thus utilizing t but leaving g/by empty).

The methods applying to plm objects (panel series and panel data frames) automatically utilize the panel-identifiers attached to these objects and thus securely compute fully identified panel-differences. If these objects have > 2 panel-identifiers attached to them, the last identifier is assumed to be the time-variable, and the others are taken as grouping-variables and interacted.

Value

x differenced diff times using lags n of itself. Quasi- and log-differences are toggled by the rho and log arguments or the Dlog operator. Computations can be grouped by g/by and/or ordered by t. See Details and Examples.

References

Cochrane, D.; Orcutt, G. H. (1949). Application of Least Squares Regression to Relationships Containing Auto-Correlated Error Terms. Journal of the American Statistical Association. 44 (245): 32-61.

Prais, S. J.; Winsten, C. B. (1954). Trend Estimators and Serial Correlation. Cowles Commission Discussion Paper No. 383. Chicago.

Examples

## Simple Time Series: AirPassengers
D(AirPassengers)                      # 1st difference, same as fdiff(AirPassengers)
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 1949   NA    6   14   -3   -8   14   13    0  -12  -17  -15   14
#> 1950   -3   11   15   -6  -10   24   21    0  -12  -25  -19   26
#> 1951    5    5   28  -15    9    6   21    0  -15  -22  -16   20
#> 1952    5    9   13  -12    2   35   12   12  -33  -18  -19   22
#> 1953    2    0   40   -1   -6   14   21    8  -35  -26  -31   21
#> 1954    3  -16   47   -8    7   30   38   -9  -34  -30  -26   26
#> 1955   13   -9   34    2    1   45   49  -17  -35  -38  -37   41
#> 1956    6   -7   40   -4    5   56   39   -8  -50  -49  -35   35
#> 1957    9  -14   55   -8    7   67   43    2  -63  -57  -42   31
#> 1958    4  -22   44  -14   15   72   56   14 -101  -45  -49   27
#> 1959   23  -18   64  -10   24   52   76   11  -96  -56  -45   43
#> 1960   12  -26   28   42   11   63   87  -16  -98  -47  -71   42
D(AirPassengers, -1)                  # Forward difference
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 1949   -6  -14    3    8  -14  -13    0   12   17   15  -14    3
#> 1950  -11  -15    6   10  -24  -21    0   12   25   19  -26   -5
#> 1951   -5  -28   15   -9   -6  -21    0   15   22   16  -20   -5
#> 1952   -9  -13   12   -2  -35  -12  -12   33   18   19  -22   -2
#> 1953    0  -40    1    6  -14  -21   -8   35   26   31  -21   -3
#> 1954   16  -47    8   -7  -30  -38    9   34   30   26  -26  -13
#> 1955    9  -34   -2   -1  -45  -49   17   35   38   37  -41   -6
#> 1956    7  -40    4   -5  -56  -39    8   50   49   35  -35   -9
#> 1957   14  -55    8   -7  -67  -43   -2   63   57   42  -31   -4
#> 1958   22  -44   14  -15  -72  -56  -14  101   45   49  -27  -23
#> 1959   18  -64   10  -24  -52  -76  -11   96   56   45  -43  -12
#> 1960   26  -28  -42  -11  -63  -87   16   98   47   71  -42   NA
Dlog(AirPassengers)                   # Log-difference
#>               Jan          Feb          Mar          Apr          May
#> 1949           NA  0.052185753  0.112117298 -0.022989518 -0.064021859
#> 1950 -0.025752496  0.091349779  0.112477983 -0.043485112 -0.076961041
#> 1951  0.035091320  0.033901552  0.171148256 -0.088033349  0.053744276
#> 1952  0.029675768  0.051293294  0.069733338 -0.064193158  0.010989122
#> 1953  0.010256500  0.000000000  0.185717146 -0.004246291 -0.025863511
#> 1954  0.014815086 -0.081678031  0.223143551 -0.034635497  0.030371098
#> 1955  0.055215723 -0.037899273  0.136210205  0.007462721  0.003710579
#> 1956  0.021353124 -0.024956732  0.134884268 -0.012698583  0.015848192
#> 1957  0.028987537 -0.045462374  0.167820466 -0.022728251  0.019915310
#> 1958  0.011834458 -0.066894235  0.129592829 -0.039441732  0.042200354
#> 1959  0.066021101 -0.051293294  0.171542423 -0.024938948  0.058840500
#> 1960  0.029199155 -0.064378662  0.069163360  0.095527123  0.023580943
#>               Jun          Jul          Aug          Sep          Oct
#> 1949  0.109484233  0.091937495  0.000000000 -0.084557388 -0.133531393
#> 1950  0.175632569  0.131852131  0.000000000 -0.073203404 -0.172245905
#> 1951  0.034289073  0.111521274  0.000000000 -0.078369067 -0.127339422
#> 1952  0.175008910  0.053584246  0.050858417 -0.146603474 -0.090060824
#> 1953  0.059339440  0.082887660  0.029852963 -0.137741925 -0.116202008
#> 1954  0.120627988  0.134477914 -0.030254408 -0.123344547 -0.123106058
#> 1955  0.154150680  0.144581229 -0.047829088 -0.106321592 -0.129875081
#> 1956  0.162204415  0.099191796 -0.019560526 -0.131769278 -0.148532688
#> 1957  0.172887525  0.097032092  0.004291852 -0.144914380 -0.152090098
#> 1958  0.180943197  0.121098097  0.028114301 -0.223143551 -0.118092489
#> 1959  0.116724274  0.149296301  0.019874186 -0.188422419 -0.128913869
#> 1960  0.125287761  0.150673346 -0.026060107 -0.176398538 -0.097083405
#>               Nov          Dec
#> 1949 -0.134732594  0.126293725
#> 1950 -0.154150680  0.205443974
#> 1951 -0.103989714  0.128381167
#> 1952 -0.104778951  0.120363682
#> 1953 -0.158901283  0.110348057
#> 1954 -0.120516025  0.120516025
#> 1955 -0.145067965  0.159560973
#> 1956 -0.121466281  0.121466281
#> 1957 -0.129013003  0.096799383
#> 1958 -0.146750091  0.083510633
#> 1959 -0.117168974  0.112242855
#> 1960 -0.167251304  0.102278849
D(AirPassengers, 1, 2)                # Second difference
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 1949   NA   NA    8  -17   -5   22   -1  -13  -12   -5    2   29
#> 1950  -17   14    4  -21   -4   34   -3  -21  -12  -13    6   45
#> 1951  -21    0   23  -43   24   -3   15  -21  -15   -7    6   36
#> 1952  -15    4    4  -25   14   33  -23    0  -45   15   -1   41
#> 1953  -20   -2   40  -41   -5   20    7  -13  -43    9   -5   52
#> 1954  -18  -19   63  -55   15   23    8  -47  -25    4    4   52
#> 1955  -13  -22   43  -32   -1   44    4  -66  -18   -3    1   78
#> 1956  -35  -13   47  -44    9   51  -17  -47  -42    1   14   70
#> 1957  -26  -23   69  -63   15   60  -24  -41  -65    6   15   73
#> 1958  -27  -26   66  -58   29   57  -16  -42 -115   56   -4   76
#> 1959   -4  -41   82  -74   34   28   24  -65 -107   40   11   88
#> 1960  -31  -38   54   14  -31   52   24 -103  -82   51  -24  113
Dlog(AirPassengers, 1, 2)             # Second log-difference
#>                Jan           Feb           Mar           Apr           May
#> 1949            NA            NA  0.0599315450 -0.1351068163 -0.0410323405
#> 1950 -0.1520462214  0.1171022747  0.0211282048 -0.1559630954 -0.0334759292
#> 1951 -0.1703526544 -0.0011897681  0.1372467045 -0.2591816057  0.1417776255
#> 1952 -0.0987053985  0.0216175262  0.0184400436 -0.1339264957  0.0751822792
#> 1953 -0.1101071821 -0.0102565002  0.1857171458 -0.1899634367 -0.0216172197
#> 1954 -0.0955329714 -0.0964931168  0.3048215823 -0.2577790480  0.0650065945
#> 1955 -0.0653003019 -0.0931149952  0.1741094774 -0.1287474836 -0.0037521418
#> 1956 -0.1382078481 -0.0463098564  0.1598409997 -0.1475828510  0.0285467756
#> 1957 -0.0924787442 -0.0744499110  0.2132828402 -0.1905487172  0.0426435608
#> 1958 -0.0849649257 -0.0787286925  0.1964870639 -0.1690345611  0.0816420865
#> 1959 -0.0174895318 -0.1173143955  0.2228357169 -0.1964813709  0.0837794484
#> 1960 -0.0830437006 -0.0935778165  0.1335420218  0.0263637631 -0.0719461805
#>                Jun           Jul           Aug           Sep           Oct
#> 1949  0.1735060916 -0.0175467375 -0.0919374953 -0.0845573880 -0.0489740046
#> 1950  0.2525936098 -0.0437804375 -0.1318521311 -0.0732034040 -0.0990425008
#> 1951 -0.0194552025  0.0772322010 -0.1115212744 -0.0783690671 -0.0489703553
#> 1952  0.1640197884 -0.1214246638 -0.0027258289 -0.1974618914  0.0565426503
#> 1953  0.0852029504  0.0235482200 -0.0530346967 -0.1675948883  0.0215399175
#> 1954  0.0902568899  0.0138499264 -0.1647323226 -0.0930901390  0.0002384892
#> 1955  0.1504401004 -0.0095694510 -0.1924103165 -0.0584925044 -0.0235534893
#> 1956  0.1463562224 -0.0630126191 -0.1187523214 -0.1122087518 -0.0167634099
#> 1957  0.1529722149 -0.0758554330 -0.0927402395 -0.1492062318 -0.0071757183
#> 1958  0.1387428423 -0.0598451001 -0.0929837952 -0.2512578528  0.1050510618
#> 1959  0.0578837743  0.0325720271 -0.1294221152 -0.2082966053  0.0595085504
#> 1960  0.1017068187  0.0253855845 -0.1767334525 -0.1503384318  0.0793151339
#>                Nov           Dec
#> 1949 -0.0012012013  0.2610263193
#> 1950  0.0180952250  0.3595946540
#> 1951  0.0233497089  0.2323708802
#> 1952 -0.0147181273  0.2251426335
#> 1953 -0.0426992749  0.2692493398
#> 1954  0.0025900336  0.2410320490
#> 1955 -0.0151928838  0.3046289378
#> 1956  0.0270664065  0.2429325621
#> 1957  0.0230770947  0.2258123867
#> 1958 -0.0286576015  0.2302607239
#> 1959  0.0117448950  0.2294118289
#> 1960 -0.0701678993  0.2695301530
D(AirPassengers, 12)                  # Seasonal difference (data is monthly)
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 1949   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
#> 1950    3    8    9    6    4   14   22   22   22   14   10   22
#> 1951   30   24   37   28   47   29   29   29   26   29   32   26
#> 1952   26   30   15   18   11   40   31   43   25   29   26   28
#> 1953   25   16   43   54   46   25   34   30   28   20    8    7
#> 1954    8   -8   -1   -8    5   21   38   21   22   18   23   28
#> 1955   38   45   32   42   36   51   62   54   53   45   34   49
#> 1956   42   44   50   44   48   59   49   58   43   32   34   28
#> 1957   31   24   39   35   37   48   52   62   49   41   34   30
#> 1958   25   17    6    0    8   13   26   38    0   12    5    1
#> 1959   20   24   44   48   57   37   57   54   59   48   52   68
#> 1960   57   49   13   65   52   63   74   47   45   54   28   27
D(AirPassengers,                      # Quasi-difference, see a better example below
  rho = pwcor(AirPassengers, L(AirPassengers)))
#>              Jan         Feb         Mar         Apr         May         Jun
#> 1949          NA  10.4581994  18.6970315   2.2543065  -2.8651096  18.8164476
#> 1950   1.6970315  15.5776155  20.0154743  -0.3874454  -4.6262775  28.9756690
#> 1951  10.5727493  10.7717760  33.9708028  -7.9146474  15.4882724  12.8465205
#> 1952  11.6076884  15.8067152  20.1649634  -4.3175671   9.2047687  42.2843794
#> 1953   9.7222383   7.8018490  47.8018490   8.3940631   3.3542577  23.1154256
#> 1954  11.0008757  -7.8797082  54.4834062   1.3542577  16.0358149  39.3144524
#> 1955  22.1154256   0.6328952  43.2746470  12.6280290  11.7076397  55.7474450
#> 1956  17.0658878   4.3047200  51.0260825   8.6182966  17.4590752  68.6581019
#> 1957  21.1804377  -1.4613141  66.9814109   6.1707053  20.8522625  81.1308999
#> 1958  17.3745983  -8.4661803  56.6581019   0.4095374  28.8522625  86.4493428
#> 1959  36.4144036  -3.6700733  77.6134304   6.1609729  39.7629194  68.7182478
#> 1960  28.1211675  -9.4011682  43.5638926  58.6784425  29.3502672  81.7881261
#>              Jul         Aug         Sep         Oct         Nov         Dec
#> 1949  18.3737225   5.8911921  -6.1088079 -11.5864721 -10.2631631  18.1397566
#> 1950  26.9309974   6.7669098  -5.2330902 -18.7107544 -13.7058882  30.5378101
#> 1951  28.0853526   7.9212650  -7.0787350 -14.6758152  -9.5515330  25.8115814
#> 1952  20.6775667  21.1552309 -23.3671048  -9.6806814 -11.3971778  28.8465205
#> 1953  30.6727005  18.5086129 -24.1729443 -16.5661316 -22.6010707  28.1649634
#> 1954  48.5086129   3.0212163 -22.3370319 -19.6904138 -16.8845744  34.0804864
#> 1955  61.5386859  -2.5108519 -21.1875429 -25.5807302 -26.0933336  50.4338684
#> 1956  53.8872016   8.4396104 -33.8788325 -34.8691001 -22.8195623  45.7872504
#> 1957  59.7978585  20.5094887 -44.4109006 -40.9186378 -28.1875429  43.1406323
#> 1958  73.3153281  33.5444278 -80.8982973 -28.9186378 -34.7098786  39.3396591
#> 1959  94.7881261  32.8133329 -73.7488083 -37.5701220 -28.7992218  57.4095374
#> 1960 108.2958633   8.7589289 -73.8779567 -26.7788812 -52.6497328  57.5240873
head(D(AirPassengers, -2:2, 1:3))     # Sequence of leaded/lagged and iterated differences
#>      F2D1 F2D2 F2D3  FD1  FD2  FD3   --   D1   D2   D3 L2D1 L2D2 L2D3
#> [1,]  -20  -31  -69   -6    8   25  112   NA   NA   NA   NA   NA   NA
#> [2,]  -11   -5  -12  -14  -17  -12  118    6   NA   NA   NA   NA   NA
#> [3,]   11   38   77    3   -5  -27  132   14    8   NA   20   NA   NA
#> [4,]   -6    7   49    8   22   23  129   -3  -17  -25   11   NA   NA
#> [5,]  -27  -39  -19  -14   -1   12  121   -8   -5   12  -11  -31   NA
#> [6,]  -13  -42  -70  -13  -13   -1  135   14   22   27    6   -5   NA
# Let's do some visual analysis
plot(AirPassengers)                   # Plot the series - seasonal pattern is evident
plot(stl(AirPassengers, "periodic")) # Seasonal decomposition
plot(D(AirPassengers,c(1,12),1:2)) # Plotting ordinary and seasonal first and second differences
plot(stl(window(D(AirPassengers,12),  # Taking seasonal differences removes most seasonal variation
                1950), "periodic"))
## Time Series Matrix of 4 EU Stock Market Indicators, recorded 260 days per year
plot(D(EuStockMarkets, c(0, 260)))                 # Plot series and annual differences
mod <- lm(DAX ~., L(EuStockMarkets, c(0, 260)))    # Regressing the DAX on its annual lag
summary(mod)                                       # and the levels and annual lags of the others
#>
#> Call:
#> lm(formula = DAX ~ ., data = L(EuStockMarkets, c(0, 260)))
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -224.33  -57.02  -12.40   51.51  359.96
#>
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -123.26123   59.74149  -2.063   0.0393 *
#> L260.DAX      -0.02126    0.02151  -0.988   0.3232
#> SMI            0.37415    0.01356  27.589   <2e-16 ***
#> L260.SMI       0.28186    0.01901  14.826   <2e-16 ***
#> CAC            0.52973    0.01544  34.305   <2e-16 ***
#> L260.CAC      -0.23401    0.02145 -10.911   <2e-16 ***
#> FTSE          -0.03944    0.01780  -2.215   0.0269 *
#> L260.FTSE      0.02888    0.02182   1.324   0.1858
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 84.02 on 1592 degrees of freedom
#>   (260 observations deleted due to missingness)
#> Multiple R-squared:  0.9943, Adjusted R-squared:  0.9942
#> F-statistic: 3.94e+04 on 7 and 1592 DF,  p-value: < 2.2e-16
#>
r <- residuals(mod)                                # Obtain residuals
pwcor(r, L(r))                                     # Residual Autocorrelation
#> [1] .97
fFtest(r, L(r))                                    # F-test of residual autocorrelation
#>     R-Sq.       DF1       DF2   F-Stat.   P-value
#>     0.937         1      1597 23690.699     0.000
# (better use lmtest::bgtest)
modCO <- lm(QD1.DAX ~., D(L(EuStockMarkets, c(0, 260)),  # Cochrane-Orcutt (1949) estimation
                          rho = pwcor(r, L(r))))
summary(modCO)
#>
#> Call:
#> lm(formula = QD1.DAX ~ ., data = D(L(EuStockMarkets, c(0, 260)),
#>     rho = pwcor(r, L(r))))
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -87.131  -9.079  -0.439   9.228 119.993
#>
#> Coefficients:
#>                 Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   -17.979391   2.094867  -8.583   <2e-16 ***
#> QD1.L260.DAX    0.048116   0.034403   1.399    0.162
#> QD1.SMI         0.343808   0.013902  24.731   <2e-16 ***
#> QD1.L260.SMI    0.014331   0.022530   0.636    0.525
#> QD1.CAC         0.459655   0.024406  18.834   <2e-16 ***
#> QD1.L260.CAC   -0.031068   0.030598  -1.015    0.310
#> QD1.FTSE        0.220516   0.020682  10.662   <2e-16 ***
#> QD1.L260.FTSE   0.007577   0.025948   0.292    0.770
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 19.06 on 1591 degrees of freedom
#>   (261 observations deleted due to missingness)
#> Multiple R-squared:  0.8582, Adjusted R-squared:  0.8576
#> F-statistic:  1376 on 7 and 1591 DF,  p-value: < 2.2e-16
#>
rCO <- residuals(modCO)
fFtest(rCO, L(rCO))                                # No more autocorrelation
#>    R-Sq.      DF1      DF2  F-Stat.  P-value
#>    0.001        1     1596    2.326    0.127
## World Development Panel Data
head(fdiff(num_vars(wlddev), 1, 1,                 # Computes differences of numeric variables
           wlddev$country, wlddev$year))           # fdiff requires external inputs..
#>   year decade PCGDP LIFEEX GINI        ODA
#> 1   NA     NA    NA     NA   NA         NA
#> 2    1      0    NA  0.450   NA  118910000
#> 3    1      0    NA  0.443   NA -118470000
#> 4    1      0    NA  0.439   NA  121570000
#> 5    1      0    NA  0.436   NA   66030000
#> 6    1      0    NA  0.435   NA   67770000
head(D(wlddev, 1, 1, ~country, ~year))             # Differences of numeric variables
#>       country year D1.decade D1.PCGDP D1.LIFEEX D1.GINI     D1.ODA
#> 1 Afghanistan 1960        NA       NA        NA      NA         NA
#> 2 Afghanistan 1961         0       NA     0.450      NA  118910000
#> 3 Afghanistan 1962         0       NA     0.443      NA -118470000
#> 4 Afghanistan 1963         0       NA     0.439      NA  121570000
#> 5 Afghanistan 1964         0       NA     0.436      NA   66030000
#> 6 Afghanistan 1965         0       NA     0.435      NA   67770000
head(D(wlddev, 1, 1, ~country))                    # Without t: Works because data is ordered
#> Panel-difference computed without timevar: Assuming ordered data
#>       country D1.year D1.decade D1.PCGDP D1.LIFEEX D1.GINI     D1.ODA
#> 1 Afghanistan      NA        NA       NA        NA      NA         NA
#> 2 Afghanistan       1         0       NA     0.450      NA  118910000
#> 3 Afghanistan       1         0       NA     0.443      NA -118470000
#> 4 Afghanistan       1         0       NA     0.439      NA  121570000
#> 5 Afghanistan       1         0       NA     0.436      NA   66030000
#> 6 Afghanistan       1         0       NA     0.435      NA   67770000
head(D(wlddev, 1, 1, PCGDP + LIFEEX ~ country, ~year)) # Difference of GDP & Life Expectancy
#>       country year D1.PCGDP D1.LIFEEX
#> 1 Afghanistan 1960       NA        NA
#> 2 Afghanistan 1961       NA     0.450
#> 3 Afghanistan 1962       NA     0.443
#> 4 Afghanistan 1963       NA     0.439
#> 5 Afghanistan 1964       NA     0.436
#> 6 Afghanistan 1965       NA     0.435
head(D(wlddev, 0:1, 1, ~ country, ~year, cols = 9:10)) # Same, also retaining original series
#>       country year PCGDP D1.PCGDP LIFEEX D1.LIFEEX
#> 1 Afghanistan 1960    NA       NA 32.292        NA
#> 2 Afghanistan 1961    NA       NA 32.742     0.450
#> 3 Afghanistan 1962    NA       NA 33.185     0.443
#> 4 Afghanistan 1963    NA       NA 33.624     0.439
#> 5 Afghanistan 1964    NA       NA 34.060     0.436
#> 6 Afghanistan 1965    NA       NA 34.495     0.435
head(D(wlddev, 0:1, 1, ~ country, ~year, 9:10,     # Dropping id columns
       keep.ids = FALSE))
#>   PCGDP D1.PCGDP LIFEEX D1.LIFEEX
#> 1    NA       NA 32.292        NA
#> 2    NA       NA 32.742     0.450
#> 3    NA       NA 33.185     0.443
#> 4    NA       NA 33.624     0.439
#> 5    NA       NA 34.060     0.436
#> 6    NA       NA 34.495     0.435
# Dynamic Panel Data Models:
summary(lm(D(PCGDP,1,1,iso3c,year) ~               # Diff. GDP regressed on its lagged level
             L(PCGDP,1,iso3c,year) +               # and the difference of Life Expectancy
             D(LIFEEX,1,1,iso3c,year), data = wlddev))
#>
#> Call:
#> lm(formula = D(PCGDP, 1, 1, iso3c, year) ~ L(PCGDP, 1, iso3c,
#>     year) + D(LIFEEX, 1, 1, iso3c, year), data = wlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16877.4   -103.2    -52.4    106.0  12606.3
#>
#> Coefficients:
#>                                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                   8.918e+01  1.410e+01   6.325 2.67e-10 ***
#> L(PCGDP, 1, iso3c, year)      8.943e-03  6.035e-04  14.820  < 2e-16 ***
#> D(LIFEEX, 1, 1, iso3c, year) -2.559e+01  2.528e+01  -1.012    0.311
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 830.5 on 8181 degrees of freedom
#>   (4560 observations deleted due to missingness)
#> Multiple R-squared:  0.02721, Adjusted R-squared:  0.02698
#> F-statistic: 114.4 on 2 and 8181 DF,  p-value: < 2.2e-16
#>
g = qF(wlddev$country)                             # Omitting t and precomputing g allows for
summary(lm(D(PCGDP,1,1,g) ~ L(PCGDP,1,g) +         # a bit more parsimonious specification
             D(LIFEEX,1,1,g), wlddev))
#> Panel-difference computed without timevar: Assuming ordered data
#> Panel-lag computed without timevar: Assuming ordered data
#> Panel-difference computed without timevar: Assuming ordered data
#>
#> Call:
#> lm(formula = D(PCGDP, 1, 1, g) ~ L(PCGDP, 1, g) + D(LIFEEX, 1,
#>     1, g), data = wlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16877.4   -103.2    -52.4    106.0  12606.3
#>
#> Coefficients:
#>                      Estimate Std. Error t value Pr(>|t|)
#> (Intercept)         8.918e+01  1.410e+01   6.325 2.67e-10 ***
#> L(PCGDP, 1, g)      8.943e-03  6.035e-04  14.820  < 2e-16 ***
#> D(LIFEEX, 1, 1, g) -2.559e+01  2.528e+01  -1.012    0.311
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 830.5 on 8181 degrees of freedom
#>   (4560 observations deleted due to missingness)
#> Multiple R-squared:  0.02721, Adjusted R-squared:  0.02698
#> F-statistic: 114.4 on 2 and 8181 DF,  p-value: < 2.2e-16
#>
summary(lm(D1.PCGDP ~.,                                     # Now adding level and lagged level of
           L(D(wlddev,0:1,1, ~ country, ~year,9:10),0:1,    # LIFEEX and lagged differences rates
             ~ country, ~year, keep.ids = FALSE)[-1]))
#>
#> Call:
#> lm(formula = D1.PCGDP ~ ., data = L(D(wlddev, 0:1, 1, ~country,
#>     ~year, 9:10), 0:1, ~country, ~year, keep.ids = FALSE)[-1])
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients: (1 not defined because of singularities)
#>                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -3.215e+02  6.337e+01  -5.073    4e-07 ***
#> L1.PCGDP      2.507e-03  7.106e-04   3.529  0.00042 ***
#> L1.D1.PCGDP   3.155e-01  1.070e-02  29.483  < 2e-16 ***
#> LIFEEX       -1.936e+00  3.825e+01  -0.051  0.95962
#> L1.LIFEEX     8.345e+00  3.814e+01   0.219  0.82683
#> D1.LIFEEX            NA         NA      NA       NA
#> L1.D1.LIFEEX  1.667e+00  3.771e+01   0.044  0.96475
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared:  0.1281, Adjusted R-squared:  0.1276
#> F-statistic: 234.8 on 5 and 7988 DF,  p-value: < 2.2e-16
#>
## Using plm can make things easier, but avoid attaching or 'with' calls:
pwlddev <- plm::pdata.frame(wlddev, index = c("country","year"))
head(D(pwlddev, 0:1, 1, 9:10))                     # Again differences of LIFEEX and PCGDP
#>                      country year PCGDP D1.PCGDP LIFEEX D1.LIFEEX
#> Afghanistan-1960 Afghanistan 1960    NA       NA 32.292        NA
#> Afghanistan-1961 Afghanistan 1961    NA       NA 32.742     0.450
#> Afghanistan-1962 Afghanistan 1962    NA       NA 33.185     0.443
#> Afghanistan-1963 Afghanistan 1963    NA       NA 33.624     0.439
#> Afghanistan-1964 Afghanistan 1964    NA       NA 34.060     0.436
#> Afghanistan-1965 Afghanistan 1965    NA       NA 34.495     0.435
PCGDP <- pwlddev$PCGDP                             # A panel-Series of GDP per Capita
head(D(PCGDP))                                     # Differencing the panel series
#> Afghanistan-1960 Afghanistan-1961 Afghanistan-1962 Afghanistan-1963
#>               NA               NA               NA               NA
#> Afghanistan-1964 Afghanistan-1965
#>               NA               NA
summary(lm(D1.PCGDP ~.,                            # Running the dynamic model again ->
           data = L(D(pwlddev,0:1,1,9:10),0:1,     # code becomes a bit simpler
                    keep.ids = FALSE)[-1]))
#>
#> Call:
#> lm(formula = D1.PCGDP ~ ., data = L(D(pwlddev, 0:1, 1, 9:10),
#>     0:1, keep.ids = FALSE)[-1])
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients: (1 not defined because of singularities)
#>                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -3.215e+02  6.337e+01  -5.073    4e-07 ***
#> L1.PCGDP      2.507e-03  7.106e-04   3.529  0.00042 ***
#> L1.D1.PCGDP   3.155e-01  1.070e-02  29.483  < 2e-16 ***
#> LIFEEX       -1.936e+00  3.825e+01  -0.051  0.95962
#> L1.LIFEEX     8.345e+00  3.814e+01   0.219  0.82683
#> D1.LIFEEX            NA         NA      NA       NA
#> L1.D1.LIFEEX  1.667e+00  3.771e+01   0.044  0.96475
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared:  0.1281, Adjusted R-squared:  0.1276
#> F-statistic: 234.8 on 5 and 7988 DF,  p-value: < 2.2e-16
#>
# One could be tempted to also do something like this, but THIS DOES NOT WORK!!:
# -> a pseries is only created when subsetting the pdata.frame using $ or [[
summary(lm(D(PCGDP) ~ L(D(PCGDP,0:1)) + L(D(LIFEEX,0:1),0:1), pwlddev))
#>
#> Call:
#> lm(formula = D(PCGDP) ~ L(D(PCGDP, 0:1)) + L(D(LIFEEX, 0:1),
#>     0:1), data = pwlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients: (1 not defined because of singularities)
#>                               Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                 -3.215e+02  6.337e+01  -5.073    4e-07 ***
#> L(D(PCGDP, 0:1))L1.--        2.507e-03  7.106e-04   3.529  0.00042 ***
#> L(D(PCGDP, 0:1))L1.D1        3.155e-01  1.070e-02  29.483  < 2e-16 ***
#> L(D(LIFEEX, 0:1), 0:1)--    -1.936e+00  3.825e+01  -0.051  0.95962
#> L(D(LIFEEX, 0:1), 0:1)L1.--  8.345e+00  3.814e+01   0.219  0.82683
#> L(D(LIFEEX, 0:1), 0:1)D1            NA         NA      NA       NA
#> L(D(LIFEEX, 0:1), 0:1)L1.D1  1.667e+00  3.771e+01   0.044  0.96475
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared:  0.1281, Adjusted R-squared:  0.1276
#> F-statistic: 234.8 on 5 and 7988 DF,  p-value: < 2.2e-16
#>
# To make it work, one needs to create pseries
LIFEEX <- pwlddev$LIFEEX
summary(lm(D(PCGDP) ~ L(D(PCGDP,0:1)) + L(D(LIFEEX,0:1),0:1))) # THIS WORKS !
#>
#> Call:
#> lm(formula = D(PCGDP) ~ L(D(PCGDP, 0:1)) + L(D(LIFEEX, 0:1),
#>     0:1))
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients: (1 not defined because of singularities)
#>                               Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                 -3.215e+02  6.337e+01  -5.073    4e-07 ***
#> L(D(PCGDP, 0:1))L1.--        2.507e-03  7.106e-04   3.529  0.00042 ***
#> L(D(PCGDP, 0:1))L1.D1        3.155e-01  1.070e-02  29.483  < 2e-16 ***
#> L(D(LIFEEX, 0:1), 0:1)--    -1.936e+00  3.825e+01  -0.051  0.95962
#> L(D(LIFEEX, 0:1), 0:1)L1.--  8.345e+00  3.814e+01   0.219  0.82683
#> L(D(LIFEEX, 0:1), 0:1)D1            NA         NA      NA       NA
#> L(D(LIFEEX, 0:1), 0:1)L1.D1  1.667e+00  3.771e+01   0.044  0.96475
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared:  0.1281, Adjusted R-squared:  0.1276
#> F-statistic: 234.8 on 5 and 7988 DF,  p-value: < 2.2e-16
#>
## Using dplyr:
library(dplyr)
wlddev %>% group_by(country) %>%
  select(PCGDP,LIFEEX) %>% fdiff(0:1,1:2)          # Adding a first and second difference
#> Adding missing grouping variables: `country`
#> Panel-difference computed without timevar: Assuming ordered data
#> # A tibble: 12,744 x 7
#> # Groups:   country [216]
#>    country     PCGDP D1.PCGDP D2.PCGDP LIFEEX D1.LIFEEX D2.LIFEEX
#>  * <chr>       <dbl>    <dbl>    <dbl>  <dbl>     <dbl>     <dbl>
#>  1 Afghanistan    NA       NA       NA   32.3        NA        NA
#>  2 Afghanistan    NA       NA       NA   32.7     0.450        NA
#>  3 Afghanistan    NA       NA       NA   33.2     0.443  -0.00700
#>  4 Afghanistan    NA       NA       NA   33.6     0.439    -0.004
#>  5 Afghanistan    NA       NA       NA   34.1     0.436    -0.003
#>  6 Afghanistan    NA       NA       NA   34.5     0.435    -0.001
#>  7 Afghanistan    NA       NA       NA   34.9     0.433  -0.00200
#>  8 Afghanistan    NA       NA       NA   35.4     0.433         0
#>  9 Afghanistan    NA       NA       NA   35.8     0.435     0.002
#> 10 Afghanistan    NA       NA       NA   36.2     0.438     0.003
#> # ... with 12,734 more rows
wlddev %>% group_by(country) %>%
  select(year,PCGDP,LIFEEX) %>% D(0:1,1:2,year)    # Also using t (safer)
#> Adding missing grouping variables: `country`
#> # A tibble: 12,744 x 8
#> # Groups:   country [216]
#>    country      year PCGDP D1.PCGDP D2.PCGDP LIFEEX D1.LIFEEX D2.LIFEEX
#>  * <chr>       <int> <dbl>    <dbl>    <dbl>  <dbl>     <dbl>     <dbl>
#>  1 Afghanistan  1960    NA       NA       NA   32.3        NA        NA
#>  2 Afghanistan  1961    NA       NA       NA   32.7     0.450        NA
#>  3 Afghanistan  1962    NA       NA       NA   33.2     0.443  -0.00700
#>  4 Afghanistan  1963    NA       NA       NA   33.6     0.439    -0.004
#>  5 Afghanistan  1964    NA       NA       NA   34.1     0.436    -0.003
#>  6 Afghanistan  1965    NA       NA       NA   34.5     0.435    -0.001
#>  7 Afghanistan  1966    NA       NA       NA   34.9     0.433  -0.00200
#>  8 Afghanistan  1967    NA       NA       NA   35.4     0.433         0
#>  9 Afghanistan  1968    NA       NA       NA   35.8     0.435     0.002
#> 10 Afghanistan  1969    NA       NA       NA   36.2     0.438     0.003
#> # ... with 12,734 more rows
wlddev %>% group_by(country) %>%                   # Dropping id's
  select(year,PCGDP,LIFEEX) %>% D(0:1,1:2,year, keep.ids = FALSE)
#> Adding missing grouping variables: `country`
#> # A tibble: 12,744 x 6
#> # Groups:   country [216]
#>    PCGDP D1.PCGDP D2.PCGDP LIFEEX D1.LIFEEX D2.LIFEEX
#>  * <dbl>    <dbl>    <dbl>  <dbl>     <dbl>     <dbl>
#>  1    NA       NA       NA   32.3        NA        NA
#>  2    NA       NA       NA   32.7     0.450        NA
#>  3    NA       NA       NA   33.2     0.443  -0.00700
#>  4    NA       NA       NA   33.6     0.439    -0.004
#>  5    NA       NA       NA   34.1     0.436    -0.003
#>  6    NA       NA       NA   34.5     0.435    -0.001
#>  7    NA       NA       NA   34.9     0.433  -0.00200
#>  8    NA       NA       NA   35.4     0.433         0
#>  9    NA       NA       NA   35.8     0.435     0.002
#> 10    NA       NA       NA   36.2     0.438     0.003
#> # ... with 12,734 more rows