fdiff is an S3 generic to compute (sequences of) suitably lagged / leaded and iterated differences, quasi-differences, log-differences or quasi-log-differences. The difference and log-difference operators D and Dlog also exist as parsimonious wrappers around fdiff, providing more flexibility than fdiff when applied to data frames.

fdiff(x, n = 1, diff = 1, ...)
D(x, n = 1, diff = 1, ...)
Dlog(x, n = 1, diff = 1, ...)

# S3 method for default
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
stubs = TRUE, ...)
# S3 method for default
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
stubs = TRUE, ...)
# S3 method for default
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1, stubs = TRUE, ...)

# S3 method for matrix
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
stubs = length(n) + length(diff) > 2L, ...)
# S3 method for matrix
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
stubs = TRUE, ...)
# S3 method for matrix
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1, stubs = TRUE, ...)

# S3 method for data.frame
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
stubs = length(n) + length(diff) > 2L, ...)
# S3 method for data.frame
D(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
fill = NA, rho = 1, stubs = TRUE, keep.ids = TRUE, ...)
# S3 method for data.frame
Dlog(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
fill = NA, rho = 1, stubs = TRUE, keep.ids = TRUE, ...)

# Methods for compatibility with plm:

# S3 method for pseries
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1, stubs = TRUE, ...)
# S3 method for pseries
D(x, n = 1, diff = 1, fill = NA, rho = 1, stubs = TRUE, ...)
# S3 method for pseries
Dlog(x, n = 1, diff = 1, fill = NA, rho = 1, stubs = TRUE, ...)

# S3 method for pdata.frame
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1,
stubs = length(n) + length(diff) > 2L, ...)
# S3 method for pdata.frame
D(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1, stubs = TRUE,
keep.ids = TRUE, ...)
# S3 method for pdata.frame
Dlog(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1, stubs = TRUE,
keep.ids = TRUE, ...)

# Methods for grouped data frame / compatibility with dplyr:

# S3 method for grouped_df
fdiff(x, n = 1, diff = 1, t = NULL, fill = NA, log = FALSE, rho = 1,
stubs = length(n) + length(diff) > 2L, keep.ids = TRUE, ...)
# S3 method for grouped_df
D(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1, stubs = TRUE,
keep.ids = TRUE, ...)
# S3 method for grouped_df
Dlog(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1, stubs = TRUE,
keep.ids = TRUE, ...)

## Arguments

x: a numeric vector / time series, (time series) matrix, data frame, panel series (plm::pseries), panel data frame (plm::pdata.frame) or grouped data frame (class 'grouped_df').

n: integer. A vector indicating the number of lags or leads.

diff: integer. A vector of integers >= 1 indicating the order of differencing / log-differencing.

g: a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

by: data.frame method: Same as g, but also allows one- or two-sided formulas, i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.

t: same input as g/by, to indicate the time-variable(s). For safe computation of differences on unordered time series and panels. The data.frame method also allows a one-sided formula, i.e. ~time. The grouped_df method supports lazy evaluation, i.e. time (no quotes).

cols: data.frame method: Select columns to difference using a function, column names, indices or a logical vector. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.

fill: value to insert when vectors are shifted. Default is NA.

log: logical. TRUE computes log-differences instead. See Details.

rho: double. Autocorrelation parameter. Set to a value between 0 and 1 for quasi-differencing. Any numeric value can be supplied.

stubs: logical. TRUE will rename all differenced columns by adding prefixes "LnDdiff." / "FnDdiff." for differences, "LnDlogdiff." / "FnDlogdiff." for log-differences, and replacing "D" / "Dlog" with "QD" / "QDlog" for quasi-differences.

keep.ids: data.frame / pdata.frame / grouped_df methods: Logical. FALSE drops all panel-identifiers from the output (which includes all variables passed to by or t). Note: For grouped / panel data frames identifiers are dropped, but the 'groups' / 'index' attributes are kept.

...: arguments to be passed to or from other methods.
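The stub pattern described for stubs can be made concrete with a small base R sketch. The helper `stub_sketch` is hypothetical and not part of the package; it only reproduces the column names visible in the `D(AirPassengers, -2:2, 1:3)` output in the Examples (vector input), and it assumes that lag 0 yields a single `--` column.

```r
# Hypothetical helper (not a package function): rebuild the column names
# seen in the Examples for vector input, ordered by lag, then by difference.
stub_sketch <- function(n, diff) {
  unlist(lapply(n, function(k) {
    if (k == 0L) return("--")                      # lag 0: the series itself, once
    pre <- if (k < 0L) "F" else if (k > 1L) "L" else ""
    num <- if (abs(k) > 1L) abs(k) else ""         # lag/lead order is shown only if > 1
    paste0(pre, num, "D", diff)                    # one name per difference order
  }))
}

nm <- stub_sketch(-2:2, 1:3)
nm
```

How the package builds these names internally, and how they combine with column names for matrices and data frames, is not covered by this sketch.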

## Details

By default, fdiff/D/Dlog return x with all columns differenced / log-differenced. Differences are computed as repeat(diff) x[i] - rho*x[i-n], and log-differences as repeat(diff) log(x[i]) - rho*log(x[i-n]). If rho < 1, this becomes quasi- (or partial) differencing, a technique suggested by Cochrane and Orcutt (1949) to deal with serial correlation in regression models, where rho is typically estimated by regressing the model residuals on the lagged residuals. Setting diff = 2 returns differences of differences, and so on; setting n = 2 returns simple differences computed by subtracting twice-lagged x from x. Forward differences can be computed by passing negative n values. n also supports arbitrary vectors of integers (lags), and diff supports positive sequences of integers (differences).

If more than one value is passed to n and/or diff, the data is expanded wide as follows: If x is an atomic vector or time series, a (time series) matrix is returned with columns ordered first by lag, then by difference. If x is a matrix or data frame, each column is expanded in like manner such that the output has ncol(x)*length(n)*length(diff) columns ordered first by column name, then by lag, then by difference.
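The formula repeat(diff) x[i] - rho*x[i-n] can be illustrated in base R. The helper `qdiff_sketch` below is hypothetical and is not the package's implementation; it assumes that rho is applied in every iteration and that shifted-in positions are NA. The input values are the first six observations of AirPassengers, taken from the Examples output.

```r
# Hypothetical helper: apply x[i] - rho * x[i - n] `diff` times.
# Positive n lags the series, negative n leads it.
qdiff_sketch <- function(x, n = 1L, diff = 1L, rho = 1) {
  shift <- function(v, k) {
    len <- length(v)
    if (k == 0L) return(v)
    if (abs(k) >= len) return(rep(NA_real_, len))
    if (k > 0L) c(rep(NA_real_, k), v[seq_len(len - k)])   # v[i - k]
    else        c(v[(1L - k):len], rep(NA_real_, -k))      # v[i + |k|]
  }
  for (j in seq_len(diff)) x <- x - rho * shift(x, n)
  x
}

x  <- c(112, 118, 132, 129, 121, 135)
d1 <- qdiff_sketch(x)              # NA   6  14  -3  -8  14
d2 <- qdiff_sketch(x, diff = 2)    # NA  NA   8 -17  -5  22
f1 <- qdiff_sketch(x, n = -1)      # -6 -14   3   8 -14  NA
qd <- qdiff_sketch(x, rho = 0.5)   # NA 62 73 63 56.5 74.5
```

With rho = 1 the results agree with base R's `diff(x)` and `diff(x, differences = 2)` apart from the leading NA padding.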

With groups/panel-identifiers supplied to g/by, fdiff/D/Dlog efficiently compute panel-differences. If t is left empty, the data needs to be ordered such that all values belonging to a group are consecutive and in the right order. It is not necessary that the groups themselves occur in the right order. If time-variable(s) are supplied to t, the panel is fully identified and differences can be securely computed even if the data is unordered.
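To see why supplying t makes the computation safe on unordered data, here is a rough base R sketch of a grouped first difference. The helper `panel_diff_sketch` and the toy data are made up for illustration; the sketch assumes consecutive time periods within each group and says nothing about how the package does this internally.

```r
# Toy panel: two groups, rows deliberately shuffled
df <- data.frame(
  id   = c("b", "a", "a", "b", "a", "b"),
  year = c(2001, 2002, 2000, 2000, 2001, 2002),
  y    = c(12, 7, 1, 10, 4, 15)
)

# Hypothetical helper: first difference of y within group g, ordered by time t
panel_diff_sketch <- function(y, g, t) {
  o <- order(g, t)                                        # sort by group, then time
  d <- ave(y[o], g[o], FUN = function(v) c(NA, diff(v)))  # difference within each group
  out <- numeric(length(y))
  out[o] <- d                                             # restore the original row order
  out
}

df$D1.y <- panel_diff_sketch(df$y, df$id, df$year)
df$D1.y   # 2 3 NA NA 3 3
```

Without the ordering step, the first row of group b (year 2001) would be treated as the start of its series and receive NA instead of 12 - 10 = 2.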

fdiff/D/Dlog support balanced panels and unbalanced panels where various individuals are observed for different time-sequences. For computational details and efficiency considerations see the help page for flag.

It is also possible to compute differences on unordered vectors or irregular time series (thus utilizing t but leaving g/by empty).
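One plausible reading of a time-identified difference is "x at time t minus x at time t - n", with NA wherever time t - n was not observed. The base R sketch below implements that reading with a hypothetical helper `time_diff_sketch`; whether the package treats gaps in exactly this way is an assumption made here.

```r
# Hypothetical helper: match each observation to the one n periods earlier
time_diff_sketch <- function(x, t, n = 1L) {
  x - x[match(t - n, t)]   # match() returns NA when time t - n is unobserved
}

t <- c(2003, 2000, 2001, 2005)   # unordered, and 2002 / 2004 are missing
x <- c(30, 10, 15, 50)
td <- time_diff_sketch(x, t)
td   # NA NA 5 NA: only 2001 has an observed predecessor (2000)
```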

The methods applying to plm objects (panel series and panel data frames) automatically utilize the panel-identifiers attached to these objects and thus securely compute fully identified panel-differences. If these objects have > 2 panel-identifiers attached to them, the last identifier is assumed to be the time-variable, and the others are taken as grouping-variables and interacted.

## Value

x differenced diff times using lags n of itself. Quasi- and log-differences are toggled by the rho and log arguments or the Dlog operator. Computations can be grouped by g/by and/or ordered by t. See Details and Examples.
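As a small base R illustration of what the log and rho toggles compute for n = 1 and diff = 1, following the formulas in Details. The value of rho is arbitrary and chosen for illustration only; the input is the first six AirPassengers values from the Examples output.

```r
x   <- c(112, 118, 132, 129, 121, 135)
rho <- 0.5                                   # illustrative value only
cur <- x[-1]                                 # x[i]
prv <- x[-length(x)]                         # x[i - 1]

logdiff  <- c(NA, log(cur) - log(prv))       # log-difference
qlogdiff <- c(NA, log(cur) - rho * log(prv)) # quasi-log-difference

# A log-difference is the log of the growth ratio x[i] / x[i - 1]
isTRUE(all.equal(logdiff[-1], log(cur / prv)))
```

The second element of `logdiff` is log(118/112), about 0.0522, which corresponds to the February 1949 entry in the `Dlog(AirPassengers)` output shown in the Examples.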

## References

Cochrane, D. & Orcutt, G. H. (1949). Application of Least Squares Regression to Relationships Containing Auto-Correlated Error Terms. Journal of the American Statistical Association, 44(245): 32-61.

Prais, S. J. & Winsten, C. B. (1954). Trend Estimators and Serial Correlation. Cowles Commission Discussion Paper No. 383. Chicago.

## See also

flag/L/F, fgrowth/G, Time Series and Panel Series, Collapse Overview

## Examples

## Simple Time Series: AirPassengers
D(AirPassengers)                      # 1st difference, same as fdiff(AirPassengers)
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 1949   NA    6   14   -3   -8   14   13    0  -12  -17  -15   14
#> 1950   -3   11   15   -6  -10   24   21    0  -12  -25  -19   26
#> 1951    5    5   28  -15    9    6   21    0  -15  -22  -16   20
#> 1952    5    9   13  -12    2   35   12   12  -33  -18  -19   22
#> 1953    2    0   40   -1   -6   14   21    8  -35  -26  -31   21
#> 1954    3  -16   47   -8    7   30   38   -9  -34  -30  -26   26
#> 1955   13   -9   34    2    1   45   49  -17  -35  -38  -37   41
#> 1956    6   -7   40   -4    5   56   39   -8  -50  -49  -35   35
#> 1957    9  -14   55   -8    7   67   43    2  -63  -57  -42   31
#> 1958    4  -22   44  -14   15   72   56   14 -101  -45  -49   27
#> 1959   23  -18   64  -10   24   52   76   11  -96  -56  -45   43
#> 1960   12  -26   28   42   11   63   87  -16  -98  -47  -71   42
D(AirPassengers, -1)                  # Forward difference
#>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1949  -6 -14   3   8 -14 -13   0  12  17  15 -14   3
#> 1950 -11 -15   6  10 -24 -21   0  12  25  19 -26  -5
#> 1951  -5 -28  15  -9  -6 -21   0  15  22  16 -20  -5
#> 1952  -9 -13  12  -2 -35 -12 -12  33  18  19 -22  -2
#> 1953   0 -40   1   6 -14 -21  -8  35  26  31 -21  -3
#> 1954  16 -47   8  -7 -30 -38   9  34  30  26 -26 -13
#> 1955   9 -34  -2  -1 -45 -49  17  35  38  37 -41  -6
#> 1956   7 -40   4  -5 -56 -39   8  50  49  35 -35  -9
#> 1957  14 -55   8  -7 -67 -43  -2  63  57  42 -31  -4
#> 1958  22 -44  14 -15 -72 -56 -14 101  45  49 -27 -23
#> 1959  18 -64  10 -24 -52 -76 -11  96  56  45 -43 -12
#> 1960  26 -28 -42 -11 -63 -87  16  98  47  71 -42  NA
Dlog(AirPassengers)                   # Log-difference
#>               Jan          Feb          Mar          Apr          May
#> 1949           NA  0.052185753  0.112117298 -0.022989518 -0.064021859
#> 1950 -0.025752496  0.091349779  0.112477983 -0.043485112 -0.076961041
#> 1951  0.035091320  0.033901552  0.171148256 -0.088033349  0.053744276
#> 1952  0.029675768  0.051293294  0.069733338 -0.064193158  0.010989122
#> 1953  0.010256500  0.000000000  0.185717146 -0.004246291 -0.025863511
#> 1954  0.014815086 -0.081678031  0.223143551 -0.034635497  0.030371098
#> 1955  0.055215723 -0.037899273  0.136210205  0.007462721  0.003710579
#> 1956  0.021353124 -0.024956732  0.134884268 -0.012698583  0.015848192
#> 1957  0.028987537 -0.045462374  0.167820466 -0.022728251  0.019915310
#> 1958  0.011834458 -0.066894235  0.129592829 -0.039441732  0.042200354
#> 1959  0.066021101 -0.051293294  0.171542423 -0.024938948  0.058840500
#> 1960  0.029199155 -0.064378662  0.069163360  0.095527123  0.023580943
#>               Jun          Jul          Aug          Sep          Oct
#> 1949  0.109484233  0.091937495  0.000000000 -0.084557388 -0.133531393
#> 1950  0.175632569  0.131852131  0.000000000 -0.073203404 -0.172245905
#> 1951  0.034289073  0.111521274  0.000000000 -0.078369067 -0.127339422
#> 1952  0.175008910  0.053584246  0.050858417 -0.146603474 -0.090060824
#> 1953  0.059339440  0.082887660  0.029852963 -0.137741925 -0.116202008
#> 1954  0.120627988  0.134477914 -0.030254408 -0.123344547 -0.123106058
#> 1955  0.154150680  0.144581229 -0.047829088 -0.106321592 -0.129875081
#> 1956  0.162204415  0.099191796 -0.019560526 -0.131769278 -0.148532688
#> 1957  0.172887525  0.097032092  0.004291852 -0.144914380 -0.152090098
#> 1958  0.180943197  0.121098097  0.028114301 -0.223143551 -0.118092489
#> 1959  0.116724274  0.149296301  0.019874186 -0.188422419 -0.128913869
#> 1960  0.125287761  0.150673346 -0.026060107 -0.176398538 -0.097083405
#>               Nov          Dec
#> 1949 -0.134732594  0.126293725
#> 1950 -0.154150680  0.205443974
#> 1951 -0.103989714  0.128381167
#> 1952 -0.104778951  0.120363682
#> 1953 -0.158901283  0.110348057
#> 1954 -0.120516025  0.120516025
#> 1955 -0.145067965  0.159560973
#> 1956 -0.121466281  0.121466281
#> 1957 -0.129013003  0.096799383
#> 1958 -0.146750091  0.083510633
#> 1959 -0.117168974  0.112242855
#> 1960 -0.167251304  0.102278849
D(AirPassengers, 1, 2)                # Second difference
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 1949   NA   NA    8  -17   -5   22   -1  -13  -12   -5    2   29
#> 1950  -17   14    4  -21   -4   34   -3  -21  -12  -13    6   45
#> 1951  -21    0   23  -43   24   -3   15  -21  -15   -7    6   36
#> 1952  -15    4    4  -25   14   33  -23    0  -45   15   -1   41
#> 1953  -20   -2   40  -41   -5   20    7  -13  -43    9   -5   52
#> 1954  -18  -19   63  -55   15   23    8  -47  -25    4    4   52
#> 1955  -13  -22   43  -32   -1   44    4  -66  -18   -3    1   78
#> 1956  -35  -13   47  -44    9   51  -17  -47  -42    1   14   70
#> 1957  -26  -23   69  -63   15   60  -24  -41  -65    6   15   73
#> 1958  -27  -26   66  -58   29   57  -16  -42 -115   56   -4   76
#> 1959   -4  -41   82  -74   34   28   24  -65 -107   40   11   88
#> 1960  -31  -38   54   14  -31   52   24 -103  -82   51  -24  113
Dlog(AirPassengers, 1, 2)             # Second log-difference
#>                Jan           Feb           Mar           Apr           May
#> 1949            NA            NA  0.0599315450 -0.1351068163 -0.0410323405
#> 1950 -0.1520462214  0.1171022747  0.0211282048 -0.1559630954 -0.0334759292
#> 1951 -0.1703526544 -0.0011897681  0.1372467045 -0.2591816057  0.1417776255
#> 1952 -0.0987053985  0.0216175262  0.0184400436 -0.1339264957  0.0751822792
#> 1953 -0.1101071821 -0.0102565002  0.1857171458 -0.1899634367 -0.0216172197
#> 1954 -0.0955329714 -0.0964931168  0.3048215823 -0.2577790480  0.0650065945
#> 1955 -0.0653003019 -0.0931149952  0.1741094774 -0.1287474836 -0.0037521418
#> 1956 -0.1382078481 -0.0463098564  0.1598409997 -0.1475828510  0.0285467756
#> 1957 -0.0924787442 -0.0744499110  0.2132828402 -0.1905487172  0.0426435608
#> 1958 -0.0849649257 -0.0787286925  0.1964870639 -0.1690345611  0.0816420865
#> 1959 -0.0174895318 -0.1173143955  0.2228357169 -0.1964813709  0.0837794484
#> 1960 -0.0830437006 -0.0935778165  0.1335420218  0.0263637631 -0.0719461805
#>                Jun           Jul           Aug           Sep           Oct
#> 1949  0.1735060916 -0.0175467375 -0.0919374953 -0.0845573880 -0.0489740046
#> 1950  0.2525936098 -0.0437804375 -0.1318521311 -0.0732034040 -0.0990425008
#> 1951 -0.0194552025  0.0772322010 -0.1115212744 -0.0783690671 -0.0489703553
#> 1952  0.1640197884 -0.1214246638 -0.0027258289 -0.1974618914  0.0565426503
#> 1953  0.0852029504  0.0235482200 -0.0530346967 -0.1675948883  0.0215399175
#> 1954  0.0902568899  0.0138499264 -0.1647323226 -0.0930901390  0.0002384892
#> 1955  0.1504401004 -0.0095694510 -0.1924103165 -0.0584925044 -0.0235534893
#> 1956  0.1463562224 -0.0630126191 -0.1187523214 -0.1122087518 -0.0167634099
#> 1957  0.1529722149 -0.0758554330 -0.0927402395 -0.1492062318 -0.0071757183
#> 1958  0.1387428423 -0.0598451001 -0.0929837952 -0.2512578528  0.1050510618
#> 1959  0.0578837743  0.0325720271 -0.1294221152 -0.2082966053  0.0595085504
#> 1960  0.1017068187  0.0253855845 -0.1767334525 -0.1503384318  0.0793151339
#>                Nov           Dec
#> 1949 -0.0012012013  0.2610263193
#> 1950  0.0180952250  0.3595946540
#> 1951  0.0233497089  0.2323708802
#> 1952 -0.0147181273  0.2251426335
#> 1953 -0.0426992749  0.2692493398
#> 1954  0.0025900336  0.2410320490
#> 1955 -0.0151928838  0.3046289378
#> 1956  0.0270664065  0.2429325621
#> 1957  0.0230770947  0.2258123867
#> 1958 -0.0286576015  0.2302607239
#> 1959  0.0117448950  0.2294118289
#> 1960 -0.0701678993  0.2695301530
D(AirPassengers, 12)                  # Seasonal difference (data is monthly)
#>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1949  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
#> 1950   3   8   9   6   4  14  22  22  22  14  10  22
#> 1951  30  24  37  28  47  29  29  29  26  29  32  26
#> 1952  26  30  15  18  11  40  31  43  25  29  26  28
#> 1953  25  16  43  54  46  25  34  30  28  20   8   7
#> 1954   8  -8  -1  -8   5  21  38  21  22  18  23  28
#> 1955  38  45  32  42  36  51  62  54  53  45  34  49
#> 1956  42  44  50  44  48  59  49  58  43  32  34  28
#> 1957  31  24  39  35  37  48  52  62  49  41  34  30
#> 1958  25  17   6   0   8  13  26  38   0  12   5   1
#> 1959  20  24  44  48  57  37  57  54  59  48  52  68
#> 1960  57  49  13  65  52  63  74  47  45  54  28  27
D(AirPassengers,                      # Quasi-difference, see a better example below
rho = pwcor(AirPassengers, L(AirPassengers)))
#>              Jan         Feb         Mar         Apr         May         Jun
#> 1949          NA  10.4581994  18.6970315   2.2543065  -2.8651096  18.8164476
#> 1950   1.6970315  15.5776155  20.0154743  -0.3874454  -4.6262775  28.9756690
#> 1951  10.5727493  10.7717760  33.9708028  -7.9146474  15.4882724  12.8465205
#> 1952  11.6076884  15.8067152  20.1649634  -4.3175671   9.2047687  42.2843794
#> 1953   9.7222383   7.8018490  47.8018490   8.3940631   3.3542577  23.1154256
#> 1954  11.0008757  -7.8797082  54.4834062   1.3542577  16.0358149  39.3144524
#> 1955  22.1154256   0.6328952  43.2746470  12.6280290  11.7076397  55.7474450
#> 1956  17.0658878   4.3047200  51.0260825   8.6182966  17.4590752  68.6581019
#> 1957  21.1804377  -1.4613141  66.9814109   6.1707053  20.8522625  81.1308999
#> 1958  17.3745983  -8.4661803  56.6581019   0.4095374  28.8522625  86.4493428
#> 1959  36.4144036  -3.6700733  77.6134304   6.1609729  39.7629194  68.7182478
#> 1960  28.1211675  -9.4011682  43.5638926  58.6784425  29.3502672  81.7881261
#>              Jul         Aug         Sep         Oct         Nov         Dec
#> 1949  18.3737225   5.8911921  -6.1088079 -11.5864721 -10.2631631  18.1397566
#> 1950  26.9309974   6.7669098  -5.2330902 -18.7107544 -13.7058882  30.5378101
#> 1951  28.0853526   7.9212650  -7.0787350 -14.6758152  -9.5515330  25.8115814
#> 1952  20.6775667  21.1552309 -23.3671048  -9.6806814 -11.3971778  28.8465205
#> 1953  30.6727005  18.5086129 -24.1729443 -16.5661316 -22.6010707  28.1649634
#> 1954  48.5086129   3.0212163 -22.3370319 -19.6904138 -16.8845744  34.0804864
#> 1955  61.5386859  -2.5108519 -21.1875429 -25.5807302 -26.0933336  50.4338684
#> 1956  53.8872016   8.4396104 -33.8788325 -34.8691001 -22.8195623  45.7872504
#> 1957  59.7978585  20.5094887 -44.4109006 -40.9186378 -28.1875429  43.1406323
#> 1958  73.3153281  33.5444278 -80.8982973 -28.9186378 -34.7098786  39.3396591
#> 1959  94.7881261  32.8133329 -73.7488083 -37.5701220 -28.7992218  57.4095374
#> 1960 108.2958633   8.7589289 -73.8779567 -26.7788812 -52.6497328  57.5240873
head(D(AirPassengers, -2:2, 1:3))     # Sequence of leaded/lagged and iterated differences
#>      F2D1 F2D2 F2D3 FD1 FD2 FD3  -- D1  D2  D3 L2D1 L2D2 L2D3
#> [1,]  -20  -31  -69  -6   8  25 112 NA  NA  NA   NA   NA   NA
#> [2,]  -11   -5  -12 -14 -17 -12 118  6  NA  NA   NA   NA   NA
#> [3,]   11   38   77   3  -5 -27 132 14   8  NA   20   NA   NA
#> [4,]   -6    7   49   8  22  23 129 -3 -17 -25   11   NA   NA
#> [5,]  -27  -39  -19 -14  -1  12 121 -8  -5  12  -11  -31   NA
#> [6,]  -13  -42  -70 -13 -13  -1 135 14  22  27    6   -5   NA
# let's do some visual analysis
plot(AirPassengers)                   # Plot the series - seasonal pattern is evident
plot(stl(AirPassengers, "periodic"))  # Seasonal decomposition
plot(D(AirPassengers,c(1,12),1:2))    # Plotting ordinary and seasonal first and second differences
plot(stl(window(D(AirPassengers,12),  # Taking seasonal differences removes most seasonal variation
1950), "periodic"))

## Time Series Matrix of 4 EU Stock Market Indicators, recorded 260 days per year
plot(D(EuStockMarkets, c(0, 260)))                      # Plot series and annual differences
mod <- lm(DAX ~., L(EuStockMarkets, c(0, 260)))         # Regressing the DAX on its annual lag
summary(mod)                                            # and the levels and annual lags of the others
#>
#> Call:
#> lm(formula = DAX ~ ., data = L(EuStockMarkets, c(0, 260)))
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -224.33  -57.02  -12.40   51.51  359.96
#>
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -123.26123   59.74149  -2.063   0.0393 *
#> L260.DAX      -0.02126    0.02151  -0.988   0.3232
#> SMI            0.37415    0.01356  27.589   <2e-16 ***
#> L260.SMI       0.28186    0.01901  14.826   <2e-16 ***
#> CAC            0.52973    0.01544  34.305   <2e-16 ***
#> L260.CAC      -0.23401    0.02145 -10.911   <2e-16 ***
#> FTSE          -0.03944    0.01780  -2.215   0.0269 *
#> L260.FTSE      0.02888    0.02182   1.324   0.1858
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 84.02 on 1592 degrees of freedom
#>   (260 observations deleted due to missingness)
#> Multiple R-squared:  0.9943,	Adjusted R-squared:  0.9942
#> F-statistic: 3.94e+04 on 7 and 1592 DF,  p-value: < 2.2e-16
#>
r <- residuals(mod)                                     # Obtain residuals
pwcor(r, L(r))                                          # Residual Autocorrelation
#> [1] .97
fFtest(r, L(r))                                         # F-test of residual autocorrelation
                                                        # (better use lmtest::bgtest)
#>     R-Sq.       DF1       DF2   F-Stat.   P-value
#>     0.937         1      1597 23690.699     0.000
modCO <- lm(QD1.DAX ~., D(L(EuStockMarkets, c(0, 260)), # Cochrane-Orcutt (1949) estimation
rho = pwcor(r, L(r))))
summary(modCO)
#>
#> Call:
#> lm(formula = QD1.DAX ~ ., data = D(L(EuStockMarkets, c(0, 260)),
#>     rho = pwcor(r, L(r))))
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -87.131  -9.079  -0.439   9.228 119.993
#>
#> Coefficients:
#>                 Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   -17.979391   2.094867  -8.583   <2e-16 ***
#> QD1.L260.DAX    0.048116   0.034403   1.399    0.162
#> QD1.SMI         0.343808   0.013902  24.731   <2e-16 ***
#> QD1.L260.SMI    0.014331   0.022530   0.636    0.525
#> QD1.CAC         0.459655   0.024406  18.834   <2e-16 ***
#> QD1.L260.CAC   -0.031068   0.030598  -1.015    0.310
#> QD1.FTSE        0.220516   0.020682  10.662   <2e-16 ***
#> QD1.L260.FTSE   0.007577   0.025948   0.292    0.770
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 19.06 on 1591 degrees of freedom
#>   (261 observations deleted due to missingness)
#> Multiple R-squared:  0.8582,	Adjusted R-squared:  0.8576
#> F-statistic:  1376 on 7 and 1591 DF,  p-value: < 2.2e-16
#>
rCO <- residuals(modCO)
fFtest(rCO, L(rCO))                                     # No more autocorrelation
#>    R-Sq.      DF1      DF2  F-Stat.  P-value
#>    0.001        1     1596    2.326    0.127
## World Development Panel Data
head(fdiff(num_vars(wlddev), 1, 1,                      # Computes differences of numeric variables
wlddev$country, wlddev$year))              # fdiff requires external inputs..
#>   year decade PCGDP LIFEEX GINI        ODA
#> 1   NA     NA    NA     NA   NA         NA
#> 2    1      0    NA  0.450   NA  118910000
#> 3    1      0    NA  0.443   NA -118470000
#> 4    1      0    NA  0.439   NA  121570000
#> 5    1      0    NA  0.436   NA   66030000
#> 6    1      0    NA  0.435   NA   67770000
head(D(wlddev, 1, 1, ~country, ~year))                  # Differences of numeric variables
#>       country year D1.decade D1.PCGDP D1.LIFEEX D1.GINI     D1.ODA
#> 1 Afghanistan 1960        NA       NA        NA      NA         NA
#> 2 Afghanistan 1961         0       NA     0.450      NA  118910000
#> 3 Afghanistan 1962         0       NA     0.443      NA -118470000
#> 4 Afghanistan 1963         0       NA     0.439      NA  121570000
#> 5 Afghanistan 1964         0       NA     0.436      NA   66030000
#> 6 Afghanistan 1965         0       NA     0.435      NA   67770000
head(D(wlddev, 1, 1, ~country))                         # Without t: Works because data is ordered
#> Panel-difference computed without timevar: Assuming ordered data
#>       country D1.year D1.decade D1.PCGDP D1.LIFEEX D1.GINI     D1.ODA
#> 1 Afghanistan      NA        NA       NA        NA      NA         NA
#> 2 Afghanistan       1         0       NA     0.450      NA  118910000
#> 3 Afghanistan       1         0       NA     0.443      NA -118470000
#> 4 Afghanistan       1         0       NA     0.439      NA  121570000
#> 5 Afghanistan       1         0       NA     0.436      NA   66030000
#> 6 Afghanistan       1         0       NA     0.435      NA   67770000
head(D(wlddev, 1, 1, PCGDP + LIFEEX ~ country, ~year))  # Difference of GDP & Life Expectancy
#>       country year D1.PCGDP D1.LIFEEX
#> 1 Afghanistan 1960       NA        NA
#> 2 Afghanistan 1961       NA     0.450
#> 3 Afghanistan 1962       NA     0.443
#> 4 Afghanistan 1963       NA     0.439
#> 5 Afghanistan 1964       NA     0.436
#> 6 Afghanistan 1965       NA     0.435
head(D(wlddev, 0:1, 1, ~ country, ~year, cols = 9:10))  # Same, also retaining original series
#>       country year PCGDP D1.PCGDP LIFEEX D1.LIFEEX
#> 1 Afghanistan 1960    NA       NA 32.292        NA
#> 2 Afghanistan 1961    NA       NA 32.742     0.450
#> 3 Afghanistan 1962    NA       NA 33.185     0.443
#> 4 Afghanistan 1963    NA       NA 33.624     0.439
#> 5 Afghanistan 1964    NA       NA 34.060     0.436
#> 6 Afghanistan 1965    NA       NA 34.495     0.435
head(D(wlddev, 0:1, 1, ~ country, ~year, 9:10,          # Dropping id columns
keep.ids = FALSE))
#>   PCGDP D1.PCGDP LIFEEX D1.LIFEEX
#> 1    NA       NA 32.292        NA
#> 2    NA       NA 32.742     0.450
#> 3    NA       NA 33.185     0.443
#> 4    NA       NA 33.624     0.439
#> 5    NA       NA 34.060     0.436
#> 6    NA       NA 34.495     0.435
# Dynamic Panel Data Models:
summary(lm(D(PCGDP,1,1,iso3c,year) ~                    # Diff. GDP regressed on its lagged level
L(PCGDP,1,iso3c,year) +                    # and the difference of Life Expectancy
D(LIFEEX,1,1,iso3c,year), data = wlddev))
#>
#> Call:
#> lm(formula = D(PCGDP, 1, 1, iso3c, year) ~ L(PCGDP, 1, iso3c,
#>     year) + D(LIFEEX, 1, 1, iso3c, year), data = wlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16877.4   -103.2    -52.4    106.0  12606.3
#>
#> Coefficients:
#>                                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                   8.918e+01  1.410e+01   6.325 2.67e-10 ***
#> L(PCGDP, 1, iso3c, year)      8.943e-03  6.035e-04  14.820  < 2e-16 ***
#> D(LIFEEX, 1, 1, iso3c, year) -2.559e+01  2.528e+01  -1.012    0.311
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 830.5 on 8181 degrees of freedom
#>   (4560 observations deleted due to missingness)
#> Multiple R-squared:  0.02721,	Adjusted R-squared:  0.02698
#> F-statistic: 114.4 on 2 and 8181 DF,  p-value: < 2.2e-16
#>
g = qF(wlddev$country)                                  # Omitting t and precomputing g allows for
summary(lm(D(PCGDP,1,1,g) ~ L(PCGDP,1,g) +              # a bit more parsimonious specification
D(LIFEEX,1,1,g), wlddev))
#> Panel-difference computed without timevar: Assuming ordered data
#> Panel-lag computed without timevar: Assuming ordered data
#> Panel-difference computed without timevar: Assuming ordered data
#>
#> Call:
#> lm(formula = D(PCGDP, 1, 1, g) ~ L(PCGDP, 1, g) + D(LIFEEX, 1,
#>     1, g), data = wlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16877.4   -103.2    -52.4    106.0  12606.3
#>
#> Coefficients:
#>                      Estimate Std. Error t value Pr(>|t|)
#> (Intercept)         8.918e+01  1.410e+01   6.325 2.67e-10 ***
#> L(PCGDP, 1, g)      8.943e-03  6.035e-04  14.820  < 2e-16 ***
#> D(LIFEEX, 1, 1, g) -2.559e+01  2.528e+01  -1.012    0.311
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 830.5 on 8181 degrees of freedom
#>   (4560 observations deleted due to missingness)
#> Multiple R-squared:  0.02721,	Adjusted R-squared:  0.02698
#> F-statistic: 114.4 on 2 and 8181 DF,  p-value: < 2.2e-16
#>
summary(lm(D1.PCGDP ~.,                                 # Now adding level and lagged level of
L(D(wlddev,0:1,1, ~ country, ~year,9:10),0:1,           # LIFEEX and lagged differences rates
~ country, ~year, keep.ids = FALSE)[-1]))
#>
#> Call:
#> lm(formula = D1.PCGDP ~ ., data = L(D(wlddev, 0:1, 1, ~country,
#>     ~year, 9:10), 0:1, ~country, ~year, keep.ids = FALSE)[-1])
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients: (1 not defined because of singularities)
#>                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -3.215e+02  6.337e+01  -5.073    4e-07 ***
#> L1.PCGDP      2.507e-03  7.106e-04   3.529  0.00042 ***
#> L1.D1.PCGDP   3.155e-01  1.070e-02  29.483  < 2e-16 ***
#> LIFEEX       -1.936e+00  3.825e+01  -0.051  0.95962
#> L1.LIFEEX     8.345e+00  3.814e+01   0.219  0.82683
#> D1.LIFEEX            NA         NA      NA       NA
#> L1.D1.LIFEEX  1.667e+00  3.771e+01   0.044  0.96475
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared:  0.1281,	Adjusted R-squared:  0.1276
#> F-statistic: 234.8 on 5 and 7988 DF,  p-value: < 2.2e-16
#>
## Using plm can make things easier, but avoid attaching or 'with' calls:
pwlddev <- plm::pdata.frame(wlddev, index = c("country","year"))
head(D(pwlddev, 0:1, 1, 9:10))                          # Again differences of LIFEEX and PCGDP
#>                      country year PCGDP D1.PCGDP LIFEEX D1.LIFEEX
#> Afghanistan-1960 Afghanistan 1960    NA       NA 32.292        NA
#> Afghanistan-1961 Afghanistan 1961    NA       NA 32.742     0.450
#> Afghanistan-1962 Afghanistan 1962    NA       NA 33.185     0.443
#> Afghanistan-1963 Afghanistan 1963    NA       NA 33.624     0.439
#> Afghanistan-1964 Afghanistan 1964    NA       NA 34.060     0.436
#> Afghanistan-1965 Afghanistan 1965    NA       NA 34.495     0.435
PCGDP <- pwlddev$PCGDP                                  # A panel-Series of GDP per Capita
head(D(PCGDP))                                          # Differencing the panel series
#> Afghanistan-1960 Afghanistan-1961 Afghanistan-1962 Afghanistan-1963
#>               NA               NA               NA               NA
#> Afghanistan-1964 Afghanistan-1965
#>               NA               NA
summary(lm(D1.PCGDP ~.,                                 # Running the dynamic model again ->
data = L(D(pwlddev,0:1,1,9:10),0:1,          # code becomes a bit simpler
keep.ids = FALSE)[-1]))
#>
#> Call:
#> lm(formula = D1.PCGDP ~ ., data = L(D(pwlddev, 0:1, 1, 9:10),
#>     0:1, keep.ids = FALSE)[-1])
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients: (1 not defined because of singularities)
#>                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -3.215e+02  6.337e+01  -5.073    4e-07 ***
#> L1.PCGDP      2.507e-03  7.106e-04   3.529  0.00042 ***
#> L1.D1.PCGDP   3.155e-01  1.070e-02  29.483  < 2e-16 ***
#> LIFEEX       -1.936e+00  3.825e+01  -0.051  0.95962
#> L1.LIFEEX     8.345e+00  3.814e+01   0.219  0.82683
#> D1.LIFEEX            NA         NA      NA       NA
#> L1.D1.LIFEEX  1.667e+00  3.771e+01   0.044  0.96475
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared:  0.1281,	Adjusted R-squared:  0.1276
#> F-statistic: 234.8 on 5 and 7988 DF,  p-value: < 2.2e-16
#>
# One could be tempted to also do something like this, but THIS DOES NOT WORK!!:
# -> a pseries is only created when subsetting the pdata.frame using $ or [[
summary(lm(D(PCGDP) ~ L(D(PCGDP,0:1)) + L(D(LIFEEX,0:1),0:1), pwlddev))
#>
#> Call:
#> lm(formula = D(PCGDP) ~ L(D(PCGDP, 0:1)) + L(D(LIFEEX, 0:1),
#>     0:1), data = pwlddev)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients: (1 not defined because of singularities)
#>                               Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                 -3.215e+02  6.337e+01  -5.073    4e-07 ***
#> L(D(PCGDP, 0:1))L1.--        2.507e-03  7.106e-04   3.529  0.00042 ***
#> L(D(PCGDP, 0:1))L1.D1        3.155e-01  1.070e-02  29.483  < 2e-16 ***
#> L(D(LIFEEX, 0:1), 0:1)--    -1.936e+00  3.825e+01  -0.051  0.95962
#> L(D(LIFEEX, 0:1), 0:1)L1.--  8.345e+00  3.814e+01   0.219  0.82683
#> L(D(LIFEEX, 0:1), 0:1)D1            NA         NA      NA       NA
#> L(D(LIFEEX, 0:1), 0:1)L1.D1  1.667e+00  3.771e+01   0.044  0.96475
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared:  0.1281,	Adjusted R-squared:  0.1276
#> F-statistic: 234.8 on 5 and 7988 DF,  p-value: < 2.2e-16
#>
# To make it work, one needs to create pseries
LIFEEX <- pwlddev$LIFEEX
summary(lm(D(PCGDP) ~ L(D(PCGDP,0:1)) + L(D(LIFEEX,0:1),0:1))) # THIS WORKS !
#>
#> Call:
#> lm(formula = D(PCGDP) ~ L(D(PCGDP, 0:1)) + L(D(LIFEEX, 0:1),
#>     0:1))
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -16621.0   -100.0    -17.2     86.2  11935.3
#>
#> Coefficients: (1 not defined because of singularities)
#>                               Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                 -3.215e+02  6.337e+01  -5.073    4e-07 ***
#> L(D(PCGDP, 0:1))L1.--        2.507e-03  7.106e-04   3.529  0.00042 ***
#> L(D(PCGDP, 0:1))L1.D1        3.155e-01  1.070e-02  29.483  < 2e-16 ***
#> L(D(LIFEEX, 0:1), 0:1)--    -1.936e+00  3.825e+01  -0.051  0.95962
#> L(D(LIFEEX, 0:1), 0:1)L1.--  8.345e+00  3.814e+01   0.219  0.82683
#> L(D(LIFEEX, 0:1), 0:1)D1            NA         NA      NA       NA
#> L(D(LIFEEX, 0:1), 0:1)L1.D1  1.667e+00  3.771e+01   0.044  0.96475
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 791.3 on 7988 degrees of freedom
#>   (4750 observations deleted due to missingness)
#> Multiple R-squared:  0.1281,	Adjusted R-squared:  0.1276
#> F-statistic: 234.8 on 5 and 7988 DF,  p-value: < 2.2e-16
#>
## Using dplyr:
library(dplyr)
wlddev %>% group_by(country) %>%
select(PCGDP,LIFEEX) %>% fdiff(0:1,1:2)       # Adding a first and second difference
#> Adding missing grouping variables: country
#> Panel-difference computed without timevar: Assuming ordered data
#> # A tibble: 12,744 x 7
#> # Groups:   country [216]
#>    country     PCGDP D1.PCGDP D2.PCGDP LIFEEX D1.LIFEEX D2.LIFEEX
#>  * <chr>       <dbl>    <dbl>    <dbl>  <dbl>     <dbl>     <dbl>
#>  1 Afghanistan    NA       NA       NA   32.3    NA      NA
#>  2 Afghanistan    NA       NA       NA   32.7     0.450  NA
#>  3 Afghanistan    NA       NA       NA   33.2     0.443  -0.00700
#>  4 Afghanistan    NA       NA       NA   33.6     0.439  -0.004
#>  5 Afghanistan    NA       NA       NA   34.1     0.436  -0.003
#>  6 Afghanistan    NA       NA       NA   34.5     0.435  -0.001
#>  7 Afghanistan    NA       NA       NA   34.9     0.433  -0.00200
#>  8 Afghanistan    NA       NA       NA   35.4     0.433   0
#>  9 Afghanistan    NA       NA       NA   35.8     0.435   0.002
#> 10 Afghanistan    NA       NA       NA   36.2     0.438   0.003
#> # ... with 12,734 more rows
wlddev %>% group_by(country) %>%
select(year,PCGDP,LIFEEX) %>% D(0:1,1:2,year) # Also using t (safer)
#> Adding missing grouping variables: country
#> # A tibble: 12,744 x 8
#> # Groups:   country [216]
#>    country      year PCGDP D1.PCGDP D2.PCGDP LIFEEX D1.LIFEEX D2.LIFEEX
#>  * <chr>       <int> <dbl>    <dbl>    <dbl>  <dbl>     <dbl>     <dbl>
#>  1 Afghanistan  1960    NA       NA       NA   32.3    NA      NA
#>  2 Afghanistan  1961    NA       NA       NA   32.7     0.450  NA
#>  3 Afghanistan  1962    NA       NA       NA   33.2     0.443  -0.00700
#>  4 Afghanistan  1963    NA       NA       NA   33.6     0.439  -0.004
#>  5 Afghanistan  1964    NA       NA       NA   34.1     0.436  -0.003
#>  6 Afghanistan  1965    NA       NA       NA   34.5     0.435  -0.001
#>  7 Afghanistan  1966    NA       NA       NA   34.9     0.433  -0.00200
#>  8 Afghanistan  1967    NA       NA       NA   35.4     0.433   0
#>  9 Afghanistan  1968    NA       NA       NA   35.8     0.435   0.002
#> 10 Afghanistan  1969    NA       NA       NA   36.2     0.438   0.003
#> # ... with 12,734 more rows
wlddev %>% group_by(country) %>%                           # Dropping id's
select(year,PCGDP,LIFEEX) %>% D(0:1,1:2,year, keep.ids = FALSE)
#> Adding missing grouping variables: country
#> # A tibble: 12,744 x 6
#> # Groups:   country [216]
#>    PCGDP D1.PCGDP D2.PCGDP LIFEEX D1.LIFEEX D2.LIFEEX
#>  * <dbl>    <dbl>    <dbl>  <dbl>     <dbl>     <dbl>
#>  1    NA       NA       NA   32.3    NA      NA
#>  2    NA       NA       NA   32.7     0.450  NA
#>  3    NA       NA       NA   33.2     0.443  -0.00700
#>  4    NA       NA       NA   33.6     0.439  -0.004
#>  5    NA       NA       NA   34.1     0.436  -0.003
#>  6    NA       NA       NA   34.5     0.435  -0.001
#>  7    NA       NA       NA   34.9     0.433  -0.00200
#>  8    NA       NA       NA   35.4     0.433   0
#>  9    NA       NA       NA   35.8     0.435   0.002
#> 10    NA       NA       NA   36.2     0.438   0.003
#> # ... with 12,734 more rows
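
To make the D1 and D2 columns above concrete, here is a minimal base-R sketch of grouped first and second iterated differences. It is only an illustration of the arithmetic, not how fdiff or D are implemented: the toy data frame `panel` and the helper `iter_diff` are hypothetical names introduced here, and the sketch assumes consecutive, gap-free time periods within each group.

```r
# Toy panel: two groups observed over four ordered periods (made-up numbers).
panel <- data.frame(
  id   = rep(c("a", "b"), each = 4),
  year = rep(2001:2004, times = 2),
  x    = c(1, 3, 6, 10, 2, 2, 5, 4)
)

# Hypothetical helper: iterated difference of order d within one group,
# padded with d leading NAs so the result has the same length as the input.
iter_diff <- function(v, d = 1) {
  if (length(v) <= d) return(rep(NA_real_, length(v)))
  c(rep(NA_real_, d), diff(v, differences = d))
}

# Sort by group and time first, then difference within each group.
panel <- panel[order(panel$id, panel$year), ]
panel$D1.x <- ave(panel$x, panel$id, FUN = function(v) iter_diff(v, 1))
panel$D2.x <- ave(panel$x, panel$id, FUN = function(v) iter_diff(v, 2))

print(panel)
```

The explicit sort plays the role that supplying a time variable plays in the examples above: without it, differences are taken in whatever row order the data happens to have.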