This short post announces the CRAN release of a new R package, dfms, for efficient estimation of dynamic factor models in R using the Expectation Maximization (EM) algorithm and Kalman filtering. Estimation can be done in three different ways, following:
- Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1), 188-205. <doi.org/10.1016/j.jeconom.2011.02.012>
- Doz, C., Giannone, D., & Reichlin, L. (2012). A quasi-maximum likelihood approach for large, approximate dynamic factor models. Review of Economics and Statistics, 94(4), 1014-1024. <doi.org/10.1162/REST_a_00225>
- Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160. <doi.org/10.1002/jae.2306>
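As a rough illustration of how the three approaches might be selected in practice, the sketch below fits a small simulated panel three times. It assumes that the main function is `DFM()` and that the estimator is chosen through an `em.method` argument taking the values `"none"`, `"DGR"` and `"BM"`; this reflects my reading of the package documentation, so please check `?DFM` before relying on it. The simulated panel and all object names are hypothetical.

```r
set.seed(123)

# Hypothetical toy panel: 2 AR(1) factors driving 12 series over 150 periods
T_obs <- 150; n <- 12; r <- 2
F_sim <- sapply(seq_len(r), function(i)
  as.numeric(stats::filter(rnorm(T_obs), 0.6, method = "recursive")))
X <- F_sim %*% matrix(rnorm(r * n), r, n) + matrix(rnorm(T_obs * n), T_obs, n)
colnames(X) <- paste0("x", seq_len(n))

# Only run the fits if dfms is installed
if (requireNamespace("dfms", quietly = TRUE)) {
  # Assumed argument values, see the lead-in above
  mod_2s  <- dfms::DFM(X, r = 2, p = 1, em.method = "none") # two-step, Doz et al. (2011)
  mod_qml <- dfms::DFM(X, r = 2, p = 1, em.method = "DGR")  # QML via EM, Doz et al. (2012)
  mod_bm  <- dfms::DFM(X, r = 2, p = 1, em.method = "BM")   # EM with missing data, Banbura & Modugno (2014)
  print(mod_qml)
}
```

On a complete panel like this one, the two EM variants should give very similar results; the Banbura and Modugno variant matters mainly when the data contain missing values.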
The general model is
\[\textbf{x}_t = \textbf{C} \textbf{f}_t + \textbf{e}_t, \quad \textbf{e}_t \sim N(\textbf{0}, \textbf{R})\]
\[\textbf{f}_t = \sum_{j=1}^p \textbf{A}_j \textbf{f}_{t-j} + \textbf{u}_t, \quad \textbf{u}_t \sim N(\textbf{0}, \textbf{Q})\]
where the first equation is called the measurement or observation equation, the second equation is called the transition, state, or process equation, and
- \(\textbf{x}_t\) is an \(n \times 1\) vector of observed series at time \(t\)
- \(\textbf{f}_t\) is an \(r \times 1\) vector of unobserved factors at time \(t\)
- \(\textbf{C}\) is an \(n \times r\) measurement (observation) matrix
- \(\textbf{A}_j\) is an \(r \times r\) state transition matrix at lag \(j\)
- \(\textbf{Q}\) is the \(r \times r\) state covariance matrix, i.e. the covariance of the state disturbances \(\textbf{u}_t\)
- \(\textbf{R}\) is the \(n \times n\) measurement (observation) covariance matrix, i.e. the covariance of the measurement errors \(\textbf{e}_t\), and is assumed to be diagonal.
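To make the notation concrete, here is a minimal base R sketch that simulates data from the model above. All dimensions and parameter values (`n`, `r`, `p`, `T_obs`, the entries of `A`, `C`, `Q` and `R`) are hypothetical choices for illustration and are not taken from the package.

```r
set.seed(1)
n <- 10; r <- 2; p <- 2; T_obs <- 200      # hypothetical dimensions

# Transition matrices A_1 and A_2, small enough for a stationary VAR(2)
A <- list(diag(0.5, r), diag(0.2, r))
Q <- diag(r)                                # r x r state covariance
C <- matrix(rnorm(n * r), n, r)             # n x r observation matrix
R <- diag(runif(n, 0.5, 1.5))               # n x n diagonal observation covariance

# Transition equation: f_t = sum_j A_j f_{t-j} + u_t, with u_t ~ N(0, Q)
F_mat <- matrix(0, T_obs + p, r)            # first p rows are zero pre-sample values
for (t in (p + 1):(T_obs + p)) {
  u_t <- drop(t(chol(Q)) %*% rnorm(r))
  ar  <- Reduce(`+`, lapply(seq_len(p), function(j) A[[j]] %*% F_mat[t - j, ]))
  F_mat[t, ] <- drop(ar) + u_t
}
F_mat <- F_mat[-seq_len(p), , drop = FALSE] # drop the pre-sample rows

# Measurement equation: x_t = C f_t + e_t, with e_t ~ N(0, R)
E <- matrix(rnorm(T_obs * n), T_obs, n) %*% chol(R)
X <- F_mat %*% t(C) + E                     # T_obs x n matrix, one row per period
```

Each row of `X` is one observation \(\textbf{x}_t'\) and each row of `F_mat` is the corresponding factor vector \(\textbf{f}_t'\), so `X` has the shape of a data matrix one would pass to an estimation routine.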
Estimation starts from initial values of the system matrices obtained through PCA, which are used to run a Kalman Filter and Smoother and obtain an estimate of \(\textbf{f}_t\). In EM estimation, the system matrices are then updated using the estimates from the Kalman Filter and Smoother, and the data are filtered and smoothed again; this is repeated until the Kalman Filter log-likelihood converges.
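The first part of this procedure can be sketched in base R as follows. This is a simplified illustration and not the package's implementation: it uses a single lag (\(p = 1\)), a hypothetical toy dataset, a vague initial state covariance chosen for convenience, and only the forward Kalman Filter pass. The smoother and the EM update of the system matrices are omitted.

```r
set.seed(42)

# Toy data (hypothetical): one AR(1) factor driving n = 8 series
n <- 8; r <- 1; T_obs <- 300
f_true <- as.numeric(arima.sim(list(ar = 0.7), n = T_obs))
X <- outer(f_true, rnorm(n)) + matrix(rnorm(T_obs * n, sd = 0.5), T_obs, n)
X <- scale(X)                                   # standardize each series

# Step 1: initial values from PCA
eig  <- eigen(cov(X), symmetric = TRUE)
C    <- eig$vectors[, seq_len(r), drop = FALSE] # loadings = leading eigenvectors
F_pc <- X %*% C                                 # principal-component factor estimates
R    <- diag(diag(cov(X - F_pc %*% t(C))))      # diagonal residual variances
F_lag <- F_pc[-T_obs, , drop = FALSE]
F_cur <- F_pc[-1, , drop = FALSE]
A <- t(solve(crossprod(F_lag), crossprod(F_lag, F_cur)))  # VAR(1) by OLS
Q <- cov(F_cur - F_lag %*% t(A))                # covariance of VAR residuals

# Step 2: forward Kalman Filter pass with the PCA-based system matrices
kalman_filter <- function(X, C, R, A, Q) {
  T_obs <- nrow(X); n <- ncol(X); r <- ncol(C)
  f <- rep(0, r)                                # initial state mean
  P <- diag(10, r)                              # vague initial covariance (assumption)
  F_filt <- matrix(NA_real_, T_obs, r)
  loglik <- 0
  for (t in seq_len(T_obs)) {
    f_pred <- drop(A %*% f)                     # predicted state
    P_pred <- A %*% P %*% t(A) + Q              # predicted state covariance
    v <- X[t, ] - drop(C %*% f_pred)            # innovation
    S <- C %*% P_pred %*% t(C) + R              # innovation covariance
    S_inv <- solve(S)
    K <- P_pred %*% t(C) %*% S_inv              # Kalman gain
    f <- f_pred + drop(K %*% v)                 # filtered state
    P <- P_pred - K %*% C %*% P_pred            # filtered state covariance
    F_filt[t, ] <- f
    loglik <- loglik - 0.5 * (n * log(2 * pi) +
      as.numeric(determinant(S)$modulus) + drop(t(v) %*% S_inv %*% v))
  }
  list(F = F_filt, loglik = loglik)
}

kf <- kalman_filter(X, C, R, A, Q)
```

In an EM iteration, one would re-estimate `C`, `R`, `A` and `Q` from the smoothed factors and call the filter again, stopping once `kf$loglik` changes by less than a chosen tolerance. Note that the sign of the estimated factor is not identified, so `kf$F` may be the mirror image of `f_true`.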
Estimation with dfms is very efficient, powered by RcppArmadillo and collapse, and supports arbitrary patterns of missing data following Banbura and Modugno (2014). A comprehensive set of methods allows for easy model interpretation and forecasting. The present release, v0.1.3, does not support advanced DFM estimation features such as accounting for serial correlation in \(\textbf{e}_t\) or \(\textbf{u}_t\), series of mixed frequency in \(\textbf{x}_t\), time-varying system matrices \(\textbf{C}_t\) and \(\textbf{A}_t\), or structural breaks in the estimation. Some of these features may be added in the future.
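A minimal sketch of this workflow on a panel with missing values might look as follows. The data are simulated and hypothetical, and the calls `DFM()`, `summary()` and `predict()` with a horizon argument `h` reflect my understanding of the package interface, so treat them as assumptions and consult the package documentation for the exact signatures.

```r
set.seed(7)

# Hypothetical panel: one AR(1) factor, 10 series, 120 periods
T_obs <- 120; n <- 10
f <- as.numeric(arima.sim(list(ar = 0.8), n = T_obs))
X <- outer(f, runif(n, 0.5, 1.5)) + matrix(rnorm(T_obs * n), T_obs, n)
colnames(X) <- paste0("x", seq_len(n))

# Introduce missing data: 5% of cells at random plus a ragged edge at the end
X[sample(length(X), 60)] <- NA
X[(T_obs - 2):T_obs, 1:3] <- NA

# Only run the fit if dfms is installed
if (requireNamespace("dfms", quietly = TRUE)) {
  mod <- dfms::DFM(X, r = 1, p = 2)   # assumed to handle the NAs internally
  print(summary(mod))                 # model interpretation
  fc <- predict(mod, h = 6)           # 6-step-ahead forecasts (assumed argument name)
  print(fc)
}
```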
To learn more about dfms, check out the website, and in particular the introductory vignette, which provides a short walk-through of the package.