Preamable: Predictions using dynamic modelling
Machine Learning and Neural Networks are not the only way to do data science or AI. There are other techniques to explore , for example, from quantitative economics. Apart from Game Theory, dynamic modelling could be suitable to many prediction problems, specially the ones with temporal datasets. Here is one example technique from forgotten Norwegian Nobel prize winning economist/data scientist Trygve Haavelmo. Advantage of such models are they can be explained, not fully black box. Make no mistake, for very large system with large datasets, this is not that trivial to implement, GPUs are well suited. In this post we give one hands on exercise.
Summary
Econometrics aims at estimating observables in the economy and their inter-dependencies and testing the estimates against the economic reality. A quantitative approach to express these inter-dependencies appear as simultaneous equations, an i.e. system of linear equations, this is a mathematical structure of economic relationships that were made possible with the pioneering work of Nobel prize winning economist Trygve Haavelmo [1,2]. This approach and its dynamic variants are now used routinely in dynamic modelling of econometric systems. From a computational perspective, R-project provides efficient and very rich computational environment and the large set of extensions for econometrics in general [3].
Dynamic Modelling
The simplest relationship that can be constructed with two arbitrary economic variables, or instruments, $X(t)$ and $Y(t)$ is shown by Haavelmo [1]. For example, these variables could be unemployment rate and Gross Domestic Product (GDP), as in Okun's law. Hence, the simplest bi-variate simultaneous system of equations looks as follows,
$X(t) =a Y(t) + \epsilon_{x}(t)$,
$Y(t) =b X(t) + \epsilon_{y}(t)$,
where $a$ and $b$ are constant coefficients. Where, $\epsilon_{x}$ and $\epsilon_{y}$, appear as non-deterministic disturbances and are not observed in the modelled economic system. Disturbances are usually expressed as random variables drawn from the normal distribution. At this stage, one can follow two approaches in doing economic scenario analysis for forecasting. We can aim at finding coefficients $a$ and $b$ using economic data via dynamic regression. If $a$ and $b$ coefficients are known, we may want to study effect of different disturbances over time.
Dynamic Haavelmo Model
A model of propensity to consume in economic system is shown by Haavelmo [3] based on his analysis of US economic conditions between 1929-1949. His analysis leads to following simultaneous system of equations,
$c(t)= \alpha y(t) + \beta + u(t)$,
$r(t)= \mu (c(t)+x(t)) +\nu + w(t)$,
$y(t)= c(t)+x(t)-r(t)$,
where $\alpha,\beta,\mu,\nu$ are constants and $u(t)$ and $w(t)$ are disturbances. Economic variables have the following meanings,
c(t) : personal consumption expenditures,
y(t) : personal disposable income,
r(t) : gross business savings,
x(t) : gross investment.
However, this model considered to be static while all the relationships are given at the same time point. Zellner-Palm [5] provided a dynamic version of the Haavelmo's model. Here we write down a version of it,
$c(t)= \alpha Dy(t) + \beta + u(t)$,
$r(t)= \mu D(c(t)+x(t)) +\nu + w(t)$,
$y(t)= c(t)+x(t)-r(t)$.
where difference operator means, $Dy(t) = y(t)-y(t-1)$.
Krokozhia Case Study
DataKrokozhia is a fictional country depicted in Steven Spielberg's movie The Terminal. Let's generate a fictional data for our dynamic Haavelmo model's economic instruments for this country from 1950 to present in R,
set.seed(4242)
# KH: Krokozhia Haavelmo Model
KH.df <- data.frame(Year=seq(1950,2013),
c=sample(300:1500,64,replace=TRUE),
y=sample(1000:5000,64,replace=TRUE),
r=sample(100:500,64,replace=TRUE),
x=sample(100:500,64,replace=TRUE))
# Add lag data for difference
KH.df$y.lag <- c(NA, KH.df$y[1:63])
KH.df$c.lag <- c(NA, KH.df$c[1:63])
KH.df$x.lag <- c(NA, KH.df$x[1:63])
Determining Constants
Multilevel regression is needed in order to fit the data and determine the constants of the dynamic model. One R package called sem developed by John Fox can do such analysis.
library(sem)
# Two-stage least squares
# Eq1: $c(t)= \alpha Dy(t) + \beta + u(t)$,
# \beta is the intercept and u(t) is not used
# Eq2: $r(t)= \mu D(c(t)+x(t)) +\nu + w(t)$,
# \nu is the intercept and w(t) is not used
KH.eq1 <- tsls(c~I(y-y.lag), ~c+y+r, data=KH.df)
KH.eq2 <- tsls(r~I(c-c.lag+x-x.lag), ~c+y+r, data=KH.df)
coef(KH.eq1) # alpha=875.414 nu=0.015
coef(KH.eq2) # mu=-0.028 nu=300.675
Here tsls performs two-stage least square analysis.
Propagating a disturbance in the economy
We have not used any disturbance in determining the system coefficients, constants, above. However, we can propagate the values of economic observables using the dynamic model if we set a disturbance value at a given time. Imagine if we set disturbance on the year 2001 as $u=200$ and $w=150$. Hence, the dynamic model will read on year 2001, $t=2001$, $u=200$ and $w=150$
$c = 875.414 (y-2570) + 0.015 + 200$
$r = -0.028(c+x-1281-479) + 300.675 + 150$,
$y = c+x-r$,
By solving this under-determined system of equations we can determine the values in the year 2001 and furthermore propagate all dynamics after 2001 similarly. The resulting new series will give us a quantitative idea of the effect of single disturbance in the simulated economy.
Conclusions and outlook
In this post, we have briefly reviewed possible uses of R in simulating dynamic econometric models, in particular simultaneous equation models. A simple demonstration of determining model coefficients of the Haavelmo type toy model with generated synthetic data is provided. One use case of this type of approach in economic scenario analysis and forecasting is to monitor propagation of the econometric instruments over time is also mentioned.References
[1] The Statistical Implications of a System of Simultaneous Equations,Trygve Haavelmo, Econometrica, Vol. 11, No. 1. (Jan., 1943), pp. 1-12
[2] Econometrics Analysis, William H. Greene, Prentice Hall (2011)
[3] Applied Econometrics with R,
Kleiber, Christian, Zeileis, Achim, Springer (2008), Achim Zeileis
[4] Methods of measuring the marginal propensity to consume,
T. Haavelmo, Journal of the American Statistical Association,
1947 - Taylor & Francis.
[5] Time series analysis and simultaneous equation econometric models,
Zellner, Arnold and Palm, Franz, Journal of Econometrics,
Vol.2, Num.1, p17-54 (1974)