Gibbs Sampler: Dynamic Linear Regression

Elliot Shannon

Michigan State University

Setting

The setting of interest involves data generated from a dynamic linear model. Here, dynamic refers to the evolution of the regression coefficients (\(\boldsymbol{\beta}\)’s) over time. Unlike standard multiple linear regression, we have a different set of regression coefficients at each time step, where the coefficients at time \(t\) are centered on the coefficient values at time \(t-1\). Moreover, the covariates (\(\textbf{X}\)) may also change over time, hence we have a different design matrix at each time step. The proposed model is described in detail below.

Model

For time points \(t = 1, \ldots, T\), let

  • \(\textbf{Y}_t = \begin{pmatrix} y_{1,t} \\ \vdots \\ y_{N,t} \end{pmatrix} \in \mathbb{R}^N\) be the vector of responses.

  • \(\textbf{X}_t = \begin{pmatrix} 1 & x_{1, 1, t} & \ldots & x_{1, P - 1, t} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N, 1, t} & \ldots & x_{N, P - 1, t} \end{pmatrix} \in \mathbb{R}^{N \times P}\) be the design matrix of covariates.

  • \(\boldsymbol{\beta}_t = \begin{pmatrix} \beta_{1,t} \\ \vdots \\ \beta_{P,t} \end{pmatrix} \in \mathbb{R}^P\) be the vector of regression coefficients.

  • \( \boldsymbol{\varepsilon} _t = \begin{pmatrix} \varepsilon_{1,t} \\ \vdots \\ \varepsilon_{N,t} \end{pmatrix} \in \mathbb{R}^N\) be the vector of residual terms.

The proposed model is then

\[\begin{equation} \label{eq:mod} \begin{aligned} \textbf{Y}_t &= \textbf{X}_t \boldsymbol{\beta}_t + \boldsymbol{\varepsilon} _t, \quad \boldsymbol{\varepsilon} _t \sim MVN(\textbf{0}, \sigma^2 \textbf{I}_N), \quad t = 1, \ldots, T\\ \boldsymbol{\beta}_t &= \boldsymbol{\beta}_{t - 1} + \boldsymbol{\eta} _t, \quad \boldsymbol{\beta}_0 \sim MVN( \boldsymbol{\mu} _\beta, \boldsymbol{\Sigma} _\beta), \\ \boldsymbol{\eta} _t &\sim MVN(\textbf{0}, \boldsymbol{\Sigma} _\eta), \quad \boldsymbol{\Sigma} _\eta \sim IW(\textbf{H}, \nu), \quad \sigma^2 \sim IG(a, b) \end{aligned} \end{equation}\]

where \(\textbf{0}\) is a vector of \(0\)’s.

Further, \(MVN( \boldsymbol{\mu} , \boldsymbol{\Sigma} )\) is the multivariate normal distribution with mean vector \( \boldsymbol{\mu} \in \mathbb{R}^N\), covariance matrix \( \boldsymbol{\Sigma} \in \mathbb{R}^{N \times N}\), and pdf

\[\begin{equation}\label{eq:mvn} p(\textbf{x}\mid \boldsymbol{\mu} , \boldsymbol{\Sigma} ) = (2 \pi)^{-N/2} | \boldsymbol{\Sigma} |^{-1/2} \exp\left(\frac{-1}{2} (\textbf{x}- \boldsymbol{\mu} )^\top \boldsymbol{\Sigma} ^{-1} (\textbf{x}- \boldsymbol{\mu} )\right) \end{equation}\]

\(IW(\textbf{H}, \nu)\) is the Inverse Wishart distribution over \(P \times P\) covariance matrices (the dimension of \( \boldsymbol{\Sigma} _\eta\)), with (positive definite) scale matrix \(\textbf{H}\in \mathbb{R}^{P \times P}\), degrees of freedom \(\nu > P - 1\), and pdf

\[\begin{equation}\label{eq:iw} p( \boldsymbol{\Sigma} \mid \textbf{H}, \nu) = \frac{|\textbf{H}|^{\nu / 2}}{2^{\nu P / 2} \Gamma_P \left( \frac{\nu}{2} \right)} | \boldsymbol{\Sigma} |^{-(\nu + P + 1) / 2} \exp\left(-\frac{1}{2} \text{tr}\left( \textbf{H} \boldsymbol{\Sigma} ^{-1} \right) \right) \end{equation}\]

and \(IG(a, b)\) is the Inverse Gamma distribution with shape parameter \(a > 0\), scale parameter \(b > 0\), and pdf

\[\begin{equation}\label{eq:ig} p(\sigma^2 \mid a, b) = \frac{b^a}{\Gamma (a)} (\sigma^2)^{-a - 1} \exp\left( \frac{-b}{\sigma^2} \right). \end{equation}\]

Given this complete model specification, we would like to conduct inference on the model parameters using the data \(\textbf{Y}_t\) and \(\textbf{X}_t\), \(t = 1, \ldots, T\). These parameters are \(\boldsymbol{\beta}_0, \boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_T, \boldsymbol{\Sigma} _\eta, \sigma^2\). We assume that the prior parameters \( \boldsymbol{\mu} _\beta, \boldsymbol{\Sigma} _\beta, \textbf{H}, \nu, a, b\) are given (specified by the user).
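
To make the setting concrete, below is a minimal simulation sketch of the data-generating process in (\(\ref{eq:mod}\)), written in Python with NumPy. All dimensions and parameter values (\(N\), \(T\), \(P\), \(\sigma^2\), \( \boldsymbol{\Sigma} _\eta\), and the priors) are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

N, T, P = 50, 20, 3            # observations per time step, time steps, coefficients
sigma2 = 0.5                   # residual variance (carries an IG(a, b) prior in the model)
Sigma_eta = 0.1 * np.eye(P)    # innovation covariance (carries an IW(H, nu) prior)
mu_beta, Sigma_beta = np.zeros(P), np.eye(P)   # prior mean and covariance for beta_0

# beta[0] is beta_0; beta[t] is beta_t for t = 1, ..., T (random walk evolution)
beta = np.zeros((T + 1, P))
beta[0] = rng.multivariate_normal(mu_beta, Sigma_beta)
for t in range(1, T + 1):
    beta[t] = beta[t - 1] + rng.multivariate_normal(np.zeros(P), Sigma_eta)

# design matrices X_t (intercept plus P - 1 covariates) and responses Y_t
X = np.ones((T, N, P))
X[:, :, 1:] = rng.normal(size=(T, N, P - 1))
Y = np.stack([X[t] @ beta[t + 1] + rng.normal(0.0, np.sqrt(sigma2), size=N)
              for t in range(T)])
```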

Posterior

We know that the posterior distribution in Bayesian inference is proportional to the likelihood (data distribution) times the prior. Here, our likelihood is the distribution of the data \(\textbf{Y}_t, t = 1, \ldots, T\). Since the distribution of the residuals is \(MVN\), we will have a \(MVN\) likelihood, written as

\[\begin{equation}\label{eq:likelihood} \prod_{t = 1}^T MVN(\textbf{Y}_t \mid \textbf{X}_t \boldsymbol{\beta}_t, \sigma^2 \textbf{I}_N) \end{equation}\]

where \(\textbf{I}_N\) is an \(N \times N\) identity matrix (diagonal matrix with diagonal elements equal to \(1\)).

We will also have priors (some of which are induced) for each parameter for which we seek inference. Specifically, we have prior distributions of the form

\[\begin{equation}\label{eq:priors} \prod_{t = 1}^T MVN(\boldsymbol{\beta}_t \mid \boldsymbol{\beta}_{t - 1}, \boldsymbol{\Sigma} _\eta) \times MVN(\boldsymbol{\beta}_0 \mid \boldsymbol{\mu} _\beta, \boldsymbol{\Sigma} _\beta) \times IG(\sigma^2 \mid a, b) \times IW( \boldsymbol{\Sigma} _\eta \mid \textbf{H}, \nu) \end{equation}\]

Given (\(\ref{eq:likelihood}\)) and (\(\ref{eq:priors}\)), their product will be proportional to the posterior distribution, and is given as

\[\begin{equation}\label{eq:posterior} \begin{aligned} \prod_{t = 1}^T MVN(\textbf{Y}_t \mid \textbf{X}_t \boldsymbol{\beta}_t, \sigma^2 \textbf{I}_N) &\times \prod_{t = 1}^T MVN(\boldsymbol{\beta}_t \mid \boldsymbol{\beta}_{t - 1}, \boldsymbol{\Sigma} _\eta) \times MVN(\boldsymbol{\beta}_0 \mid \boldsymbol{\mu} _\beta, \boldsymbol{\Sigma} _\beta) \\ &\times IG(\sigma^2 \mid a, b) \times IW( \boldsymbol{\Sigma} _\eta \mid \textbf{H}, \nu) \end{aligned} \end{equation}\]

Importantly, (\(\ref{eq:posterior}\)) is only proportional to our desired posterior. Why is this okay? We are essentially ignoring the complex integral (normalizing constant) that typically appears in the denominator of Bayes’ theorem and represents the marginal distribution of our complete data \(\textbf{Y}\). This normalizing constant is needed to ensure that the posterior we derive is actually a proper probability distribution (integrates to 1). It is important, but not necessary for our purposes of deriving full conditionals for Gibbs sampling. Moving forward, we work only with this proportional posterior (\(\ref{eq:posterior}\)).

Conditional Intuition

Recall that in the current model (\(\ref{eq:mod}\)), we are interested in learning about the parameters \(\boldsymbol{\beta}_0, \boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_T, \boldsymbol{\Sigma} _\eta, \sigma^2\). These parameters each appear in our posterior distribution given in (\(\ref{eq:posterior}\)); some appear in several terms, others appear in fewer. The beauty of Gibbs sampling is that for each parameter of interest, we can derive its full conditional distribution, meaning we don’t need to work with the entirety of (\(\ref{eq:posterior}\)), but rather just the specific terms in which the parameter of interest is found. This simplifies our lives greatly. Moreover, if we have been clever enough to use conjugate priors for each parameter, we can expect that our full conditional distributions will take nice forms. So nice, in fact, that we can skip many tedious steps in the derivation and simply identify the defining features (parameters) of each distribution. The general steps are outlined below.

General steps for deriving full conditional distributions

  1. Identify each term in the posterior distribution in which the parameter of interest is found.
  2. Isolate these terms and write them as a product of pdfs.
  3. Combine like terms and ignore terms that do not involve the parameter of interest.
  4. Recognize the simplified expression as a pdf of the parameter of interest and identify the distributional parameters. This is the full conditional distribution for the parameter of interest.
  5. Move to the next parameter of interest and go back to step 1.

A note on conjugacy

In Bayesian inference, conjugacy refers to the pairing of prior and likelihood distributions to achieve a specific posterior form. When a prior is conjugate, the resulting posterior distribution will have the same distributional form. This is immensely helpful when deriving full conditional distributions, especially in step 4 of the general steps given above. Some important examples of conjugacy are given below:

  • Normal Likelihood \(\times\) Normal Prior \(\rightarrow\) Normal Posterior
  • Normal Likelihood \(\times\) IG Prior \(\rightarrow\) IG Posterior
  • Normal Likelihood \(\times\) IW Prior \(\rightarrow\) IW Posterior

Incompleting the Square

In the case of Normal-Normal conjugacy (normal prior with normal likelihood), we expect the posterior distribution to also follow a normal distribution. The same is true with multivariate normal distributions, which we work with here. A handy trick for identifying this posterior distribution is coined “incompleting the square”, which comes from James Clark’s book ‘Models for Ecological Data’. See Section 4.2 of his book for more details.

Following (\(\ref{eq:mvn}\)), we know that the kernel of the \(MVN\) distribution is of the form

\[\begin{equation*} p(\textbf{x}\mid \boldsymbol{\mu} , \boldsymbol{\Sigma} ) \propto \exp\left(\frac{-1}{2} (\textbf{x}- \boldsymbol{\mu} )^\top \boldsymbol{\Sigma} ^{-1} (\textbf{x}- \boldsymbol{\mu} )\right) \end{equation*}\]

where \(\propto\) means “proportional to”. This structure gives us some insight into the placement of the parameters in the pdf. To see this further, we can focus on the terms inside the exponential, ignoring the \(-\frac{1}{2}\) coefficient. Expanding this term gives us:

\[\begin{align}\label{eq:expand_mvn} (\textbf{x}- \boldsymbol{\mu} )^\top \boldsymbol{\Sigma} ^{-1} (\textbf{x}- \boldsymbol{\mu} ) &= \textbf{x}^\top \boldsymbol{\Sigma} ^{-1} \textbf{x}- \textbf{x}^\top \boldsymbol{\Sigma} ^{-1} \boldsymbol{\mu} - \boldsymbol{\mu} ^\top \boldsymbol{\Sigma} ^{-1} \textbf{x}+ \boldsymbol{\mu} ^\top \boldsymbol{\Sigma} ^{-1} \boldsymbol{\mu} \end{align}\]

Now comes the trick. Writing the mean as a function of the covariance matrix, we can define

\[\begin{align*} \boldsymbol{\mu} &= \textbf{V}\textbf{v}\\ \boldsymbol{\Sigma} &= \textbf{V} \end{align*}\]

Since \(\textbf{V}\) is a (symmetric) covariance matrix, this allows us to rewrite (\(\ref{eq:expand_mvn}\)) as

\[\begin{align*} \textbf{x}^\top \textbf{V}^{-1} \textbf{x}&- \textbf{x}^\top \textbf{V}^{-1} \textbf{V}\textbf{v}- \textbf{v}^\top \textbf{V}^\top \textbf{V}^{-1} \textbf{x}+ \textbf{v}^\top \textbf{V}^\top \textbf{V}^{-1} \textbf{V}\textbf{v}\\ &= \textbf{x}^\top \textbf{V}^{-1} \textbf{x}- \textbf{x}^\top \textbf{v}- \textbf{v}^\top \textbf{x}+ \textbf{v}^\top \textbf{V}\textbf{v}\\ &= \textbf{x}^\top \textbf{V}^{-1} \textbf{x}- \textbf{x}^\top \textbf{v}\cdots \end{align*}\]

This is super helpful, because when we are mashing together normal priors and likelihoods, we can combine like terms and look for \(\textbf{V}^{-1}\) and \(\textbf{v}\) to get the mean and variance of the full conditional \(MVN\). They will always be in the same place! \(\textbf{V}^{-1}\) will be sandwiched between the variable of interest (i.e. the \(\textbf{x}^\top \textbf{V}^{-1} \textbf{x}\) term), and \(\textbf{v}\) will be attached to the negative transpose of the variable of interest (i.e. \(- \textbf{x}^\top \textbf{v}\)) (in this example, \(\textbf{x}\) is the variable of interest). Once we identify \(\textbf{V}^{-1}\) and \(\textbf{v}\), we can sample from the full conditional distribution as \(MVN(\textbf{V}\textbf{v}, \textbf{V})\).
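
As a small sketch of this recipe (assuming NumPy; the helper name is ours, purely for illustration), once we have read \(\textbf{V}^{-1}\) and \(\textbf{v}\) off the kernel, drawing from the full conditional is just drawing from \(MVN(\textbf{V}\textbf{v}, \textbf{V})\):

```python
import numpy as np

def sample_mvn_from_precision(Vinv, v, rng):
    """Draw from MVN(Vv, V), where Vinv = V^{-1} and v are read off the kernel."""
    V = np.linalg.inv(Vinv)                   # full conditional covariance V
    return rng.multivariate_normal(V @ v, V)  # full conditional mean is V v
```

Each update below amounts to building \(\textbf{V}^{-1}\) and \(\textbf{v}\) from the relevant terms and then sampling exactly like this.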

Working it out

We will now go through each of the parameters in model (\(\ref{eq:mod}\)) and derive their full conditional distributions. Moving forward, we will use different colors to help keep track of different prior and likelihood terms as we work through the derivations. This will help us see where each component of the full conditional distribution is coming from.

Update \(\boldsymbol{\beta}_0\)

Looking at (\(\ref{eq:posterior}\)), we see that \(\boldsymbol{\beta}_0\) appears in two terms, namely its prior, \(\textcolor{OrangeRed}{MVN(\boldsymbol{\beta}_0 \mid \boldsymbol{\mu} _\beta, \boldsymbol{\Sigma} _\beta)}\), and as the mean of the prior for \(\boldsymbol{\beta}_1\), \(\textcolor{RoyalBlue}{MVN(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_0, \boldsymbol{\Sigma} _\eta)}\). The product of these distributions looks like

\[\begin{equation*} \textcolor{OrangeRed}{MVN(\boldsymbol{\beta}_0 \mid \boldsymbol{\mu} _\beta, \boldsymbol{\Sigma} _\beta)} \times \textcolor{RoyalBlue}{MVN(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_0, \boldsymbol{\Sigma} _\eta)} \end{equation*}\]

Since both distributions are normal, we know their product will also be normal, with mean \(\textbf{V}\textbf{v}\) and covariance matrix \(\textbf{V}\). Based on what we know about incompleting the square, we will find \(\textbf{V}\) and \(\textbf{v}\) in the following terms of the product

\[\begin{equation*} \boldsymbol{\beta}_0^\top \textbf{V}^{-1} \boldsymbol{\beta}_0 - \boldsymbol{\beta}_0^\top \textbf{v}\cdots. \end{equation*}\]

So, all we have to do is multiply two multivariate normal pdfs (see equation \(\ref{eq:mvn}\)), combine like terms in \(\boldsymbol{\beta}_0\), and identify \(\textbf{V}^{-1}\) and \(\textbf{v}\) from where they appear above. Since we are incompleting the square, we don’t actually need to multiply the complete pdfs, just the exponential parts where \(\textbf{V}\) and \(\textbf{v}\) will appear (as before, we drop the \(-\frac{1}{2}\) coefficient, since it does not affect where \(\textbf{V}^{-1}\) and \(\textbf{v}\) show up). Ignoring the other components of the pdfs, we have

\[\begin{align*} &\textcolor{OrangeRed}{MVN(\boldsymbol{\beta}_0 \mid \boldsymbol{\mu} _\beta, \boldsymbol{\Sigma} _\beta)} \times \textcolor{RoyalBlue}{MVN(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_0, \boldsymbol{\Sigma} _\eta)} \\ &\propto \textcolor{OrangeRed}{\exp \left((\boldsymbol{\beta}_0 - \boldsymbol{\mu} _\beta)^\top \boldsymbol{\Sigma} _\beta^{-1}(\boldsymbol{\beta}_0 - \boldsymbol{\mu} _\beta)\right)} \times \textcolor{RoyalBlue}{\exp \left((\boldsymbol{\beta}_1 - \boldsymbol{\beta}_0)^\top \boldsymbol{\Sigma} _\eta^{-1}(\boldsymbol{\beta}_1 - \boldsymbol{\beta}_0)\right)} \\ &= \exp \left(\textcolor{OrangeRed}{(\boldsymbol{\beta}_0 - \boldsymbol{\mu} _\beta)^\top \boldsymbol{\Sigma} _\beta^{-1}(\boldsymbol{\beta}_0 - \boldsymbol{\mu} _\beta)} + \textcolor{RoyalBlue}{(\boldsymbol{\beta}_1 - \boldsymbol{\beta}_0)^\top \boldsymbol{\Sigma} _\eta^{-1}(\boldsymbol{\beta}_1 - \boldsymbol{\beta}_0)} \right) \\ &= \exp \left( \textcolor{OrangeRed}{\boldsymbol{\beta}_0^\top \boldsymbol{\Sigma} _\beta^{-1} \boldsymbol{\beta}_0 - \boldsymbol{\beta}_0^\top \boldsymbol{\Sigma} _\beta^{-1} \boldsymbol{\mu} _\beta - \boldsymbol{\mu} _\beta^\top \boldsymbol{\Sigma} _\beta^{-1} \boldsymbol{\beta}_0 + \boldsymbol{\mu} _\beta^\top \boldsymbol{\Sigma} _\beta^{-1} \boldsymbol{\mu} _\beta} \right. \\ &\quad \left. + \textcolor{RoyalBlue}{\boldsymbol{\beta}_1^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_1 - \boldsymbol{\beta}_1^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_0 - \boldsymbol{\beta}_0^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_1 + \boldsymbol{\beta}_0^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_0} \right) \end{align*}\]

Now we just need to pay attention to the terms of the form \(\boldsymbol{\beta}_0^\top \textbf{V}^{-1} \boldsymbol{\beta}_0\) and \(- \boldsymbol{\beta}_0^\top \textbf{v}\) to identify \(\textbf{V}^{-1}\) and \(\textbf{v}\). Combining like terms gives us

\[\begin{equation*} \propto \exp \left( \boldsymbol{\beta}_0^\top (\textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\beta^{-1}} + \textcolor{RoyalBlue}{ \boldsymbol{\Sigma} _\eta^{-1}}) \boldsymbol{\beta}_0 - \boldsymbol{\beta}_0^\top (\textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\beta^{-1} \boldsymbol{\mu} _\beta} + \textcolor{RoyalBlue}{ \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_1}) \cdots \right) \end{equation*}\]

So we have that

\[\begin{align*} &\textbf{V}^{-1} = (\textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\beta^{-1}} + \textcolor{RoyalBlue}{ \boldsymbol{\Sigma} _\eta^{-1}}) \\ &\textbf{v}= (\textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\beta^{-1} \boldsymbol{\mu} _\beta} + \textcolor{RoyalBlue}{ \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_1}) \end{align*}\]

and we can sample \(\boldsymbol{\beta}_0\) from its full conditional distribution as \(MVN(\textbf{V}\textbf{v}, \textbf{V})\).
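
A sketch of this update in Python/NumPy (argument names are illustrative; beta_1 denotes the current value of \(\boldsymbol{\beta}_1\) in the sampler):

```python
import numpy as np

def update_beta_0(mu_beta, Sigma_beta, beta_1, Sigma_eta, rng):
    Sigma_beta_inv = np.linalg.inv(Sigma_beta)
    Sigma_eta_inv = np.linalg.inv(Sigma_eta)
    Vinv = Sigma_beta_inv + Sigma_eta_inv                    # V^{-1}
    v = Sigma_beta_inv @ mu_beta + Sigma_eta_inv @ beta_1    # v
    V = np.linalg.inv(Vinv)
    return rng.multivariate_normal(V @ v, V)                 # beta_0 ~ MVN(Vv, V)
```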

Update \(\boldsymbol{\beta}_t, \quad t = 1, \ldots, (T-1)\)

Looking at (\(\ref{eq:posterior}\)), we see that \(\boldsymbol{\beta}_t\) appears in three terms, namely the likelihood \(\textcolor{Emerald}{MVN(\textbf{Y}_t \mid \textbf{X}_t \boldsymbol{\beta}_t, \sigma^2 \textbf{I}_N)}\), its prior \(\textcolor{OrangeRed}{MVN(\boldsymbol{\beta}_t \mid \boldsymbol{\beta}_{t - 1}, \boldsymbol{\Sigma} _\eta)}\), and the prior for \(\boldsymbol{\beta}_{t + 1}\), \(\textcolor{RoyalBlue}{MVN(\boldsymbol{\beta}_{t + 1} \mid \boldsymbol{\beta}_t, \boldsymbol{\Sigma} _\eta)}\), in which \(\boldsymbol{\beta}_t\) appears as the mean. The product of these distributions looks like

\[\begin{equation*} \textcolor{Emerald}{MVN(\textbf{Y}_t \mid \textbf{X}_t \boldsymbol{\beta}_t, \sigma^2 \textbf{I}_N)} \times \textcolor{OrangeRed}{MVN(\boldsymbol{\beta}_t \mid \boldsymbol{\beta}_{t - 1}, \boldsymbol{\Sigma} _\eta)} \times \textcolor{RoyalBlue}{MVN(\boldsymbol{\beta}_{t + 1} \mid \boldsymbol{\beta}_t, \boldsymbol{\Sigma} _\eta)} \end{equation*}\]

Again, we will expand this product, only paying attention to the terms in the exponent so we can incomplete the square. We have

\[\begin{align*} &\textcolor{Emerald}{MVN(\textbf{Y}_t \mid \textbf{X}_t \boldsymbol{\beta}_t, \sigma^2 \textbf{I}_N)} \times \textcolor{OrangeRed}{MVN(\boldsymbol{\beta}_t \mid \boldsymbol{\beta}_{t - 1}, \boldsymbol{\Sigma} _\eta)} \times \textcolor{RoyalBlue}{MVN(\boldsymbol{\beta}_{t + 1} \mid \boldsymbol{\beta}_t, \boldsymbol{\Sigma} _\eta)} \\ &\propto \textcolor{Emerald}{\exp \left((\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t)^\top (\sigma^2 \textbf{I}_N)^{-1}(\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t)\right)} \times \textcolor{OrangeRed}{\exp \left((\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top \boldsymbol{\Sigma} _\eta^{-1}(\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})\right)} \\ &\quad \times \textcolor{RoyalBlue}{\exp \left((\boldsymbol{\beta}_{t+1} - \boldsymbol{\beta}_t)^\top \boldsymbol{\Sigma} _\eta^{-1}(\boldsymbol{\beta}_{t+1} - \boldsymbol{\beta}_t)\right)} \\ &= \exp \left(\textcolor{Emerald}{(\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t)^\top (\sigma^2 \textbf{I}_N)^{-1}(\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t)} + \textcolor{OrangeRed}{(\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top \boldsymbol{\Sigma} _\eta^{-1}(\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})} \right. \\ &\quad \left. + \textcolor{RoyalBlue}{(\boldsymbol{\beta}_{t+1} - \boldsymbol{\beta}_t)^\top \boldsymbol{\Sigma} _\eta^{-1}(\boldsymbol{\beta}_{t+1} - \boldsymbol{\beta}_t)} \right) \\ &= \exp \left( \textcolor{Emerald}{\textbf{Y}_t^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{Y}_t - \textbf{Y}_t^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{X}_t \boldsymbol{\beta}_t - \boldsymbol{\beta}_t^\top \textbf{X}_t^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{Y}_t + \boldsymbol{\beta}_t^\top \textbf{X}_t^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{X}_t \boldsymbol{\beta}_t} \right. \\ &\quad \left. + \textcolor{OrangeRed}{\boldsymbol{\beta}_t^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_t - \boldsymbol{\beta}_t^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_{t-1} - \boldsymbol{\beta}_{t-1}^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_t + \boldsymbol{\beta}_{t-1}^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_{t-1}} \right. \\ &\quad \left. + \textcolor{RoyalBlue}{\boldsymbol{\beta}_{t+1}^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_{t+1} - \boldsymbol{\beta}_{t+1}^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_t - \boldsymbol{\beta}_t^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_{t+1} + \boldsymbol{\beta}_t^\top \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_t} \right) \end{align*}\]

Now we just need to pay attention to the terms of the form \(\boldsymbol{\beta}_t^\top \textbf{V}^{-1} \boldsymbol{\beta}_t\) and \(- \boldsymbol{\beta}_t^\top \textbf{v}\) to identify \(\textbf{V}^{-1}\) and \(\textbf{v}\). Combining like terms gives us

\[\begin{equation*} \propto \exp \left( \boldsymbol{\beta}_t^\top (\textcolor{Emerald}{\textbf{X}_t^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{X}_t} + \textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\eta^{-1}} + \textcolor{RoyalBlue}{ \boldsymbol{\Sigma} _\eta^{-1}}) \boldsymbol{\beta}_t - \boldsymbol{\beta}_t^\top (\textcolor{Emerald}{\textbf{X}_t^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{Y}_t} + \textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_{t-1}} + \textcolor{RoyalBlue}{ \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_{t+1}}) \cdots \right) \end{equation*}\]

So we have that

\[\begin{align*} \begin{gathered} \textbf{V}^{-1} = (\textcolor{Emerald}{\textbf{X}_t^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{X}_t} + \textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\eta^{-1}} + \textcolor{RoyalBlue}{ \boldsymbol{\Sigma} _\eta^{-1}}) \\ \textbf{v}= (\textcolor{Emerald}{\textbf{X}_t^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{Y}_t} + \textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_{t-1}} + \textcolor{RoyalBlue}{ \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_{t+1}}) \end{gathered} \end{align*}\]

and we can sample \(\boldsymbol{\beta}_t\) from its full conditional distribution as \(MVN(\textbf{V}\textbf{v}, \textbf{V})\).
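
A corresponding sketch (NumPy assumed; beta_prev and beta_next hold the current values of \(\boldsymbol{\beta}_{t-1}\) and \(\boldsymbol{\beta}_{t+1}\)):

```python
import numpy as np

def update_beta_t(Y_t, X_t, beta_prev, beta_next, Sigma_eta, sigma2, rng):
    Sigma_eta_inv = np.linalg.inv(Sigma_eta)
    # (sigma^2 I_N)^{-1} simply rescales by 1 / sigma^2
    Vinv = X_t.T @ X_t / sigma2 + 2.0 * Sigma_eta_inv                    # V^{-1}
    v = X_t.T @ Y_t / sigma2 + Sigma_eta_inv @ (beta_prev + beta_next)   # v
    V = np.linalg.inv(Vinv)
    return rng.multivariate_normal(V @ v, V)                             # beta_t ~ MVN(Vv, V)
```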

Update \(\boldsymbol{\beta}_T\)

To update \(\boldsymbol{\beta}_T\), we follow the same steps as for \(\boldsymbol{\beta}_t\), \(t = 1, \ldots, (T-1)\), except the \(\textcolor{RoyalBlue}{MVN(\boldsymbol{\beta}_{t + 1} \mid \boldsymbol{\beta}_t, \boldsymbol{\Sigma} _\eta)}\) terms are no longer used, since there is no \(T + 1\) time point. We will then have

\[\begin{align*} \begin{gathered} \textbf{V}^{-1} = (\textcolor{Emerald}{\textbf{X}_T^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{X}_T} + \textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\eta^{-1}}) \\ \textbf{v}= (\textcolor{Emerald}{\textbf{X}_T^\top (\sigma^2 \textbf{I}_N)^{-1} \textbf{Y}_T} + \textcolor{OrangeRed}{ \boldsymbol{\Sigma} _\eta^{-1} \boldsymbol{\beta}_{T-1}}) \end{gathered} \end{align*}\]

and we can sample \(\boldsymbol{\beta}_T\) from its full conditional distribution as \(MVN(\textbf{V}\textbf{v}, \textbf{V})\).
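
The sketch is the same as for the interior \(\boldsymbol{\beta}_t\), with the \(\boldsymbol{\beta}_{t+1}\) contributions dropped:

```python
import numpy as np

def update_beta_T(Y_T, X_T, beta_prev, Sigma_eta, sigma2, rng):
    Sigma_eta_inv = np.linalg.inv(Sigma_eta)
    Vinv = X_T.T @ X_T / sigma2 + Sigma_eta_inv            # V^{-1}
    v = X_T.T @ Y_T / sigma2 + Sigma_eta_inv @ beta_prev   # v
    V = np.linalg.inv(Vinv)
    return rng.multivariate_normal(V @ v, V)               # beta_T ~ MVN(Vv, V)
```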

Update \(\sigma^2\)

Up until now, we have used normal likelihoods and priors to determine the normally distributed full conditional distributions for parameters of interest via incompleting the square. To update \(\sigma^2\), we will have to use a different form of conjugacy: the \(IG\)-\(MVN\) conjugacy. Here, the \(IG\) prior on \(\sigma^2\) times the \(MVN\) likelihood for \(\textbf{Y}\) yields another \(IG\) full conditional distribution for \(\sigma^2\). It is our job then to determine what these new \(IG\) parameters will be.

Looking at (\(\ref{eq:posterior}\)), we see that \(\sigma^2\) appears in \(T+1\) terms, namely its prior \(\textcolor{DeepPink}{IG(\sigma^2 \mid a, b)}\), and the \(T\) likelihood terms \(\textcolor{Emerald}{MVN(\textbf{Y}_t \mid \textbf{X}_t \boldsymbol{\beta}_t, \sigma^2 \textbf{I}_N)}\). The product of these distributions looks like

\[\begin{equation*} \textcolor{Emerald}{ \prod_{t=1}^T MVN(\textbf{Y}_t \mid \textbf{X}_t \boldsymbol{\beta}_t, \sigma^2 \textbf{I}_N)} \times \textcolor{DeepPink}{IG(\sigma^2 \mid a, b)} \end{equation*}\]

From our knowledge of conjugate distributions, we know that the product of the \(IG\) distribution with \(MVN\) distributions should also be \(IG\). Recall the \(IG\) pdf defined in (\(\ref{eq:ig}\)). We will multiply these pdfs together, rearrange terms, and hopefully recover another \(IG\) distribution with different parameter values. We have the following

\[\begin{align*} &\textcolor{Emerald}{ \prod_{t=1}^T MVN(\textbf{Y}_t \mid \textbf{X}_t \boldsymbol{\beta}_t, \sigma^2 \textbf{I}_N)} \times \textcolor{DeepPink}{IG(\sigma^2 \mid a, b)} \\ &= \textcolor{Emerald}{\prod_{t=1}^T (2\pi)^{-N/2} |\sigma^2 \textbf{I}_N|^{-1/2} \exp \left( \frac{-1}{2\sigma^2} (\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t)^\top (\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t) \right)} \times \textcolor{DeepPink}{\frac{b^a}{\Gamma (a)} (\sigma^2)^{-a - 1} \exp \left( \frac{-b}{\sigma^2} \right)} \\ &\propto (\sigma^2)^{\textcolor{Emerald}{\frac{-NT}{2}} \textcolor{DeepPink}{-a -1}} \exp \left( - \frac{\textcolor{Emerald}{\frac{1}{2} \sum_{t = 1}^T (\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t)^\top (\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t)} \textcolor{DeepPink}{+ b}}{\sigma^2} \right) \end{align*}\]

Here in the second step, we write the statement as \(\propto\) because we are only interested in the components of the product that will become part of the \(IG\) pdf. We also combined like terms, and we can now recognize the expression as an \(IG\) distribution for \(\sigma^2\), with the following parameters

\[\begin{align*} \tilde{a} &= \textcolor{DeepPink}{a} + \textcolor{Emerald}{\frac{NT}{2}} \\ \tilde{b} &= \textcolor{DeepPink}{b} + \textcolor{Emerald}{\frac{1}{2} \sum_{t = 1}^T (\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t)^\top (\textbf{Y}_t - \textbf{X}_t \boldsymbol{\beta}_t)} \end{align*}\]

and we can sample from \(\sigma^2\)’s full conditional distribution as \(IG(\tilde{a}, \tilde{b})\).
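
A sketch of this draw (NumPy assumed). If \(X \sim Gamma(\tilde{a}, \tilde{b})\) in the rate parameterization, then \(1/X \sim IG(\tilde{a}, \tilde{b})\), so we draw a Gamma variate and invert it; note that NumPy’s gamma sampler takes a scale (the reciprocal of the rate):

```python
import numpy as np

def update_sigma2(Y, X, beta, a, b, rng):
    """Y is (T, N), X is (T, N, P), beta is (T + 1, P) with beta[t] holding beta_t."""
    T, N = Y.shape
    resid = np.stack([Y[t] - X[t] @ beta[t + 1] for t in range(T)])  # residuals at each t
    a_tilde = a + N * T / 2.0
    b_tilde = b + 0.5 * np.sum(resid ** 2)
    return 1.0 / rng.gamma(shape=a_tilde, scale=1.0 / b_tilde)       # sigma^2 ~ IG(a_tilde, b_tilde)
```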

Update \( \boldsymbol{\Sigma} _\eta\)

In this final step, we derive the full conditional distribution for \( \boldsymbol{\Sigma} _\eta\). One important thing to realize here is that \( \boldsymbol{\Sigma} _\eta\) is a \(P \times P\) covariance matrix, not a scalar or vector as we have worked with before. A conjugate prior for the covariance matrix of a multivariate normal is the Inverse Wishart (\(IW\)) prior (\(\ref{eq:iw}\)). Here we use this prior, which will yield a full conditional distribution for \( \boldsymbol{\Sigma} _\eta\) that is also distributed as \(IW\).

Looking at (\(\ref{eq:posterior}\)), we have \( \boldsymbol{\Sigma} _\eta\) appearing in \(T+1\) terms, namely its prior \(\textcolor{RoyalPurple}{IW( \boldsymbol{\Sigma} _\eta \mid \textbf{H}, \nu)}\), and the \(T\) prior terms \(\textcolor{OrangeRed}{MVN(\boldsymbol{\beta}_t \mid \boldsymbol{\beta}_{t-1}, \boldsymbol{\Sigma} _\eta)}\). The product of these distributions looks like

\[\begin{equation*} \textcolor{OrangeRed}{ \prod_{t=1}^T MVN(\boldsymbol{\beta}_t \mid \boldsymbol{\beta}_{t-1}, \boldsymbol{\Sigma} _\eta)} \times \textcolor{RoyalPurple}{IW( \boldsymbol{\Sigma} _\eta \mid \textbf{H}, \nu)} \end{equation*}\]

Our goal now is to simplify the product of these pdfs, and get it in the form of a new \(IW\) distribution where we can identify new parameters for the full conditional distribution. The product of these pdfs will be

\[\begin{align*} &\textcolor{OrangeRed}{ \prod_{t=1}^T MVN(\boldsymbol{\beta}_t \mid \boldsymbol{\beta}_{t-1}, \boldsymbol{\Sigma} _\eta)} \times \textcolor{RoyalPurple}{IW( \boldsymbol{\Sigma} _\eta \mid \textbf{H}, \nu)} \\ &= \textcolor{OrangeRed}{\prod_{t=1}^T (2\pi)^{-P/2} | \boldsymbol{\Sigma} _\eta|^{-1/2} \exp \left( \frac{-1}{2} (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top \boldsymbol{\Sigma} _\eta^{-1} (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1}) \right)} \\ &\quad \times \textcolor{RoyalPurple}{\frac{|\textbf{H}|^{\nu / 2}}{2^{\nu P / 2} \Gamma_P \left( \frac{\nu}{2} \right)} | \boldsymbol{\Sigma} _\eta|^{-(\nu + P + 1) / 2} \exp\left(-\frac{1}{2} \text{tr}\left( \textbf{H} \boldsymbol{\Sigma} _\eta^{-1} \right) \right)} \\ &\propto | \boldsymbol{\Sigma} _\eta|^{-(\textcolor{OrangeRed}{T} + \textcolor{RoyalPurple}{\nu+P+1})/2} \exp \left( \frac{-1}{2} \left[ \underbrace{\textcolor{OrangeRed}{\sum_{t = 1}^T (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top \boldsymbol{\Sigma} _\eta^{-1} (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})}}_A + \textcolor{RoyalPurple}{\text{tr} ( \textbf{H} \boldsymbol{\Sigma} _\eta^{-1} )} \right] \right) \end{align*}\]

We will now use a trick to massage the expression into the form of the \(IW\) distribution. Since \(A\) is evaluated as a scalar (a sum of quadratic forms), we can take the trace of it without changing the expression (the trace of a scalar is the scalar itself). So we can write

\[\begin{align*} \quad A &= \textcolor{OrangeRed}{\sum_{t = 1}^T (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top \boldsymbol{\Sigma} _\eta^{-1} (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})} \\ \quad &= \text{tr} \left( \textcolor{OrangeRed}{\sum_{t = 1}^T (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top \boldsymbol{\Sigma} _\eta^{-1} (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})} \right) \end{align*}\]

We also know that the trace of a sum is equal to the sum of the traces, so we can move the trace operator inside the sum.

\[\begin{align*} \quad &= \textcolor{OrangeRed}{\sum_{t = 1}^T} \text{tr} \left( \textcolor{OrangeRed}{(\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top \boldsymbol{\Sigma} _\eta^{-1} (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})} \right) \end{align*}\]

Another useful identity of the trace operator is its cyclic property:

\[\begin{equation*} \text{tr} (\textbf{A}\textbf{B}\textbf{C}) = \text{tr} (\textbf{B}\textbf{C}\textbf{A}) = \text{tr} (\textbf{C}\textbf{A}\textbf{B}) \end{equation*}\]

Here, we can change the order of the product of matrices in a trace by moving the first matrix to the end of the product. Applying that above yields

\[\begin{align*} \quad &= \textcolor{OrangeRed}{\sum_{t = 1}^T} \text{tr} \left( \textcolor{OrangeRed}{(\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1}) (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top \boldsymbol{\Sigma} _\eta^{-1}} \right) \end{align*}\]

This is the form we want for \(A\), and plugging it back in gives

\[\begin{align*} &| \boldsymbol{\Sigma} _\eta|^{-(\textcolor{OrangeRed}{T} + \textcolor{RoyalPurple}{\nu+P+1})/2} \exp \left( \frac{-1}{2} \left[ \textcolor{OrangeRed}{\sum_{t = 1}^T} \text{tr} \left( \textcolor{OrangeRed}{(\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1}) (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top \boldsymbol{\Sigma} _\eta^{-1}} \right) + \textcolor{RoyalPurple}{\text{tr} ( \textbf{H} \boldsymbol{\Sigma} _\eta^{-1} )} \right] \right) \\ = &| \boldsymbol{\Sigma} _\eta|^{-(\textcolor{OrangeRed}{T} + \textcolor{RoyalPurple}{\nu+P+1})/2} \exp \left( \frac{-1}{2} \text{tr} \left[ \left( \textcolor{OrangeRed}{\sum_{t = 1}^T (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1}) (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top} + \textcolor{RoyalPurple}{\textbf{H}} \right) \boldsymbol{\Sigma} _\eta^{-1} \right] \right) \end{align*}\]

which is in the same form as (\(\ref{eq:iw}\)), with parameter values

\[\begin{align*} \tilde{\nu} &= \textcolor{RoyalPurple}{\nu} + \textcolor{OrangeRed}{T} \\ \tilde{\textbf{H}} &= \textcolor{RoyalPurple}{\textbf{H}} + \textcolor{OrangeRed}{\sum_{t = 1}^T (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1}) (\boldsymbol{\beta}_t - \boldsymbol{\beta}_{t-1})^\top} \end{align*}\]

and we can update \( \boldsymbol{\Sigma} _\eta\) from its full conditional distribution as \(IW(\tilde{\textbf{H}}, \tilde{\nu})\).
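
A sketch of this final draw (assuming SciPy is available; scipy.stats.invwishart is parameterized by degrees of freedom df and a scale matrix, matching (\(\ref{eq:iw}\))):

```python
import numpy as np
from scipy.stats import invwishart

def update_Sigma_eta(beta, H, nu, rng):
    """beta is (T + 1, P) with beta[0] = beta_0, ..., beta[T] = beta_T."""
    diffs = np.diff(beta, axis=0)       # row t holds (beta_t - beta_{t-1})
    H_tilde = H + diffs.T @ diffs       # H + sum_t (beta_t - beta_{t-1})(beta_t - beta_{t-1})^T
    nu_tilde = nu + diffs.shape[0]      # nu + T
    return invwishart.rvs(df=nu_tilde, scale=H_tilde, random_state=rng)
```

With all of the full conditionals in hand, a Gibbs sampler simply cycles through these updates (\(\boldsymbol{\beta}_0\), then \(\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_T\), then \(\sigma^2\), then \( \boldsymbol{\Sigma} _\eta\)) for many iterations, discarding an initial burn-in and treating the remaining draws as samples from the joint posterior.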