\(\newcommand{\B}[1]{ {\bf #1} }\) \(\newcommand{\R}[1]{ {\rm #1} }\) \(\newcommand{\W}[1]{ \; #1 \; }\)
Simulating Posterior Distribution for Model Variables#
Purpose#
The sample command with method equal to simulate fits simulated random measurements ( data_sim_table ) and simulated random priors ( prior_sim_table ).
The Lemmas on this page prove that, in the linear Gaussian case, this procedure yields the proper statistics for the posterior distribution of the maximum likelihood estimate. Note that the dismod_at case is not linear, nor are all of its statistics Gaussian.
Lemma 1#
Suppose we are given a matrix \(A \in \B{R}^{m \times n}\) with rank \(n\), a positive definite matrix \(V \in \B{R}^{m \times m}\), and a vector \(y \in \B{R}^{m \times 1}\). Further suppose that there is an unknown vector \(\bar{\theta} \in \B{R}^{n \times 1}\) such that \(y \sim \B{N} ( A \bar{\theta} , V )\). The maximum likelihood estimator \(\hat{\theta}\) for \(\bar{\theta}\) given \(y\) satisfies
\[
\partial_\theta^2 \log \B{p} ( y | \theta ) = - A^\R{T} V^{-1} A
\]
\[
\B{E} [ \hat{\theta} ] = \bar{\theta}
\]
\[
\B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ]
=
( A^\R{T} V^{-1} A )^{-1}
\]
Proof#
The probability density for \(y\) given \(\theta\) is
\[
\B{p} ( y | \theta )
=
\det ( 2 \pi V )^{-1/2}
\exp \left[ - \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta ) \right]
\]
Dropping the determinant term, because it does not depend on \(\theta\), and taking the negative log, we get the objective
\[
f ( \theta ) = \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta )
\]
and the equivalent problem of minimizing \(f ( \theta )\) with respect to \(\theta \in \B{R}^{n \times 1}\). The derivative \(f^{(1)} ( \theta ) \in \B{R}^{1 \times n}\) is given by
\[
f^{(1)} ( \theta ) = - ( y - A \theta )^\R{T} V^{-1} A
\]
It follows that
\[
f^{(2)} ( \theta ) = A^\R{T} V^{-1} A
\]
Since \(f ( \theta )\) equals \(- \log \B{p} ( y | \theta )\) up to a constant, this completes the proof of the equation for the second partial of \(\log \B{p} ( y | \theta )\) in the statement of the lemma.
The maximum likelihood estimate \(\hat{\theta}\) satisfies the equation \(f^{(1)} ( \hat{\theta} ) = 0\); i.e.,
\[
0 = ( y - A \hat{\theta} )^\R{T} V^{-1} A
\]
Solving for \(\hat{\theta}\) yields
\[
\hat{\theta} = ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} y
\]
Defining \(e = y - A \bar{\theta}\), we have \(\B{E} [ e ] = 0\) and
\[
\hat{\theta}
=
( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} ( A \bar{\theta} + e )
=
\bar{\theta} + ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} e
\]
This expresses the estimate \(\hat{\theta}\) as a deterministic function of the noise \(e\). It follows from the equation above, and the fact that \(\B{E} [ e ] = 0\), that \(\B{E} [ \hat{\theta} ] = \bar{\theta}\). This completes the proof of the equation for the expected value of \(\hat{\theta}\) in the statement of the lemma.
It also follows, from the equation for \(\hat{\theta}\) above and the fact that \(\B{E} [ e \, e^\R{T} ] = V\), that
\[
\B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ]
=
( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} \, V \, V^{-1} A ( A^\R{T} V^{-1} A )^{-1}
=
( A^\R{T} V^{-1} A )^{-1}
\]
This completes the proof of the equation for the covariance of \(\hat{\theta} - \bar{\theta}\) in the statement of the lemma.
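Lemma 1 can be checked by Monte Carlo simulation. The sketch below uses hypothetical illustrative choices of \(A\), \(V\), and \(\bar{\theta}\) (they are not part of the lemma) and verifies that the sample mean and covariance of \(\hat{\theta}\) over repeated simulated data sets agree with the lemma's conclusions.

```python
# Monte Carlo check of Lemma 1; A, V, theta_bar are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2
A = rng.standard_normal((m, n))              # rank n with probability one
V = np.diag(rng.uniform(0.5, 2.0, size=m))   # positive definite covariance
theta_bar = np.array([1.0, -2.0])

V_inv = np.linalg.inv(V)
info = A.T @ V_inv @ A                       # information matrix A^T V^-1 A

# Closed-form MLE theta_hat = (A^T V^-1 A)^-1 A^T V^-1 y over simulated y
estimates = np.array([
    np.linalg.solve(info, A.T @ V_inv @ rng.multivariate_normal(A @ theta_bar, V))
    for _ in range(20000)
])

print(estimates.mean(axis=0))   # close to theta_bar
print(np.cov(estimates.T))      # close to inv(info)
```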
Remark#
For the case in Lemma 1, the second partial of \(\log \B{p} ( y | \theta )\) with respect to \(\theta\) does not depend on \(\theta\), and \(A^\R{T} V^{-1} A\) is the information matrix.
Lemma 2#
Suppose that, in addition to all the information in Lemma 1, we have a matrix \(B \in \B{R}^{p \times n}\), a positive definite matrix \(P \in \B{R}^{p \times p}\), and \(z \in \B{R}^{p \times 1}\), where we have independent prior information \(B \theta \sim \B{N}( z , P )\). Further suppose \(B\) has rank \(n\). For this case we define \(\hat{\theta}\) as the maximizer of \(\B{p}( y | \theta ) \B{p}( \theta )\). It follows that
\[
\partial_\theta^2 \log [ \B{p} ( y | \theta ) \B{p} ( \theta ) ]
=
- ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )
\]
\[
\B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ]
\prec
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
\]
where \(\prec\) means less than in a positive definite matrix sense.
Remark#
The posterior distribution for the maximum likelihood estimate, when including a prior, cannot be sampled by fitting simulated data alone. To see this, consider the case where column one of the matrix \(A\) is zero. In this case, the data \(y\) does not depend on \(\theta_1\), so \(\hat{\theta}_1\) takes the same value no matter what the value of \(y\). On the other hand, the posterior distribution for \(\theta_1\), for this case, is the same as its prior distribution and has uncertainty.
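The zero-column case in the remark can be demonstrated numerically. In this hypothetical sketch (the matrices and \(\bar{\theta}\) are illustrative, with \(V = P = I\)), repeated fits to simulated data with the prior values \(z\) held fixed show no variation at all in \(\hat{\theta}_1\), even though the true posterior variance of \(\theta_1\) equals its prior variance.

```python
# Illustration of the remark: column one of A is zero, prior values z fixed.
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 2
A = rng.standard_normal((m, n))
A[:, 0] = 0.0                      # the data y carries no information about theta_1
B = np.eye(n)                      # prior B theta ~ N(z, P)
P = np.eye(n)
V = np.eye(m)
theta_bar = np.array([1.0, -2.0])
z = B @ theta_bar                  # prior values held fixed across fits

H = A.T @ A + B.T @ B              # A^T V^-1 A + B^T P^-1 B with V = P = I
estimates = np.array([
    np.linalg.solve(H, A.T @ (A @ theta_bar + rng.standard_normal(m)) + B.T @ z)
    for _ in range(2000)
])

print(estimates[:, 0].std())       # essentially zero: no uncertainty in theta_1
print(np.linalg.inv(H)[0, 0])      # true posterior variance of theta_1 is 1.0
```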
Proof#
The derivative of the corresponding negative log likelihood is
\[
f^{(1)} ( \theta )
=
- ( y - A \theta )^\R{T} V^{-1} A
- ( z - B \theta )^\R{T} P^{-1} B
\]
From this point, the proof of the equation for the second partial is very similar to the proof of Lemma 1 and is left to the reader.
Setting the derivative to zero, we find that the corresponding maximum likelihood estimate \(\hat{\theta}\) satisfies
\[
\hat{\theta}
=
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
( A^\R{T} V^{-1} y + B^\R{T} P^{-1} z )
\]
Substituting \(y = A \bar{\theta} + e\), where \(e = y - A \bar{\theta}\) as in Lemma 1, gives
\[
\hat{\theta}
=
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
( A^\R{T} V^{-1} A \bar{\theta} + B^\R{T} P^{-1} z )
+
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
A^\R{T} V^{-1} e
\]
The first term is deterministic and the second term has mean zero. It follows, using \(\B{E} [ e \, e^\R{T} ] = V\), that
\[
\B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ]
=
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
A^\R{T} V^{-1} A
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
\]
Since the matrix \(B^\R{T} P^{-1} B\) is positive definite, we have
\[
A^\R{T} V^{-1} A
\prec
A^\R{T} V^{-1} A + B^\R{T} P^{-1} B
\]
Replacing \(A^\R{T} V^{-1} A\) by \(A^\R{T} V^{-1} A + B^\R{T} P^{-1} B\) in the center of the previous expression for the variance of \(\hat{\theta}\), we obtain
\[
\B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ]
\prec
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
\]
This completes the proof of this lemma.
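The conclusion of Lemma 2 can also be checked in closed form, without simulation. In this hypothetical sketch, \(A\), \(B\), \(V\), and \(P\) are illustrative choices satisfying the lemma's hypotheses; the difference between the posterior variance and the fixed-prior variance of \(\hat{\theta}\) should have strictly positive eigenvalues.

```python
# Numeric check of Lemma 2's bound; A, B, V, P are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 2
A = rng.standard_normal((m, n))
B = np.eye(n)                                  # rank n
V = np.diag(rng.uniform(0.5, 2.0, size=m))     # positive definite
P = np.diag([4.0, 9.0])                        # positive definite

data_info = A.T @ np.linalg.inv(V) @ A         # A^T V^-1 A
prior_info = B.T @ np.linalg.inv(P) @ B        # B^T P^-1 B
H = data_info + prior_info

# Variance of theta_hat when only y is random (z held fixed), from the proof:
var_theta_hat = np.linalg.inv(H) @ data_info @ np.linalg.inv(H)

# Lemma 2: var_theta_hat is less than inv(H) in the positive definite sense;
# equivalently, every eigenvalue of the difference is strictly positive.
gap_eigenvalues = np.linalg.eigvalsh(np.linalg.inv(H) - var_theta_hat)
print(gap_eigenvalues)
```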
Simulation#
Suppose we simulate data \(y \sim \B{N}( A \bar{\theta}, V)\) and independent prior values \(z \sim \B{N}( B \bar{\theta}, P)\), where \(A\), \(V\) are as in Lemma 1 and \(B\), \(P\) are as in Lemma 2. Furthermore, we define \(\hat{\theta}\) as the maximizer of
\[
\B{p} ( y | \theta ) \, \B{p} ( z | \theta )
\]
We define \(w \in \B{R}^{(m + p) \times 1}\), \(C \in \B{R}^{ (m + p) \times n}\), and \(U \in \B{R}^{ (m + p) \times (m + p)}\) by
\[
w = \begin{bmatrix} y \\ z \end{bmatrix}
\; , \;
C = \begin{bmatrix} A \\ B \end{bmatrix}
\; , \;
U = \begin{bmatrix} V & 0 \\ 0 & P \end{bmatrix}
\]
so that \(w \sim \B{N} ( C \bar{\theta} , U )\).
We can now apply Lemma 1 with \(y\) replaced by \(w\), \(A\) replaced by \(C\), and \(V\) replaced by \(U\). It follows from Lemma 1 that \(\B{E} [ \hat{\theta} ] = \bar{\theta}\) and
\[
\B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ]
=
( C^\R{T} U^{-1} C )^{-1}
=
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
\]
which is the proper posterior variance for the case in Lemma 2.
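The Simulation section above can be checked by Monte Carlo as well. In this hypothetical sketch (the matrices and \(\bar{\theta}\) are illustrative), both the measurements \(y\) and the prior values \(z\) are simulated on each iteration, and the sample covariance of \(\hat{\theta}\) matches the proper posterior variance \(( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}\).

```python
# Monte Carlo check of the Simulation section: simulate both y and z.
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 2
A = rng.standard_normal((m, n))
B = np.eye(n)
V = np.diag(rng.uniform(0.5, 2.0, size=m))
P = np.diag([2.0, 3.0])
theta_bar = np.array([1.0, -2.0])

V_inv, P_inv = np.linalg.inv(V), np.linalg.inv(P)
H = A.T @ V_inv @ A + B.T @ P_inv @ B          # equals C^T U^-1 C in block form

estimates = []
for _ in range(20000):
    y = rng.multivariate_normal(A @ theta_bar, V)   # simulated measurements
    z = rng.multivariate_normal(B @ theta_bar, P)   # simulated prior values
    estimates.append(np.linalg.solve(H, A.T @ V_inv @ y + B.T @ P_inv @ z))
estimates = np.array(estimates)

print(np.max(np.abs(np.cov(estimates.T) - np.linalg.inv(H))))  # small
```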