---------------------------------------------- lines 5-333 of file: xrst/model/posterior.xrst ---------------------------------------------- {xrst_begin posterior} {xrst_spell cc } Simulating Posterior Distribution for Model Variables ##################################################### Purpose ******* The ``sample`` command with method equal to :ref:`sample_command@simulate` fits simulated random measurements :ref:`data_sim_table-name` and simulated random priors :ref:`prior_sim_table-name` . The Lemmas on this page prove that, in the linear Gaussian case, this gives the proper statistics for the posterior distribution of the maximum likelihood estimate. Note that the dismod_at case is not linear nor are all the statistics Gaussian. Lemma 1 ******* Suppose we are given a matrix :math:`A \in \B{R}^{m \times n}`, a positive definite matrix, :math:`V \in \B{R}^{m \times m}` and a :math:`y \in \B{R}^{m \times 1}`. Further suppose that there is an unknown vector :math:`\bar{\theta} \in \B{R}^{n \times 1}` such that :math:`y \sim \B{N} ( A \bar{\theta} , V )`. The maximum likelihood estimator :math:`\hat{\theta}` for :math:`\bar{\theta}` given :math:`y` has mean :math:`\B{E} [ \hat{\theta} ] = \bar{\theta}` and variance .. math:: \B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ] = ( A^\R{T} V^{-1} A )^{-1} = - \; \left( \frac{ \partial^2 } { \partial \theta^2 } \log \B{p} ( y | \theta ) \right)^{-1} Proof ===== The probability density for :math:`y` given :math:`\theta` is .. math:: \B{p} ( y | \theta ) = \det ( 2 \pi V )^{-1/2} \exp \left[ - \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta ) \right] Dropping the determinant term, because it does not depend on :math:`\theta`, and taking the negative log we get the objective .. math:: f ( \theta ) = \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta ) and the equivalent problem to minimize :math:`f ( \theta )` with respect to :math:`\theta \in \B{R}^{n \times 1}`. The derivative :math:`f^{(1)} ( \theta ) \in \B{R}^{1 \times n}` is given by .. math:: f^{(1)} ( \theta ) = - ( y - A \theta )^\R{T} V^{-1} A It follows that .. math:: - \frac{ \partial^2 } { \partial \theta^2 } \log \B{p} ( y | \theta ) = f^{(2)} ( \theta ) = A^\R{T} V^{-1} A This completes the proof of the equation for the second partial of :math:`\B{p} ( y | \theta )` in the statement of the lemma. The maximum likelihood estimate :math:`\hat{\theta}` satisfies the equation :math:`f^{(1)} ( \hat{\theta} ) = 0`; i.e., .. math:: :nowrap: \begin{eqnarray} 0 & = & - ( y - A \hat{\theta} )^\R{T} V^{-1} A \\ A^\R{T} V^{-1} y & = & A^\R{T} V^{-1} A \hat{\theta} \\ \hat{\theta} & = & ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} y \end{eqnarray} Defining :math:`e = y - A \bar{\theta}`, we have :math:`\B{E} [ e ] = 0` and .. math:: :nowrap: \begin{eqnarray} \hat{\theta} & = & ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} ( A \bar{\theta} + e ) \\ \hat{\theta} & = & \bar{\theta} + ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} e \end{eqnarray} This expresses the estimate :math:`\hat{\theta}` as a deterministic function of the noise :math:`e`. It follows from the last equation for :math:`\hat{\theta}` above, and the fact that :math:`\B{E} [ e ] = 0`, that :math:`\B{E} [ \hat{\theta} ] = \bar{\theta}`. This completes the proof of the equation for the expected value of :math:`\hat{\theta}` in the statement of the lemma. It also follows, from the equation for :math:`\hat{\theta}` above, that .. math:: :nowrap: \begin{eqnarray} ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} & = & ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} e e^\R{T} V^{-1} A ( A^\R{T} V^{-1} A )^{-1} \\ \B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ] & = & ( A^\R{T} V^{-1} A )^{-1} \end{eqnarray} This completes the proof of the equation for the covariance of :math:`\hat{\theta} - \bar{\theta}` in the statement of the lemma. Remark ====== For the case in :ref:`posterior@Lemma 1` , the second partial of :math:`\log \B{p} ( y | \theta )` with respect to :math:`\theta` does not depend on :math:`\theta` and :math:`A V^{-1} A` is the information matrix. Lemma 2 ******* Suppose that in addition to all the information in :ref:`posterior@Lemma 1` we have a matrix :math:`B \in \B{R}^{p \times n}`, a positive definite matrix, :math:`P \in \B{R}^{p \times p}`, and :math:`z \in \B{R}^{p \times 1}` where we have independent prior information :math:`B \theta \sim \B{N}( z , P )`. Further suppose :math:`B` has rank :math:`n`. For this case we define :math:`\hat{\theta}` as the maximizer of :math:`\B{p}( y | \theta ) \B{p}( \theta )`. It follows that .. math:: \B{E} [ ( \hat{\theta} - \B{E}[ \hat{\theta} ] ) ( \hat{\theta} - \B{E}[ \hat{\theta} ] )^\R{T} \prec ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} = - \; \left( \frac{ \partial^2 } { \partial \theta^2 } \log [ \B{p} ( y | \theta ) \B{p} ( \theta ) ] \right)^{-1} where :math:`\prec` means less than in a positive definite matrix sense. Remark ====== The posterior distribution for the maximum likelihood estimate, when including a prior, cannot be sampled by fitting simulated data alone. To see this, consider the case where column one of the matrix :math:`A` is zero. In this case, that data :math:`y` does not depend on :math:`\theta_1` and :math:`\hat{\theta}_1 = \bar{\theta}_1` no matter what the value of :math:`y`. On the other hand, the posterior distribution for :math:`\theta_1`, for this case, is the same as its prior distribution and has uncertainty. Proof ===== .. math:: \B{p} ( y | \theta ) \B{p} ( \theta ) = \det ( 2 \pi V )^{-1/2} \det ( 2 \pi P )^{-1/2} \exp \left[ - \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta ) - \frac{1}{2} ( B \theta - z )^\R{T} P^{-1} ( B \theta - z ) \right] The derivative of the corresponding negative log likelihood is .. math:: - ( y - A \theta )^\R{T} V^{-1} A + ( B \theta - z )^\R{T} P^{-1} B From this point, the proof of the equation for the second partial is very similar to Lemma 1 and left to the reader. Setting the derivative to zero, we get the corresponding maximum likelihood estimate :math:`\hat{\theta}` satisfies .. math:: :nowrap: \begin{eqnarray} ( y - A \hat{\theta} )^\R{T} V^{-1} A & = & ( B \hat{\theta} - z )^\R{T} P^{-1} B \\ y^\R{T} V^{-1} A + z^\R{T} P^{-1} B & = & \hat{\theta}^\R{T} A^\R{T} V^{-1} A + \hat{\theta}^\R{T} B^\R{T} P^{-1} B \\ \hat{\theta} & = & ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} ( A^\R{T} V^{-1} y + B^\R{T} P^{-1} z ) \\ \hat{\theta} & = & ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} ( A^\R{T} V^{-1} A \bar{\theta} + B^\R{T} P^{-1} z ) + ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} A^\R{T} V^{-1} e \end{eqnarray} The first term is deterministic and the second term is mean zero. It follows that .. math:: :nowrap: \begin{eqnarray} \B{E} [ \hat{\theta} ] & = & ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} ( A^\R{T} V^{-1} A \bar{\theta} + B^\R{T} P^{-1} z ) \\ \B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ] & = & ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} A^\R{T} V^{-1} A ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} \end{eqnarray} Since the matrix :math:`B^\R{T} P^{-1} B` is positive definite, we have .. math:: A^\R{T} V^{-1} A \prec A^\R{T} V^{-1} A + B^\R{T} P^{-1} B Replacing :math:`A^\R{T} V^{-1} A` by :math:`A^\R{T} V^{-1} A + B^\R{T} P^{-1} B` in the center of the previous expression for the variance of :math:`\hat{\theta}` we obtain .. math:: \B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ] \prec ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} This completes the proof of this lemma. Simulation ********** Suppose we simulate date :math:`y \sim \B{N}( A \bar{\theta}, V)` and independent prior values :math:`z \sim \B{N}( B \bar{\theta}, P)` were :math:`A`, :math:`V` are as in :ref:`posterior@Lemma 1` and :math:`B`, :math:`P` are as in :ref:`posterior@Lemma 2` . Furthermore, we define :math:`\hat{\theta}` as the maximizer of .. math:: \B{p}( y, z | \theta ) = \B{p} ( y | \theta ) \B{p} ( z | \theta ) We define :math:`w \in \B{R}^{(m + n) \times 1}`, :math:`C \in \B{R}^{ (m + n) \times n}`, and :math:`U \in \B{R}^{ (m + n) \times (m + n)}` by .. math:: w = \left( \begin{array}{c} y \\ z \end{array} \right) \W{,} C = \left( \begin{array}{cc} A & 0 \\ 0 & B \end{array} \right) \W{,} U = \left( \begin{array}{cc} V & 0 \\ 0 & P \end{array} \right) We can now apply Lemma 1 with :math:`y` replaced by :math:`w`, :math:`A` replaced by :math:`C` and :math:`V` replaced by :math:`U`. It follows from Lemma 1 that :math:`\B{E} [ \hat{\theta} ] = \bar{\theta}` and .. math:: \B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ] = ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} which is the proper posterior variance for the case in Lemma 2. {xrst_end posterior}