\(\newcommand{\B}[1]{ {\bf #1} }\) \(\newcommand{\R}[1]{ {\rm #1} }\) \(\newcommand{\W}[1]{ \; #1 \; }\)
Simulating Posterior Distribution for Model Variables#
Purpose#
The sample command with method equal to simulate fits simulated random measurements ( data_sim_table ) and simulated random priors ( prior_sim_table ).
The Lemmas on this page prove that, in the linear Gaussian case, this procedure yields the proper statistics for the posterior distribution of the maximum likelihood estimate. Note that the dismod_at case is not linear, nor are all of its statistics Gaussian.
Lemma 1#
Suppose we are given a matrix \(A \in \B{R}^{m \times n}\) with rank \(n\), a positive definite matrix \(V \in \B{R}^{m \times m}\), and a vector \(y \in \B{R}^{m \times 1}\). Further suppose that there is an unknown vector \(\bar{\theta} \in \B{R}^{n \times 1}\) such that \(y \sim \B{N} ( A \bar{\theta} , V )\). The maximum likelihood estimator \(\hat{\theta}\) for \(\bar{\theta}\) given \(y\) satisfies
\[
\partial_\theta^2 \log \B{p} ( y | \theta ) = - A^\R{T} V^{-1} A
\]
\[
\B{E} [ \hat{\theta} ] = \bar{\theta}
\]
\[
\B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ]
=
( A^\R{T} V^{-1} A )^{-1}
\]
Proof#
The probability density for \(y\) given \(\theta\) is
\[
\B{p} ( y | \theta )
=
\det ( 2 \pi V )^{-1/2}
\exp \left[ - \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta ) \right]
\]
Dropping the determinant term, because it does not depend on \(\theta\), and taking the negative log, we get the objective
\[
f ( \theta ) = \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta )
\]
and the equivalent problem of minimizing \(f ( \theta )\) with respect to \(\theta \in \B{R}^{n \times 1}\). The derivative \(f^{(1)} ( \theta ) \in \B{R}^{1 \times n}\) is given by
\[
f^{(1)} ( \theta ) = - ( y - A \theta )^\R{T} V^{-1} A
\]
It follows that
\[
f^{(2)} ( \theta ) = A^\R{T} V^{-1} A
\]
Since \(f ( \theta )\) equals \(- \log \B{p} ( y | \theta )\) up to a constant, this completes the proof of the equation for the second partial of \(\log \B{p} ( y | \theta )\) in the statement of the lemma.
The maximum likelihood estimate \(\hat{\theta}\) satisfies the equation \(f^{(1)} ( \hat{\theta} ) = 0\); i.e.,
\[
0 = ( y - A \hat{\theta} )^\R{T} V^{-1} A
\]
Solving for \(\hat{\theta}\) yields
\[
\hat{\theta} = ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} y
\]
Defining \(e = y - A \bar{\theta}\), we have \(\B{E} [ e ] = 0\) and
\[
\hat{\theta}
=
( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} ( A \bar{\theta} + e )
=
\bar{\theta} + ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} e
\]
This expresses the estimate \(\hat{\theta}\) as a deterministic function of the noise \(e\). It follows from the equation above, and the fact that \(\B{E} [ e ] = 0\), that \(\B{E} [ \hat{\theta} ] = \bar{\theta}\). This completes the proof of the equation for the expected value of \(\hat{\theta}\) in the statement of the lemma.
It also follows, from the equation for \(\hat{\theta}\) above and the fact that \(\B{E} [ e \, e^\R{T} ] = V\), that
\[
\B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ]
=
( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} \, V \, V^{-1} A ( A^\R{T} V^{-1} A )^{-1}
=
( A^\R{T} V^{-1} A )^{-1}
\]
This completes the proof of the equation for the covariance of \(\hat{\theta} - \bar{\theta}\) in the statement of the lemma.
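Lemma 1 can be checked by Monte Carlo simulation. The sketch below uses hypothetical illustrative choices of \(A\), \(V\), and \(\bar{\theta}\) (they are not part of the lemma) and verifies that the sample mean and covariance of \(\hat{\theta}\) over repeated simulated data sets agree with the lemma's conclusions.

```python
# Monte Carlo check of Lemma 1; A, V, theta_bar are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2
A = rng.standard_normal((m, n))              # rank n with probability one
V = np.diag(rng.uniform(0.5, 2.0, size=m))   # positive definite covariance
theta_bar = np.array([1.0, -2.0])

V_inv = np.linalg.inv(V)
info = A.T @ V_inv @ A                       # information matrix A^T V^-1 A

# Closed-form MLE theta_hat = (A^T V^-1 A)^-1 A^T V^-1 y over simulated y
estimates = np.array([
    np.linalg.solve(info, A.T @ V_inv @ rng.multivariate_normal(A @ theta_bar, V))
    for _ in range(20000)
])

print(estimates.mean(axis=0))   # close to theta_bar
print(np.cov(estimates.T))      # close to inv(info)
```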
Remark#
For the case in Lemma 1, the second partial of \(\log \B{p} ( y | \theta )\) with respect to \(\theta\) does not depend on \(\theta\), and \(A^\R{T} V^{-1} A\) is the information matrix.
Lemma 2#
Suppose that, in addition to all the information in Lemma 1, we have a matrix \(B \in \B{R}^{p \times n}\), a positive definite matrix \(P \in \B{R}^{p \times p}\), and \(z \in \B{R}^{p \times 1}\), where we have independent prior information \(B \theta \sim \B{N}( z , P )\). Further suppose \(B\) has rank \(n\). For this case we define \(\hat{\theta}\) as the maximizer of \(\B{p}( y | \theta ) \B{p}( \theta )\). It follows that
\[
\partial_\theta^2 \log [ \B{p} ( y | \theta ) \B{p} ( \theta ) ]
=
- ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )
\]
\[
\B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ]
\prec
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
\]
where \(\prec\) means less than in a positive definite matrix sense.
Remark#
The posterior distribution for the maximum likelihood estimate, when including a prior, cannot be sampled by fitting simulated data alone. To see this, consider the case where column one of the matrix \(A\) is zero. In this case, the data \(y\) does not depend on \(\theta_1\), so \(\hat{\theta}_1\) takes the same value no matter what the value of \(y\). On the other hand, the posterior distribution for \(\theta_1\), for this case, is the same as its prior distribution and has uncertainty.
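The zero-column case in the remark can be demonstrated numerically. In this hypothetical sketch (the matrices and \(\bar{\theta}\) are illustrative, with \(V = P = I\)), repeated fits to simulated data with the prior values \(z\) held fixed show no variation at all in \(\hat{\theta}_1\), even though the true posterior variance of \(\theta_1\) equals its prior variance.

```python
# Illustration of the remark: column one of A is zero, prior values z fixed.
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 2
A = rng.standard_normal((m, n))
A[:, 0] = 0.0                      # the data y carries no information about theta_1
B = np.eye(n)                      # prior B theta ~ N(z, P)
P = np.eye(n)
V = np.eye(m)
theta_bar = np.array([1.0, -2.0])
z = B @ theta_bar                  # prior values held fixed across fits

H = A.T @ A + B.T @ B              # A^T V^-1 A + B^T P^-1 B with V = P = I
estimates = np.array([
    np.linalg.solve(H, A.T @ (A @ theta_bar + rng.standard_normal(m)) + B.T @ z)
    for _ in range(2000)
])

print(estimates[:, 0].std())       # essentially zero: no uncertainty in theta_1
print(np.linalg.inv(H)[0, 0])      # true posterior variance of theta_1 is 1.0
```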
Proof#
The derivative of the corresponding negative log likelihood is
\[
f^{(1)} ( \theta )
=
- ( y - A \theta )^\R{T} V^{-1} A
- ( z - B \theta )^\R{T} P^{-1} B
\]
From this point, the proof of the equation for the second partial is very similar to the proof of Lemma 1 and is left to the reader.
Setting the derivative to zero, we find that the corresponding maximum likelihood estimate \(\hat{\theta}\) satisfies
\[
\hat{\theta}
=
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
( A^\R{T} V^{-1} y + B^\R{T} P^{-1} z )
\]
Substituting \(y = A \bar{\theta} + e\), where \(e = y - A \bar{\theta}\) as in Lemma 1, gives
\[
\hat{\theta}
=
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
( A^\R{T} V^{-1} A \bar{\theta} + B^\R{T} P^{-1} z )
+
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
A^\R{T} V^{-1} e
\]
The first term is deterministic and the second term has mean zero. It follows, using \(\B{E} [ e \, e^\R{T} ] = V\), that
\[
\B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ]
=
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
A^\R{T} V^{-1} A
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
\]
Since the matrix \(B^\R{T} P^{-1} B\) is positive definite, we have
\[
A^\R{T} V^{-1} A
\prec
A^\R{T} V^{-1} A + B^\R{T} P^{-1} B
\]
Replacing \(A^\R{T} V^{-1} A\) by \(A^\R{T} V^{-1} A + B^\R{T} P^{-1} B\) in the center of the previous expression for the variance of \(\hat{\theta}\), we obtain
\[
\B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ]
\prec
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
\]
This completes the proof of this lemma.
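The conclusion of Lemma 2 can also be checked in closed form, without simulation. In this hypothetical sketch, \(A\), \(B\), \(V\), and \(P\) are illustrative choices satisfying the lemma's hypotheses; the difference between the posterior variance and the fixed-prior variance of \(\hat{\theta}\) should have strictly positive eigenvalues.

```python
# Numeric check of Lemma 2's bound; A, B, V, P are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 2
A = rng.standard_normal((m, n))
B = np.eye(n)                                  # rank n
V = np.diag(rng.uniform(0.5, 2.0, size=m))     # positive definite
P = np.diag([4.0, 9.0])                        # positive definite

data_info = A.T @ np.linalg.inv(V) @ A         # A^T V^-1 A
prior_info = B.T @ np.linalg.inv(P) @ B        # B^T P^-1 B
H = data_info + prior_info

# Variance of theta_hat when only y is random (z held fixed), from the proof:
var_theta_hat = np.linalg.inv(H) @ data_info @ np.linalg.inv(H)

# Lemma 2: var_theta_hat is less than inv(H) in the positive definite sense;
# equivalently, every eigenvalue of the difference is strictly positive.
gap_eigenvalues = np.linalg.eigvalsh(np.linalg.inv(H) - var_theta_hat)
print(gap_eigenvalues)
```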
Simulation#
Suppose we simulate data \(y \sim \B{N}( A \bar{\theta}, V)\) and independent prior values \(z \sim \B{N}( B \bar{\theta}, P)\), where \(A\), \(V\) are as in Lemma 1 and \(B\), \(P\) are as in Lemma 2. Furthermore, we define \(\hat{\theta}\) as the maximizer of
\[
\B{p} ( y | \theta ) \, \B{p} ( z | \theta )
\]
We define \(w \in \B{R}^{(m + p) \times 1}\), \(C \in \B{R}^{ (m + p) \times n}\), and \(U \in \B{R}^{ (m + p) \times (m + p)}\) by
\[
w = \begin{bmatrix} y \\ z \end{bmatrix}
\; , \;
C = \begin{bmatrix} A \\ B \end{bmatrix}
\; , \;
U = \begin{bmatrix} V & 0 \\ 0 & P \end{bmatrix}
\]
so that \(w \sim \B{N} ( C \bar{\theta} , U )\).
We can now apply Lemma 1 with \(y\) replaced by \(w\), \(A\) replaced by \(C\), and \(V\) replaced by \(U\). It follows from Lemma 1 that \(\B{E} [ \hat{\theta} ] = \bar{\theta}\) and
\[
\B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ]
=
( C^\R{T} U^{-1} C )^{-1}
=
( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}
\]
which is the proper posterior variance for the case in Lemma 2.
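The Simulation section above can be checked by Monte Carlo as well. In this hypothetical sketch (the matrices and \(\bar{\theta}\) are illustrative), both the measurements \(y\) and the prior values \(z\) are simulated on each iteration, and the sample covariance of \(\hat{\theta}\) matches the proper posterior variance \(( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1}\).

```python
# Monte Carlo check of the Simulation section: simulate both y and z.
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 2
A = rng.standard_normal((m, n))
B = np.eye(n)
V = np.diag(rng.uniform(0.5, 2.0, size=m))
P = np.diag([2.0, 3.0])
theta_bar = np.array([1.0, -2.0])

V_inv, P_inv = np.linalg.inv(V), np.linalg.inv(P)
H = A.T @ V_inv @ A + B.T @ P_inv @ B          # equals C^T U^-1 C in block form

estimates = []
for _ in range(20000):
    y = rng.multivariate_normal(A @ theta_bar, V)   # simulated measurements
    z = rng.multivariate_normal(B @ theta_bar, P)   # simulated prior values
    estimates.append(np.linalg.solve(H, A.T @ V_inv @ y + B.T @ P_inv @ z))
estimates = np.array(estimates)

print(np.max(np.abs(np.cov(estimates.T) - np.linalg.inv(H))))  # small
```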