The Censored Gaussian and Laplace Densities

References

See censoring and the heading Likelihoods for mixed continuous-discrete distributions on the wiki page for likelihood functions.

Discussion

We use \(\mu\) for the mean and \(\delta > 0\) for the standard deviation of a Gaussian or Laplace random variable \(y\). We use \(c \leq \mu\) for the value at which the random variable is censored. The censored random variable is defined by

\[\begin{split}\underline{y} = \left\{ \begin{array}{ll} c & \R{if} \; y \leq c \\ y & \R{otherwise} \end{array} \right.\end{split}\]
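As a small illustration of this definition, the sketch below draws Gaussian samples and censors them at \(c\). The function name `censor`, the seed, and the parameter values are chosen here for illustration only; they do not come from the test file.

```python
import random

def censor(y, c):
    """Return the censored value: c if y <= c, otherwise y."""
    return c if y <= c else y

rng = random.Random(0)           # fixed seed so the run is repeatable
mu, delta, c = 1.0, 2.0, 0.0     # illustrative values with c <= mu

# Draw Gaussian samples and censor each one at c.
samples = [censor(rng.gauss(mu, delta), c) for _ in range(10000)]

# Fraction of samples that landed exactly on the censoring value c.
frac_at_c = sum(1 for y in samples if y == c) / len(samples)
```

A positive fraction of the samples is exactly equal to \(c\), which is why the censored distribution needs a point mass there in addition to a continuous part.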

The crucial property is that the censored density functions (defined below) are smooth functions with respect to the mean value \(\mu\) (but not even continuous with respect to \(c\) or \(y\)).

Simulation Test

The file test/user/censor_density.py contains a test of maximum likelihood estimation using the continuous-discrete densities proposed below.

Gaussian

Density, G(y,mu,delta)

The Gaussian density function is given by

\[G( y , \mu , \delta ) = \sqrt{ \frac{1}{ 2 \pi \delta^2 } } \exp \left[ - \frac{1}{2} \left( \frac{y - \mu}{\delta} \right)^2 \right]\]

Error Function

The Error function is defined (for \(0 \leq x\)) by

\[\R{erf}(x) = \sqrt{ \frac{1}{\pi} } \int_{-x}^{+x} \exp \left( - t^2 \right) \; \R{d} t\]

Using the change of variables \(t = \sqrt{2}^{-1} (y - \mu) / \delta\) we have \(y = \mu + t \delta \sqrt{2}\) and

\[\R{erf}(x) = \sqrt{ \frac{1}{2 \pi \delta^2} } \int_{\mu - x \delta \sqrt{2}}^{\mu + x \delta \sqrt{2}} \exp \left[ - \frac{1}{2} \left( \frac{y - \mu}{\delta} \right)^2 \right] \; \R{d} y\]

Setting \(x = \sqrt{2}^{-1} ( \mu - c ) / \delta\) we obtain

\[\R{erf}\left( \sqrt{2}^{-1} ( \mu - c ) / \delta \right) = \sqrt{ \frac{1}{2 \pi \delta^2} } \int_{c}^{2 \mu - c} \exp \left[ - \frac{1}{2} \left( \frac{y - \mu}{\delta} \right)^2 \right] \; \R{d} y\]

Note that this integral is negative when \(c > \mu\). The Gaussian density is symmetric about \(y = \mu\) and its integral from minus infinity to plus infinity is one. Hence

\[\frac{ 1 - \R{erf}\left( \sqrt{2}^{-1} ( \mu - c ) / \delta \right) }{2} = \sqrt{ \frac{1}{2 \pi \delta^2} } \int_{-\infty}^{c} \exp \left[ - \frac{1}{2} \left( \frac{y - \mu}{\delta} \right)^2 \right] \; \R{d} y\]
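The identity above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the trapezoid rule (with its step count and truncation width) are choices made for this illustration.

```python
import math

def gaussian_density(y, mu, delta):
    """G(y, mu, delta): Gaussian density with mean mu and standard deviation delta."""
    z = (y - mu) / delta
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * delta * delta)

def gaussian_mass_below(c, mu, delta):
    """Closed form (1 - erf((mu - c) / (delta * sqrt(2)))) / 2."""
    return (1.0 - math.erf((mu - c) / (delta * math.sqrt(2.0)))) / 2.0

def trapezoid_mass_below(c, mu, delta, n=20000, width=12.0):
    """Trapezoid approximation of the integral of G from mu - width * delta to c.

    The lower limit stands in for minus infinity; the mass below it is negligible.
    """
    lower = mu - width * delta
    if c <= lower:
        return 0.0
    h = (c - lower) / n
    total = 0.5 * (gaussian_density(lower, mu, delta) + gaussian_density(c, mu, delta))
    for i in range(1, n):
        total += gaussian_density(lower + i * h, mu, delta)
    return total * h

# Compare the two for one illustrative choice of (c, mu, delta).
difference = abs(gaussian_mass_below(0.5, 1.0, 2.0) - trapezoid_mass_below(0.5, 1.0, 2.0))
```

Python's `math.erf` accepts negative arguments, so `gaussian_mass_below` also returns the correct probability when \(c > \mu\).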

Censored Density, G(y,mu,delta,c)

The censored Gaussian density is defined by

\[\begin{split}G ( \underline{y} , \mu , \delta , c ) = \left\{ \begin{array}{ll} \left( 1 - \R{erf}\left( \sqrt{2}^{-1} (\mu - c) / \delta \right) \right) / 2 & \R{if} \; \underline{y} = c \\ G( \underline{y} , \mu , \delta ) & \R{otherwise} \end{array} \right.\end{split}\]

This density function is with respect to the Lebesgue measure plus an atom with mass one at \(\underline{y} = c\).
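To show how this density might be used for maximum likelihood estimation, here is a minimal sketch that simulates censored data and estimates \(\mu\) with \(\delta\) and \(c\) held fixed. It is not the contents of test/user/censor_density.py; the function names, the ternary search, the search bracket, and the simulated values are all assumptions made for this illustration. The search assumes the negative log-likelihood has a single minimum in \(\mu\) inside the bracket.

```python
import math
import random

def censored_gaussian_density(y, mu, delta, c):
    """G(y, mu, delta, c): point mass at y == c, Gaussian density otherwise."""
    if y == c:
        return (1.0 - math.erf((mu - c) / (delta * math.sqrt(2.0)))) / 2.0
    z = (y - mu) / delta
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * delta * delta)

def negative_log_likelihood(mu, data, delta, c):
    """Sum of -log G(y, mu, delta, c) over the censored observations."""
    return -sum(math.log(censored_gaussian_density(y, mu, delta, c)) for y in data)

def estimate_mu(data, delta, c, lo, hi, iterations=60):
    """Ternary search for the minimizer of the negative log-likelihood on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        f1 = negative_log_likelihood(m1, data, delta, c)
        f2 = negative_log_likelihood(m2, data, delta, c)
        if f1 < f2:
            hi = m2   # the minimizer cannot lie in (m2, hi]
        else:
            lo = m1   # the minimizer cannot lie in [lo, m1)
    return 0.5 * (lo + hi)

rng = random.Random(1)                  # fixed seed so the run is repeatable
mu_true, delta, c = 1.0, 1.0, 0.5       # illustrative values with c <= mu_true
data = [max(rng.gauss(mu_true, delta), c) for _ in range(2000)]
mu_hat = estimate_mu(data, delta, c, lo=-2.0, hi=4.0)
```

Observations equal to \(c\) contribute the probability of being censored, and all other observations contribute the ordinary Gaussian density, so a single sum of logs covers both kinds of data.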

Difference Between Means

We use \(\overline{\underline{y}}\) to denote the mean after censoring the distribution:

\[\frac{ \overline{\underline{y}} - \mu }{ \delta } = \frac{c - \mu}{2 \delta } \left( 1 - \R{erf}\left( \sqrt{2}^{-1} (\mu - c) / \delta \right) \right) + \sqrt{ \frac{1}{ 2 \pi \delta^2 } } \int_c^{+\infty} \frac{y - \mu}{ \delta } \exp \left[ - \frac{1}{2} \left( \frac{y - \mu}{\delta} \right)^2 \right] \; \R{d} y\]
\[\frac{ \overline{\underline{y}} - \mu }{ \delta } = \frac{c - \mu}{2 \delta } \left( 1 - \R{erf}\left( \sqrt{2}^{-1} (\mu - c) / \delta \right) \right) - \sqrt{ \frac{1}{ 2 \pi } } \left[ \exp \left( - \frac{1}{2} \left[ \frac{y - \mu}{\delta} \right]^2 \right) \right]_c^{+\infty}\]
\[\overline{\underline{y}} - \mu = \frac{c - \mu}{2} \left( 1 - \R{erf}\left( \sqrt{2}^{-1} (\mu - c) / \delta \right) \right) + \delta \sqrt{ \frac{1}{ 2 \pi } } \exp \left( - \frac{1}{2} \left[ \frac{c - \mu}{\delta} \right]^2 \right)\]
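The sketch below compares a closed form for \(\overline{\underline{y}} - \mu\) with a direct numerical evaluation of the censored mean. The closed form used is \((c - \mu)/2\) times \(1 - \R{erf}(\cdot)\), plus \(\delta / \sqrt{2 \pi}\) times the exponential term; the factor \(\delta\) on the exponential term comes from integrating \((y - \mu)/\delta\) against the Gaussian density. Function names, step count, and truncation width are assumptions made for this illustration.

```python
import math

def gaussian_density(y, mu, delta):
    """G(y, mu, delta): Gaussian density with mean mu and standard deviation delta."""
    z = (y - mu) / delta
    return math.exp(-0.5 * z * z) / (delta * math.sqrt(2.0 * math.pi))

def mean_shift_closed_form(mu, delta, c):
    """(c - mu) * P(y <= c) + delta * exp(-z^2 / 2) / sqrt(2 pi), with z = (c - mu) / delta."""
    z = (c - mu) / delta
    mass_at_c = (1.0 - math.erf(-z / math.sqrt(2.0))) / 2.0
    return (c - mu) * mass_at_c + delta * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def mean_shift_numeric(mu, delta, c, n=40000, width=12.0):
    """c * P(y <= c) plus a trapezoid integral of y * G(y) over [c, mu + width * delta], minus mu."""
    mass_at_c = (1.0 - math.erf((mu - c) / (delta * math.sqrt(2.0)))) / 2.0
    upper = mu + width * delta          # stands in for plus infinity
    h = (upper - c) / n

    def integrand(y):
        return y * gaussian_density(y, mu, delta)

    total = 0.5 * (integrand(c) + integrand(upper))
    for i in range(1, n):
        total += integrand(c + i * h)
    return c * mass_at_c + total * h - mu

# Compare the two for one illustrative choice of (mu, delta, c).
gap = abs(mean_shift_closed_form(1.0, 2.0, 0.5) - mean_shift_numeric(1.0, 2.0, 0.5))
```

For \(c = \mu\) the closed form reduces to \(\delta / \sqrt{2 \pi}\), and it tends to zero as \(c\) decreases, since censoring far in the lower tail changes the mean very little.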

Laplace

Density, L(y,mu,delta)

The Laplace density function is given by

\[L( y , \mu , \delta ) = \sqrt{ \frac{1}{2 \delta^2 } } \exp \left[ - \sqrt{2} \left| \frac{y - \mu}{\delta} \right| \right]\]

Indefinite Integral

The integral of the density with respect to \(y\) from minus infinity to \(x\), for \(x \leq \mu\), is

\[\int_{-\infty}^{x} L( y , \mu , \delta ) \; \R{d} y = \sqrt{ \frac{1}{2 \delta^2 } } \int_{-\infty}^{x} \exp \left( - \sqrt{2} \frac{\mu - y}{\delta} \right) \; \R{d} y\]

Using \(c \leq \mu\), we obtain

\[\int_{-\infty}^{c} L( y , \mu , \delta ) \; \R{d} y = \frac{1}{2} \exp \left( - \sqrt{2} \frac{\mu - c}{\delta} \right)\]

Censored Density, L(y,mu,delta,c)

The censored Laplace density is defined by

\[\begin{split}L ( \underline{y} , \mu , \delta , c ) = \left\{ \begin{array}{ll} (1 / 2 ) \exp \left( - ( \mu - c ) \sqrt{2} / \delta \right) & \R{if} \; \underline{y} = c \\ L( \underline{y} , \mu , \delta ) & \R{otherwise} \end{array} \right.\end{split}\]

This density function is with respect to the Lebesgue measure plus an atom with mass one at \(\underline{y} = c\).
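A minimal sketch of the censored Laplace density follows, together with a check that the point mass at \(c\) plus the integral of the continuous part over \((c, +\infty)\) is one. It assumes \(c \leq \mu\), as in the Discussion; the function names and the trapezoid settings are choices made for this illustration.

```python
import math

def laplace_density(y, mu, delta):
    """L(y, mu, delta): Laplace density with mean mu and standard deviation delta."""
    return math.exp(-math.sqrt(2.0) * abs(y - mu) / delta) / (delta * math.sqrt(2.0))

def censored_laplace_density(y, mu, delta, c):
    """L(y, mu, delta, c): point mass at y == c (assumes c <= mu), Laplace density otherwise."""
    if y == c:
        return 0.5 * math.exp(-(mu - c) * math.sqrt(2.0) / delta)
    return laplace_density(y, mu, delta)

def total_mass(mu, delta, c, n=100000, width=25.0):
    """Point mass at c plus a trapezoid integral of L over [c, mu + width * delta]."""
    atom = censored_laplace_density(c, mu, delta, c)
    upper = mu + width * delta          # stands in for plus infinity
    h = (upper - c) / n
    total = 0.5 * (laplace_density(c, mu, delta) + laplace_density(upper, mu, delta))
    for i in range(1, n):
        total += laplace_density(c + i * h, mu, delta)
    return atom + total * h

# Total probability for one illustrative choice of (mu, delta, c).
mass = total_mass(1.0, 2.0, 0.0)
```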

Difference Between Means

We use \(\overline{\underline{y}}\) to denote the mean after censoring the distribution:

\[\frac{ \overline{\underline{y}} - \mu }{ \delta } = \frac{c - \mu}{2 \delta } \exp \left( - \sqrt{2} \frac{\mu - c}{\delta} \right) + \sqrt{ \frac{1}{2 \delta^2 } } \int_c^{+\infty} \frac{y - \mu}{\delta} \exp \left[ - \sqrt{2} \left| \frac{y - \mu}{\delta} \right| \right] \; \R{d} y\]

Using integration by parts, one can obtain a formula for \(\overline{\underline{y}} - \mu\) in a manner similar to the calculation of the Difference Between Means for the Gaussian case.
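As a worked sketch of that calculation, under the assumption \(c \leq \mu\) from the Discussion: the Laplace density is symmetric about \(\mu\), so the integral of \((y - \mu) L( y , \mu , \delta )\) over the whole line is zero. Hence

\[\int_c^{+\infty} (y - \mu) L( y , \mu , \delta ) \; \R{d} y = \int_{-\infty}^{c} (\mu - y) L( y , \mu , \delta ) \; \R{d} y = \frac{1}{2} \left( \mu - c + \frac{\delta}{\sqrt{2}} \right) \exp \left( - \sqrt{2} \frac{\mu - c}{\delta} \right)\]

where the last equality follows from integration by parts. Adding the contribution of the atom at \(c\), which is \((c - \mu) / 2\) times the same exponential, the terms in \(\mu - c\) cancel and we obtain

\[\overline{\underline{y}} - \mu = \frac{\delta}{2 \sqrt{2}} \exp \left( - \sqrt{2} \frac{\mu - c}{\delta} \right)\]

For \(c = \mu\) this is \(\delta / ( 2 \sqrt{2} )\), which is half the mean absolute deviation of the Laplace distribution, and it tends to zero as \(c\) decreases.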

Log Gaussian

Suppose that \(\log(y + \eta )\) has a Gaussian distribution with mean \(\log( \mu + \eta )\) and standard deviation \(\delta\), and we are censoring the distribution at the value \(\log( c + \eta)\). For this case

\[\begin{split}G( y , \mu , \delta ) & = \sqrt{ \frac{1}{ 2 \pi \delta^2 } } \exp \left[ - \frac{1}{2} \left( \frac{ \log(y + \eta) - \log( \mu + \eta) } {\delta} \right)^2 \right] \\ G ( \underline{y} , \mu , \delta , c ) & = \left\{ \begin{array}{ll} \left[ 1 - \R{erf}\left( \sqrt{2}^{-1} [ \log( \mu + \eta ) - \log( c + \eta )] / \delta \right) \right] / 2 & \R{if} \; \underline{y} = c \\ G( \underline{y} , \mu , \delta ) & \R{otherwise} \end{array} \right.\end{split}\]

Log Laplace

Suppose that \(\log(y + \eta )\) has a Laplace distribution with mean \(\log( \mu + \eta )\) and standard deviation \(\delta\), and we are censoring the distribution at the value \(\log( c + \eta)\). For this case

\[\begin{split}L( y , \mu , \delta ) & = \sqrt{ \frac{1}{2 \delta^2 } } \exp \left[ - \sqrt{2} \left| \frac{\log(y + \eta) - \log(\mu + \eta)}{\delta} \right| \right] \\ L ( \underline{y} , \mu , \delta , c ) & = \left\{ \begin{array}{ll} (1 / 2 ) \exp \left( - [ \log( \mu + \eta ) - \log( c + \eta) ] \sqrt{2} / \delta \right) & \R{if} \; \underline{y} = c \\ L( \underline{y} , \mu , \delta ) & \R{otherwise} \end{array} \right.\end{split}\]
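Both log cases evaluate the corresponding linear-scale formulas at \(\log(y + \eta)\), \(\log(\mu + \eta)\) and \(\log(c + \eta)\). The sketch below transcribes the two formulas as written above, as densities in the log-transformed variable. The function names are hypothetical, and the code assumes \(y + \eta\), \(\mu + \eta\) and \(c + \eta\) are all positive and that \(c \leq \mu\).

```python
import math

def log_censored_gaussian_density(y, mu, delta, c, eta):
    """Censored Gaussian density for log(y + eta); the point mass sits at y == c."""
    log_mu = math.log(mu + eta)
    if y == c:
        x = (log_mu - math.log(c + eta)) / (delta * math.sqrt(2.0))
        return (1.0 - math.erf(x)) / 2.0
    z = (math.log(y + eta) - log_mu) / delta
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * delta * delta)

def log_censored_laplace_density(y, mu, delta, c, eta):
    """Censored Laplace density for log(y + eta); the point mass sits at y == c."""
    log_mu = math.log(mu + eta)
    if y == c:
        return 0.5 * math.exp(-(log_mu - math.log(c + eta)) * math.sqrt(2.0) / delta)
    z = (math.log(y + eta) - log_mu) / delta
    return math.exp(-math.sqrt(2.0) * abs(z)) / (delta * math.sqrt(2.0))

# Illustrative evaluation: an uncensored observation and a censored one.
eta, mu, delta, c = 0.1, 2.0, 0.5, 1.0
uncensored_value = log_censored_gaussian_density(3.0, mu, delta, c, eta)
censored_value = log_censored_gaussian_density(c, mu, delta, c, eta)
```

The comparison `y == c` is made on the original scale, so an observation recorded exactly at the censoring value is always treated as censored, whatever rounding occurs in the logarithms.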