wish_list#

Dismod_at Wish List#

Command Documentation#

There should be one section in the documentation for all the dismod_at commands and it should split into those commands that use dismod_at and those that use dismodat.py .
There does not seem to be any documentation for the dismodat.py plot command. This should be fixed.

Bootstrapping#

Sampling using the simulate method has the following advantages:

It does not have the asymptotic problem of failing when the information matrix is singular.
It takes all the constraints into account when computing the posterior distribution.

This sampling method is difficult to use because it has the following problem: The model for the variance of the data is often larger than the true variance (extra data variance results in extra variance in the posterior predictions and is often used to account for model misspecification). This often causes the fitting of the simulated data to fail.

Bootstrapping samples from the actual data, instead of using simulated data, would have the advantages without the problem mentioned above.

Batch Command#

Add a batch command that combines multiple commands in a file. In some cases, dismod_at would not need to re-initialize and could save a considerable amount of time. For example:

When executing the hold_out_command for multiple integrands.
When executing a sample_command or predict_command after a fit_command .

Empty Tables#

Change create_database so that uses the empty list as the default value for every argument. In addition, change the routines that read the input tables to return an empty vector when a table is not in the database.

Parallel Testing#

It would be good to changes the dismod_at test so that they did not use the same file name (e.g., example.db) and could be run in parallel.

Speed#

The CppADCodeGen package (and CppAD JIT) compiles CppAD code. While the compilation takes time, the resulting derivative evaluations are much faster; see speed-cppadcg. Perhaps it is possible to make certain calculations; e.g., the ODE solution an CppAD atomic function that is evaluated using CppADCodeGen or CppAD JIT.

Bound Constraints#

Currently the user must convert active bound constraints to equality constraints to get asymptotic statistics for the fixed effects, because the Hessian of the objective is often not positive definite. Automatically detect when these bound constraints are active at the solution and treat the corresponding variables as equality constraints during the asymptotic statistics calculation.
Document how the bounds affect the priors and posterior distributions during the fit, simulate, and sample commands. Note that there is some discussion of bounds in the Simulating Priors wish list item below.

Laplace Random Effect#

The marginal likelihood for Laplace random effects is smooth, because the model for the data is smooth w.r.t. the random effects. Hence, it is possible to extend dismod_at to include Laplace random effect. In addition, the random effects in dismod_at are not correlated. It follows that, the integral of Gaussian random effects (Laplace random effects) can be evaluated using the error function (exponential function). Thus, we could include bounds on the random effect.

Immunity#

It would be possible to include an immunity state \(I\) and have \(\rho\) the rate at which one gets curred from the disease, and does not get it again. There would be a switch that one chooses to say that onces you are curred you become immune. This would require changing the definition of some of the integrands; e.g., prevalence would be \(C / (S + C + I)\). It would also require changing the ODE, but in a way that should be faster because there is no feedback.

Simulating Priors#

The code for simulating values for the prior distribution prior_sim_value currently truncates the simulated values to be within the lower and lower limits for the distribution. Note that constraining an optimizer to be between lower and upper limits corresponds to truncated distribution. The mean for a truncated distribution need not lie between the lower and upper limits. We should remove the truncation in the simulated prior values and we should remove the restriction that the prior table mean needs to be between the lower and upper limits. This will require projecting onto the lower and upper limits when the prior mean is used in the start_var_table or scale_var_table .

Multi-Threading#

On a shared-memory system, it should be possible to split the data into subsets and evaluate the corresponding likelihood terms using a different thread for each subset. The function, and derivative values, corresponding to each thread would then be summed to get the value corresponding to the entire likelihood. This should be done for the initialization as well as for the function and derivative evaluation during optimization. The execution time for problems with large amounts of data should be divided by a number close to the number of cores available on the system.

User Examples#

The user_example examples below Examples With Explanation have a discussion at the top of each example. Add a discussion for the other the user examples.

meas_std#

Currently the data table meas_std must be specified (except for uniform density). Perhaps we should allow for this standard deviation to be null in the case when the corresponding meas_value must not be zero and the minimum_meas_cv would be used to determine the measurement accuracy.

Lagrange Multipliers#

Change the Lagrange multipliers lagrange_dage (dtime) in the fit_var table to be null when there is no corresponding age (time) different; i.e., at the maximum age (time). (Currently these Lagrange multipliers are zero.)

Censored Laplace#

The censored density formula for a Gaussian G(y,mu,delta,c) is correct even if \(c > \mu\). On the other hand, the formula for the Laplace case L(y,mu,delta,c) requires \(c \leq \mu\). The Laplace case can be extended using the fact that it is symmetric, integrating from \(\mu\) to \(c\), using absolute values for the integration limits, and the Sign function. This will result in a non-smooth the optimization problem. Perhaps the problem can be reformulated with auxillary variables to be a smooth problem ?

ODE Solution#

Prevalence ODE#

If \(S\) and \(C\) satisfy the dismod_at Ordinary Differential Equation then prevalence \(P = C / (S + C)\) satisfies

\[P' = \iota [ 1 - P ] - \rho P - \chi [ 1 - P] P\]

We can therefore solve for prevalence without other cause mortality \(\omega\) or all cause mortality \(\omega + \chi P\).

The ODE for \(P\) is non-linear, while the ODE is \((S, C)\) is linear.
All of the current integrands, except for susceptible and withC can be computed from \(P\) (given that the rates are inputs to the ODE).
If we know all cause mortality \(\alpha = \omega + \chi P\), once we have solved for \(P\), we can compute \(\omega = \alpha - \chi P\). Furthermore

\[(S + C)' = - \alpha (S + C)\]

So we can also compute \(S + C\), and \(C = P (S + C)\),
Given the original ODE, we know that the true solution for \(S\), must be positive, and \(C\), \(P\) must be non-negative. Negative values for these quantities will correspond to numerical precision errors in the solution of the ODE.
One advantage of this approach, over the current approach of solving the ODE in \((S, C)\), is that the solution is stable as \(S + C \rightarrow 0\). (The current approach computes \(P = C / (S + C)\).

Large Excess Mortality#

If case where rate_case is iota_pos_rho_zero corresponds to dev::eigen_ode2::Method::Case Three in the ODE solver. If excess mortality \(\chi\) is unreasonably large, this can result in exponential overflow and infinity or nan. It is possible to redo the calculations in case three to properly handle this condition.

rate_case#

It is now possible to use conditional expressions in the ODE solution (CppAD this will now work properly these conditionals and two levels of AD and revere mode). This change would remove the need for the rate_case option. Note that this will also work with checkpointing.

Command Diagrams#

It would be good to give a data flow diagram for each command that corresponds to its extra inputs tables and output tables .

Real World Example#

It would be good to include a real world example. Since this is an open source program, we would need a data set that could be made distributed freely without any restriction on its use.

Random Starting Point#

Have an option to start at a random point from the prior for the fixed effects (instead of the mean of the fixed effects). This would better detect local minima and represent solution uncertainty.

Windows Install#

Make and test a set of Windows install instructions for dismod_at .