data_table#

The Data Table#

Discussion#

Each row of the data table corresponds to one measurement; see meas_value below.

data_id#

This column has type integer and is the primary key for the data table. Its initial value is zero, and it increments by one for each row.

data_name#

This column has type text and has a different value for every row; i.e., the names are unique and can act as substitutes for the primary key. The names are intended to be easier for a human to remember than the ids.
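The data_id and data_name rules above can be checked with a short sketch. It assumes the table has been read into a list of dictionaries, one per row and in row order; the function name `id_and_name_problems` is hypothetical and not part of any library.

```python
def id_and_name_problems(rows):
    """Return messages for rows that break the data_id or data_name rules.

    rows is assumed to be a list of dicts, one per data table row, in row order.
    """
    problems = []
    seen_names = set()
    for expected_id, row in enumerate(rows):
        # data_id starts at zero and increments by one for each row
        if row["data_id"] != expected_id:
            problems.append(f"row {expected_id}: data_id is {row['data_id']}")
        # data_name must have a different value for every row
        if row["data_name"] in seen_names:
            problems.append(f"row {expected_id}: repeated data_name {row['data_name']!r}")
        seen_names.add(row["data_name"])
    return problems
```

An empty list means both rules hold for every row.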

integrand_id#

This column has type integer and is the integrand_id that identifies the integrand for this measurement.

density_id#

This column has type integer and is the density_id that identifies the density function for the measurement noise. The density_name corresponding to density_id cannot be uniform. (Use hold_out to ignore data during fitting.) This density may be replaced using the data_density_command.

Nonsmooth#

If the density is Nonsmooth, the average_integrand cannot depend on any of the random effects. For example, if the node_id is the parent_node_id, the average integrand does not depend on the random effects. Also, if the node corresponds to a child that has all its random effects constrained, the average integrand does not depend on the random effects. Each nonsmooth data point adds a hidden variable to the optimization problem (the maximum of the residual and its negative). Having many of these variables slows down the optimization.

node_id#

This column has type integer and is the node_id that identifies the node for this measurement.

Parent Data#

If the node_id is the parent_node_id, this data will be associated with the parent node and will not have any random effects in its model.

Child Data#

If the node_id is a child of the parent node, or a Descendant of a child, the data will be associated with the random effects for that child. In this case density_id cannot correspond to laplace or log_laplace. The corresponding densities would not be differentiable at zero, and the Laplace approximation would not make sense in this case.

subgroup_id#

This column has type integer and is the subgroup_id that this data point corresponds to.

group_id#

There is automatically a group_id corresponding to the subgroup_id, even though the group_id does not appear in the data file (if it does appear, it will not be used).

Nonsmooth#

If the density is Nonsmooth, the subgroup_smooth_id corresponding to subgroup_id must be null.

weight_id#

This column has type integer and is the weight_id that identifies the weighting used for this measurement.

null#

If weight_id is null, the constant weighting is used for this data point.

hold_out#

This column has type integer and has value zero or one. Only the rows where hold_out is zero are included in the objective optimized during a fit_command. See the fit command hold_out documentation.
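As a minimal sketch of the effect of this column alone, the rows that enter the objective can be selected as follows. It assumes the rows are held as a list of dictionaries; `rows_included_in_fit` is a hypothetical helper, and it does not model any additional hold-out rules described in the fit command documentation.

```python
def rows_included_in_fit(rows):
    """Keep only the rows whose hold_out value is zero.

    rows is assumed to be a list of dicts, one per data table row.
    """
    return [row for row in rows if row["hold_out"] == 0]
```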

meas_value#

This column has type real and is the measured value for each row of the data table; i.e., the measurement of the integrand, node, etc.

meas_std#

This column has type real and has the same units as the data. It must be positive unless the density is binomial, in which case it must be null.

This is not the only contribution to the standard deviation used in the data likelihood; see minimum cv standard deviation \(\Delta\), adjusted standard deviation sigma \(\sigma\), and transformed standard deviation delta \(\delta ( \theta )\).

eta#

This column has type real. If density_id corresponds to a log scaled density, eta must be greater than or equal to zero and is the offset in the log transformation for this data point; see the log scaled case in the definition of the weighted residual function. This offset may be replaced using the data_density_command.

null#

If density_id does not correspond to log_gaussian, log_laplace, or log_students, eta can be null.

nu#

This column has type real. If density_id corresponds to students or log_students, nu must be greater than two and is the number of degrees of freedom in the distribution for this point; see the definition of the log-density for Student’s-t and Log-Student’s-t. The degrees of freedom may be replaced using the data_density_command.

null#

If density_id does not correspond to students or log_students, nu can be null.
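The eta and nu rules can be sketched as a single check on one row. This is only an illustration: the function name is hypothetical, null is represented by Python `None`, and the set of log scaled densities is assumed to be exactly the three names listed in the eta null paragraph.

```python
# Assumption: these are the density names for which eta is required.
LOG_DENSITIES = {"log_gaussian", "log_laplace", "log_students"}
# Density names for which nu is required.
STUDENTS_DENSITIES = {"students", "log_students"}


def eta_nu_problems(density_name, eta, nu):
    """Return messages for eta or nu values that break the rules above."""
    problems = []
    if density_name in LOG_DENSITIES:
        # eta is the offset in the log transformation; null is not allowed here
        if eta is None or eta < 0.0:
            problems.append("eta must be greater than or equal to zero")
    if density_name in STUDENTS_DENSITIES:
        # nu is the number of degrees of freedom; null is not allowed here
        if nu is None or nu <= 2.0:
            problems.append("nu must be greater than two")
    return problems
```

For any other density name the function returns an empty list, which matches the statement that eta and nu can be null in that case.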

sample_size#

This is the number of samples for a binomial distribution. If the corresponding density is not binomial, sample_size must be null.
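The binomial rules for meas_std and sample_size can be combined in one sketch. The function name is hypothetical, null is represented by `None`, and only the constraints stated in this section are checked; in particular, nothing is assumed about which sample_size values are valid when the density is binomial.

```python
def binomial_rule_problems(density_name, meas_std, sample_size):
    """Return messages for meas_std or sample_size values that break the rules above."""
    problems = []
    if density_name == "binomial":
        # meas_std must be null for a binomial density
        if meas_std is not None:
            problems.append("meas_std must be null for a binomial density")
    else:
        # every other density needs a positive meas_std and a null sample_size
        if meas_std is None or meas_std <= 0.0:
            problems.append("meas_std must be positive")
        if sample_size is not None:
            problems.append("sample_size must be null unless the density is binomial")
    return problems
```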

age_lower#

This column has type real and is the lower age limit for this measurement. It must be greater than or equal to the minimum age_table value.

age_upper#

This column has type real and is the upper age limit for this measurement. It must be greater than or equal to the corresponding age_lower and less than or equal to the maximum age_table value.

time_lower#

This column has type real and is the lower time limit for this measurement. It must be greater than or equal to the minimum time_table value.

time_upper#

This column has type real and is the upper time limit for this measurement. It must be greater than or equal to the corresponding time_lower and less than or equal to the maximum time_table value.
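The four limit columns follow the same pattern, so one sketch can check both age and time. It assumes the age_table and time_table values are available as non-empty lists of numbers; the function name `limit_problems` is hypothetical.

```python
def limit_problems(row, age_table, time_table):
    """Return messages for age or time limits that break the rules above.

    row is a dict with age_lower, age_upper, time_lower and time_upper;
    age_table and time_table are non-empty lists of the table values.
    """
    problems = []
    for name, table in (("age", age_table), ("time", time_table)):
        lower = row[f"{name}_lower"]
        upper = row[f"{name}_upper"]
        if lower < min(table):
            problems.append(f"{name}_lower is below the minimum {name}_table value")
        if upper < lower:
            problems.append(f"{name}_upper is below {name}_lower")
        if upper > max(table):
            problems.append(f"{name}_upper is above the maximum {name}_table value")
    return problems
```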

Covariates#

The covariate columns have type real and column names that begin with the two characters x_. For each valid covariate_id, column x_covariate_id contains the value, for this measurement, of the covariate specified by covariate_id.

Null#

The covariate value null is interpreted as the reference value for the corresponding covariate.
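The null rule for covariates can be illustrated with a small lookup. This is a sketch only: `covariate_value` is a hypothetical helper, null is represented by `None`, and the reference value is passed in by the caller rather than read from any table.

```python
def covariate_value(row, covariate_id, reference):
    """Return the covariate value for one measurement.

    The column name is x_ followed by the covariate_id; a null value
    is interpreted as the reference value for that covariate.
    """
    value = row[f"x_{covariate_id}"]
    return reference if value is None else value
```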

Example#

The file data_table.py creates example data tables.