data_table#
The Data Table#
Discussion#
Each row of the data table corresponds to one measurement; see meas_value below.
data_id#
This column has type integer
and is the primary key for the
data
table.
Its initial value is zero, and it increments by one for each row.
data_name#
This column has type text
and has a different value for every row;
i.e., the names are unique and can act as substitutes for the primary key.
The names are intended to be easier for a human to remember than the ids.
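The two key rules above (data_id starts at zero and increments by one, data_name is unique) can be checked with a short sketch. The function name `check_ids_and_names` and the list-of-dicts row layout are assumptions made for illustration; they are not part of the dismod_at API.

```python
def check_ids_and_names(rows):
    """Return a list of problems found in the data_id and data_name columns.

    rows is a hypothetical in-memory copy of the data table:
    a list of dicts, one per row, in table order.
    """
    problems = []
    # data_id must be 0, 1, 2, ... in row order
    for expected_id, row in enumerate(rows):
        if row["data_id"] != expected_id:
            problems.append(
                f"row {expected_id}: data_id is {row['data_id']}"
            )
    # data_name must be different for every row
    names = [row["data_name"] for row in rows]
    if len(set(names)) != len(names):
        problems.append("data_name values are not unique")
    return problems


good = [
    {"data_id": 0, "data_name": "d0"},
    {"data_id": 1, "data_name": "d1"},
]
bad = [
    {"data_id": 1, "data_name": "d0"},
    {"data_id": 2, "data_name": "d0"},
]
```

Here `check_ids_and_names(good)` returns an empty list, while `bad` yields one problem per misnumbered row plus one for the repeated name.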
integrand_id#
This column has type integer
and is the
integrand_id that identifies
the integrand for this measurement.
density_id#
This column has type integer
and is the
density_id that identifies
the density function for the measurement noise.
The density_name
corresponding to density_id cannot be uniform
.
(Use hold_out to ignore data during fitting.)
This density may be replaced using the data_density_command .
Nonsmooth#
If the density is Nonsmooth , the average_integrand cannot depend on any of the random effects. For example, if the node_id is the parent_node_id , the average integrand does not depend on the random effects. Also, if the node corresponds to a child that has all its random effects constrained, the average integrand does not depend on the random effects. Each nonsmooth data point adds a hidden variable to the optimization problem (the maximum of the residual and its negative, i.e., the absolute value of the residual). Having many of these variables slows down the optimization.
node_id#
This column has type integer
and is the
node_id that identifies
the node for this measurement.
Parent Data#
If the node_id is the parent_node_id , this data will be associated with the parent node and not have any random effects in its model.
Child Data#
If the node_id is a child of the parent node, or a Descendant of a child, the data will be associated with the random effects for that child. In this case density_id cannot correspond to laplace or log_laplace . The corresponding densities would not be differentiable at zero and the Laplace approximation would not make sense in this case.
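The Parent Data and Child Data rules can be sketched as a check on one data row. The names `is_descendant`, `child_data_density_ok`, and the `node_parent` dictionary (node_id mapped to the id of its parent, or `None` for the root) are hypothetical. The sketch applies only the restriction stated above and does not model the case, described under Nonsmooth, where all of a child's random effects are constrained.

```python
# Densities that are excluded for child data by the rule above
EXCLUDED_FOR_CHILD_DATA = {"laplace", "log_laplace"}


def is_descendant(node_parent, node_id, ancestor_id):
    """True if node_id is a strict descendant of ancestor_id."""
    current = node_parent[node_id]
    while current is not None:
        if current == ancestor_id:
            return True
        current = node_parent[current]
    return False


def child_data_density_ok(node_parent, parent_node_id, node_id, density_name):
    """Apply the child data restriction to one data row."""
    if is_descendant(node_parent, node_id, parent_node_id):
        # data below the parent node is tied to a child's random effects
        return density_name not in EXCLUDED_FOR_CHILD_DATA
    # parent data (or data outside the parent's subtree): not restricted here
    return True


# Hypothetical hierarchy: node 0 is the root, 1 and 2 are its children,
# and node 3 is a child of node 1.
node_parent = {0: None, 1: 0, 2: 0, 3: 1}
```

With node 0 as the parent node, a laplace density is accepted for data at node 0 and rejected for data at node 3, which is a descendant of child 1.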
subgroup_id#
This column has type integer
and is the
subgroup_id
that this data point corresponds to.
group_id#
There is automatically a group_id corresponding to the subgroup_id even though the group_id does not appear in the data file (if it does appear, it will not be used).
Nonsmooth#
If the density is Nonsmooth , the subgroup_smooth_id corresponding to subgroup_id must be null.
weight_id#
This column has type integer
and is the
weight_id that identifies
the weighting used for this measurement.
null#
If weight_id is null
,
the constant weighting is used for this data point.
hold_out#
This column has type integer
and has value zero or one.
Only the rows where hold_out is zero are included
in the objective optimized during a fit_command .
See the fit command hold_out
documentation.
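As a rough illustration of the hold_out column, the sketch below selects the rows that this column alone leaves in the fit. The function name `rows_in_fit` and the list-of-dicts layout are assumptions; any additional hold out behavior documented for the fit command is not modeled.

```python
def rows_in_fit(rows):
    """Return the rows whose hold_out column is zero.

    Raises ValueError if a hold_out value is neither zero nor one.
    """
    for row in rows:
        if row["hold_out"] not in (0, 1):
            raise ValueError(
                f"data_id {row['data_id']}: hold_out must be 0 or 1"
            )
    return [row for row in rows if row["hold_out"] == 0]


rows = [
    {"data_id": 0, "hold_out": 0},
    {"data_id": 1, "hold_out": 1},
    {"data_id": 2, "hold_out": 0},
]
included_ids = [row["data_id"] for row in rows_in_fit(rows)]
```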
meas_value#
This column has type real
and is the measured value
for each row of the data
table;
i.e., the measurement of the integrand, node, etc.
meas_std#
This column has type real
and has the same units as the data.
It must be positive unless the density is
binomial in which case it must be null.
This is not the only contribution to the standard deviation used in the data likelihood; see minimum cv standard deviation \(\Delta\), adjusted standard deviation sigma \(\sigma\), and transformed standard deviation delta \(\delta ( \theta )\).
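The constraint on meas_std can be written as a one-row check. This is a minimal sketch; `meas_std_ok` is a hypothetical helper, and `None` stands for a null value in the table.

```python
def meas_std_ok(density_name, meas_std):
    """Check the meas_std rule for one row (None represents null)."""
    if density_name == "binomial":
        # binomial data must not specify a standard deviation
        return meas_std is None
    # every other density needs a positive standard deviation
    return meas_std is not None and meas_std > 0.0
```

For example, `meas_std_ok("gaussian", 0.0)` is false because zero is not positive, and `meas_std_ok("binomial", 0.1)` is false because the value must be null.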
eta#
This column has type real
.
If density_id corresponds to a
log scaled density ,
eta must be greater than or equal to zero and is
the offset in the log transformation for this data point; see
log scaled case definition of the
weighted residual function .
This offset may be replaced using the data_density_command .
null#
If density_id does not correspond to
log_gaussian
, log_laplace
, or log_students
,
eta can be null
.
nu#
This column has type real
.
If density_id corresponds to
students
or log_students
,
nu must be greater than two and is
the number of degrees of freedom in the distribution for this point; see
the definition of the log-density for
Student’s-t and
Log-Student’s-t .
The degrees of freedom may be replaced using the
data_density_command .
null#
If density_id does not correspond to
students
or log_students
,
nu can be null
.
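The rules for the eta and nu columns follow the same pattern: the value is required only for certain densities and may be null otherwise. The sketch below assumes the density names listed in this section; `eta_ok` and `nu_ok` are hypothetical helpers, and `None` stands for null.

```python
# Density names taken from the eta and nu descriptions in this section
LOG_SCALED = {"log_gaussian", "log_laplace", "log_students"}
STUDENTS = {"students", "log_students"}


def eta_ok(density_name, eta):
    """eta must be >= 0 for log scaled densities; otherwise it can be null."""
    if density_name in LOG_SCALED:
        return eta is not None and eta >= 0.0
    return True


def nu_ok(density_name, nu):
    """nu must be > 2 for the Student's-t densities; otherwise it can be null."""
    if density_name in STUDENTS:
        return nu is not None and nu > 2.0
    return True
```

Note that a log_students row must pass both checks, since that density is log scaled and has degrees of freedom.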
sample_size#
This is the number of samples for a binomial distribution. If the corresponding density is not binomial, sample_size must be null.
age_lower#
This column has type real
and is the lower age limit
for this measurement.
It must be greater than or equal to the minimum age_table value.
age_upper#
This column has type real
and is the upper age limit
for this measurement.
It must be greater than or equal to the corresponding age_lower
and less than or equal to the maximum age_table value.
time_lower#
This column has type real
and is the lower time limit
for this measurement.
It must be greater than or equal to the minimum time_table value.
time_upper#
This column has type real
and is the upper time limit
for this measurement.
It must be greater than or equal to the corresponding time_lower
and less than or equal to the maximum time_table value.
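The age and time limits obey the same chain of inequalities: table minimum, lower limit, upper limit, table maximum. A minimal sketch, where `limits_ok` is a hypothetical helper and `table_values` is the list of values from the age table or the time table:

```python
def limits_ok(lower, upper, table_values):
    """Check min(table) <= lower <= upper <= max(table)."""
    return min(table_values) <= lower <= upper <= max(table_values)


# Hypothetical age and time grids
age_table = [0.0, 50.0, 100.0]
time_table = [1990.0, 2000.0, 2010.0]
```

A point measurement, with the lower and upper limits equal, satisfies the check as long as it lies within the table range.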
Covariates#
The covariate columns have type real
and column names
that begin with the two characters x_
.
For each valid covariate_id ,
column x_
covariate_id contains
the value, for this measurement, of the covariate specified by
covariate_id .
Null#
The covariate value null
is interpreted as the
reference value for
the corresponding covariate.
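The null convention for covariates can be illustrated as follows. The function `covariate_values` and the `reference` dictionary (covariate_id mapped to its reference value) are assumptions made for this sketch; `None` stands for null.

```python
def covariate_values(row, reference):
    """Return covariate_id -> value for one data row.

    A null (None) entry in column x_<covariate_id> is replaced by
    the reference value for that covariate.
    """
    values = {}
    for covariate_id, reference_value in reference.items():
        value = row.get(f"x_{covariate_id}")
        values[covariate_id] = reference_value if value is None else value
    return values


row = {"x_0": None, "x_1": 2.5}
reference = {0: 1.0, 1: 0.0}
resolved = covariate_values(row, reference)
```

In this example covariate 0 takes its reference value 1.0, and covariate 1 keeps the measured value 2.5.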
Example#
The file data_table.py
creates example data
tables.
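For orientation only, the sketch below builds a table with the columns described on this page using the Python standard library `sqlite3` module. It is not the contents of data_table.py. The integer type for sample_size, the single covariate column x_0, and all of the row values are assumptions made for illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    """
    create table data (
        data_id      integer primary key,
        data_name    text unique,
        integrand_id integer,
        density_id   integer,
        node_id      integer,
        subgroup_id  integer,
        weight_id    integer,
        hold_out     integer,
        meas_value   real,
        meas_std     real,
        eta          real,
        nu           real,
        sample_size  integer,  -- type assumed; not stated on this page
        age_lower    real,
        age_upper    real,
        time_lower   real,
        time_upper   real,
        x_0          real
    )
    """
)

# One hypothetical row: null weight_id (constant weighting),
# null eta, nu, and sample_size, and data_id starting at zero.
row = (
    0, "example_0", 0, 0, 0, 0, None, 0,
    1e-4, 1e-5, None, None, None,
    0.0, 10.0, 1990.0, 2000.0, 0.5,
)
placeholders = ", ".join("?" * len(row))
connection.execute(f"insert into data values ({placeholders})", row)

fetched = connection.execute(
    "select data_id, data_name, weight_id, x_0 from data"
).fetchall()
```

The data_id is given explicitly because an SQLite integer primary key would otherwise start at one rather than zero.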