hold_out_command¶
Hold Out Command: Randomly Sub-sample The Data¶
Syntax¶
dismod_at database hold_out integrand_name max_fit
dismod_at database hold_out integrand_name max_fit max_fit_parent
dismod_at database hold_out integrand_name max_fit \
dismod_at database hold_out integrand_name max_fit max_fit_parent \
Purpose¶
This command sets a maximum number of data values that are included in subsequent fits. It is intended to make initialization and fitting faster. The random choice of which values to include can be made repeatable using random_seed.
database¶
Is an sqlite database containing the dismod_at input tables, which are not modified.
integrand_name¶
This is the integrand that we are sub-sampling.
max_fit¶
If this argument is present, it is the maximum number of data points to fit for the specified integrand; i.e., the maximum number that are not held out. If for this integrand there are more than max_fit points with hold_out zero in the data table, points are randomly held out so that there are max_fit points fit for this integrand.
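The rule above can be illustrated with a minimal Python sketch. It is not the dismod_at implementation; the function name `choose_hold_out` and the `seed` parameter are hypothetical and only show the idea that points already held out stay held out, and extra points are held out at random until max_fit remain.

```python
import random


def choose_hold_out(hold_out, max_fit, seed=None):
    """Return new hold_out flags with at most max_fit zeros (hypothetical helper)."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    # Only points that currently have hold_out zero are candidates for the fit.
    candidates = [i for i, flag in enumerate(hold_out) if flag == 0]
    result = list(hold_out)
    n_extra = len(candidates) - max_fit
    if n_extra > 0:
        # Randomly hold out the surplus so exactly max_fit points remain.
        for i in rng.sample(candidates, n_extra):
            result[i] = 1
    return result


flags = choose_hold_out([0, 0, 1, 0, 0, 0], max_fit=3, seed=123)
```

If there are no more than max_fit candidate points, the flags are returned unchanged.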
max_fit_parent¶
If this argument is present, max_fit only applies to the total data from child nodes. The value max_fit_parent determines the maximum number of Parent Node data values to include.
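As a sketch of how the first two syntax forms might be driven from Python, the helper below builds the argument list and a second function passes it to `subprocess.run`. The function names, the database file name `example.db`, and the integrand name used here are illustrative assumptions; running the command requires dismod_at to be installed and the database to exist.

```python
import subprocess


def hold_out_command(database, integrand_name, max_fit, max_fit_parent=None):
    """Build the argument list for the hold_out command (hypothetical helper)."""
    cmd = ["dismod_at", database, "hold_out", integrand_name, str(max_fit)]
    if max_fit_parent is not None:
        # Optional fifth argument: separate limit for parent node data.
        cmd.append(str(max_fit_parent))
    return cmd


def run_hold_out(database, integrand_name, max_fit, max_fit_parent=None):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    cmd = hold_out_command(database, integrand_name, max_fit, max_fit_parent)
    subprocess.run(cmd, check=True)


# Example file and integrand names are assumptions for illustration only.
cmd = hold_out_command("example.db", "Sincidence", 100, max_fit_parent=20)
```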
cov_name¶
If this argument is present, it specifies a covariate column that will be balanced; see Covariates under Balancing below.
cov_value_1¶
If this argument is present, it specifies one of the covariate values for the balancing. This is a string representation of a double value.
cov_value_2¶
If this argument is present, it specifies the opposite covariate value for the balancing. This is a string representation of a double value.
Balancing¶
Child Nodes¶
The choice of which points to include in the fit tries to sample the same number of data points from each of the child nodes (and the parent node). If there are not sufficiently many data for one of these nodes, the others make up the difference.
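One way to picture this balancing is an equal-share allocation in which nodes with too few points pass their unused share to the others. The sketch below is only an illustration under that assumption; the function `balanced_quota` and the node names are hypothetical, and dismod_at may allocate points differently.

```python
def balanced_quota(available, max_fit):
    """Split max_fit as evenly as possible across nodes (hypothetical helper).

    available maps a node name to its number of candidate data points.
    Returns a mapping from node name to the number of points to fit.
    """
    quota = {node: 0 for node in available}
    remaining = min(max_fit, sum(available.values()))
    active = [node for node in available if available[node] > 0]
    while remaining > 0 and active:
        # Equal share for each node that still has unused points.
        share = max(remaining // len(active), 1)
        for node in list(active):
            if remaining == 0:
                break
            take = min(share, available[node] - quota[node], remaining)
            quota[node] += take
            remaining -= take
            if quota[node] == available[node]:
                # This node is exhausted; the others make up the difference.
                active.remove(node)
    return quota


quota = balanced_quota({"parent": 2, "child_a": 10, "child_b": 10}, max_fit=12)
```

Here the parent has only two points, so each child contributes five instead of four.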
Covariates¶
If cov_name is present, the data for each child is further split into those with cov_value_1, those with cov_value_2, and those with a different value (for the covariate specified by cov_name). The choice of which points to include tries to sample the same number of points from each of these sub-groups.
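The three-way split can be sketched as follows. This is an illustration only: the function `split_by_covariate` is hypothetical, and it assumes the string arguments are converted to doubles and compared for exact equality, which the text above does not state.

```python
def split_by_covariate(values, cov_value_1, cov_value_2):
    """Group data indices by covariate value (hypothetical helper)."""
    # The command line passes the two values as strings representing doubles.
    v1 = float(cov_value_1)
    v2 = float(cov_value_2)
    groups = {"value_1": [], "value_2": [], "other": []}
    for index, value in enumerate(values):
        if value == v1:
            groups["value_1"].append(index)
        elif value == v2:
            groups["value_2"].append(index)
        else:
            # Any other value, including a missing one, lands here.
            groups["other"].append(index)
    return groups


groups = split_by_covariate([0.5, -0.5, 0.0, 0.5, None], "0.5", "-0.5")
```

Each of the three groups would then be sampled as evenly as the available data allows.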
data_subset_table¶
Only rows of the data_subset_table that correspond to this integrand are modified. The hold_out is set to one (zero) if the corresponding data is (is not) selected for hold out. Only points that have hold_out zero in the data table can have hold_out non-zero in the data_subset table. See the fit command hold_out documentation.
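To check the result, one could count the points that would still be fit for an integrand. The sketch below uses Python's sqlite3 module on a small in-memory database; the column layout (data_id, integrand_id, hold_out, integrand_name) is an assumption made for illustration and shows only the columns needed here, and the function `count_fit` is hypothetical.

```python
import sqlite3

# A tiny stand-in database; the real tables have more columns.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE integrand (integrand_id INTEGER PRIMARY KEY, integrand_name TEXT);
CREATE TABLE data (data_id INTEGER PRIMARY KEY, integrand_id INTEGER, hold_out INTEGER);
CREATE TABLE data_subset (data_subset_id INTEGER PRIMARY KEY, data_id INTEGER, hold_out INTEGER);
INSERT INTO integrand VALUES (0, 'Sincidence');
INSERT INTO data VALUES (0, 0, 0), (1, 0, 0), (2, 0, 1);
INSERT INTO data_subset VALUES (0, 0, 0), (1, 1, 1), (2, 2, 0);
""")


def count_fit(connection, integrand_name):
    """Count points held out in neither the data nor the data_subset table."""
    query = """
        SELECT COUNT(*) FROM data_subset AS s
        JOIN data AS d ON d.data_id = s.data_id
        JOIN integrand AS i ON i.integrand_id = d.integrand_id
        WHERE i.integrand_name = ? AND d.hold_out = 0 AND s.hold_out = 0
    """
    return connection.execute(query, (integrand_name,)).fetchone()[0]


n_fit = count_fit(con, "Sincidence")
```

In this stand-in data, one point is held out in the data table and one in the data_subset table, so a single point remains in the fit.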
Example¶
The files user_hold_out_1.py and user_hold_out_2.py contain examples and tests using this command.