-----------------------------------------------------
lines 503-608 of file: devel/cmd/hold_out_command.cpp
-----------------------------------------------------

{xrst_begin hold_out_command}
{xrst_spell
}

Hold Out Command: Randomly Sub-sample The Data
##############################################

Syntax
******

.. |first4| replace:: ``dismod_at`` *database* ``hold_out`` *integrand_name*

| |first4| *max_fit*
| |first4| *max_fit* *max_fit_parent*
| |first4| *max_fit* |space| \\
| |tab| *cov_name* *cov_value_1* *cov_value_2*
| |first4| *max_fit* *max_fit_parent* |space| \\
| |tab| *cov_name* *cov_value_1* *cov_value_2*

Purpose
*******
This command is used to set a maximum number of data values
that are included in subsequent fits.
It is intended to make the initialization and fitting faster.
The random choice of which values to include can be made repeatable using
:ref:`option_table@random_seed` .

database
********
Is an sqlite database containing the
``dismod_at`` :ref:`input-name` tables which are not modified.

integrand_name
**************
This is the :ref:`integrand` that we are sub-sampling.

max_fit
*******
If this argument is present,
it is the maximum number of data points to fit for the specified integrand;
i.e., the maximum number that are not held out.
If for this integrand there are more than *max_fit* points with
:ref:`data_table@hold_out` zero in the data table,
points are randomly held out so that there are
*max_fit* points fit for this integrand.

max_fit_parent
**************
If this argument is present,
*max_fit* only applies to the total data from child nodes.
The value *max_fit_parent* determines the maximum number of
:ref:`option_table@Parent Node` data values to include.

cov_name
********
If this argument is present, it specifies a covariate column
that will be balanced; see covariate balancing below.

cov_value_1
***********
If this argument is present, it specifies one of the covariate values
for the balancing.
This is a string representation of a ``double`` value.

cov_value_2
***********
If this argument is present, it specifies the opposite covariate value
for the balancing.
This is a string representation of a ``double`` value.

Balancing
*********

Child Nodes
===========
The choice of which points to include in the fit tries to sample
the same number of data points from each of the child nodes
(and the parent node).
If there are not enough data for one of these nodes,
the others make up the difference.

Covariates
==========
If *cov_name* is present, the data for each child is further split into
those with *cov_value_1*, those with *cov_value_2*,
and those with a different value
(for the covariate specified by *cov_name* ).
The choice of which points to include tries to sample
the same number of points from each of these sub-groups.

data_subset_table
*****************
Only rows of the :ref:`data_subset_table-name` that correspond to this
integrand are modified.
The :ref:`data_subset_table@hold_out` is set to one (zero)
if the corresponding data is (is not) selected for hold out.
Only points that have *hold_out* zero in the data table
can have *hold_out* non-zero in the data_subset table.
See the fit command :ref:`fit_command@hold_out` documentation.

Example
*******
The files :ref:`user_hold_out_1.py-name` and :ref:`user_hold_out_2.py-name`
contain examples and tests using this command.

{xrst_end hold_out_command}