db2csv_command


Create CSV Files that Summarize the Database

Syntax

As Program

dismod-at database db2csv

As Python Function

dismod_at.db2csv_command ( database )

Convention

The null value in the database corresponds to an empty string in the csv files.
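When reading the files back into Python, the convention above can be reversed by mapping empty strings to None. The following is a minimal sketch using only the standard library; the helper name read_rows and the example values are made up for illustration and are not part of dismod_at.

```python
import csv

def read_rows(lines):
    """Parse csv lines into dicts, mapping empty strings back to None.

    lines may be an open file or any iterable of strings.
    """
    rows = []
    for row in csv.DictReader(lines):
        rows.append(
            {key: (None if value == "" else value) for key, value in row.items()}
        )
    return rows

# Hypothetical two-row excerpt; the values are invented for this sketch.
example = ["var_id,fit_value,truth", "0,0.01,", "1,,0.5"]
rows = read_rows(example)
```

All non-empty values remain strings, so numeric columns still need an explicit float or int conversion.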

log table

This command uses python_log_command to enter begin and end markers in the database log table.

database

is the path from the current directory to the database. This must be a dismod_at database and the init_command must have been run on it.

dir

We use the notation dir for the directory where database is located.
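As a rough sketch, dir and the paths of the files described below can be derived from database as follows. The helper names output_paths and run_db2csv are hypothetical; only dismod_at.db2csv_command ( database ) comes from the Syntax section above.

```python
import os

# File names documented on this page. Some are only written when the
# corresponding tables exist in the database.
CSV_NAMES = [
    "option.csv", "log.csv", "age_avg.csv", "hes_fixed.csv",
    "hes_random.csv", "trace_fixed.csv", "mixed_info.csv",
    "variable.csv", "data.csv", "predict.csv",
]

def output_paths(database):
    """Return {file name: path} for the files db2csv may write."""
    directory = os.path.dirname(database)  # this is dir in the text above
    return {name: os.path.join(directory, name) for name in CSV_NAMES}

def run_db2csv(database):
    """Run the command and return the paths of the files that now exist."""
    import dismod_at  # imported here so output_paths works without it
    dismod_at.db2csv_command(database)
    paths = output_paths(database)
    return {name: path for name, path in paths.items() if os.path.exists(path)}
```

For example, output_paths("build/example.db") maps variable.csv to the path build/variable.csv on a POSIX system.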

fit_var, fit_data_subset

The log_table is used to determine if the previous fit command had a simulate_index . If so, the fit_var_table and fit_data_subset_table correspond to simulated data. Otherwise, if they exist, they correspond to the measured data.

simulate_index

If the previous fit command had a simulate_index that value is used for simulate_index below. Otherwise, zero is used for simulate_index below.

option.csv

The file dir / option.csv is written by this command. It is a CSV file with one row for each possible row in the option_table . The columns in option.csv are option_name and option_value . If a row does not appear in the option table, the corresponding default value is written to option.csv . If the parent_node_id appears in the option table, the parent_node_name row of option.csv is filled in with the corresponding node name.
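Because option.csv has exactly two columns, it loads naturally into a dictionary. This is a minimal sketch; the function name read_options and the example values are assumptions made for illustration.

```python
import csv

def read_options(lines):
    """Return {option_name: option_value}, with empty values as None."""
    options = {}
    for row in csv.DictReader(lines):
        value = row["option_value"]
        options[row["option_name"]] = None if value == "" else value
    return options

# Hypothetical excerpt; the values are invented for this sketch.
example = [
    "option_name,option_value",
    "parent_node_id,0",
    "parent_node_name,world",
    "ode_step_size,10.0",
    "age_avg_split,",
]
options = read_options(example)
```

With a real file, pass the open file object in place of example.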

log.csv

The file dir / log.csv is written by this command. It is a CSV file with one row for each message in the log_table . The columns in this table are message_type , table_name , row_id , unix_time , and message . Note that a begin db2csv command will appear at the end of this file without the corresponding end db2csv because the db2csv command was not completed when log.csv was written.
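The sketch below reads log.csv, converts unix_time to a UTC datetime, and checks that the last message is the unmatched begin db2csv described above. It assumes unix_time holds seconds since the epoch; the helper name read_log, the message_type value, and the timestamps in the example are made up.

```python
import csv
from datetime import datetime, timezone

def read_log(lines):
    """Return the log rows with an extra 'when' key holding a UTC datetime."""
    rows = []
    for row in csv.DictReader(lines):
        seconds = int(float(row["unix_time"]))  # assumed: seconds since epoch
        row["when"] = datetime.fromtimestamp(seconds, tz=timezone.utc)
        rows.append(row)
    return rows

# Hypothetical excerpt; message types and times are invented for this sketch.
example = [
    "message_type,table_name,row_id,unix_time,message",
    "command,,,1700000000,begin init",
    "command,,,1700000002,end init",
    "command,,,1700000100,begin db2csv",
]
log = read_log(example)
last_message = log[-1]["message"]
init_seconds = (log[1]["when"] - log[0]["when"]).total_seconds()
```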

age_avg.csv

The file dir / age_avg.csv is written by this command. It is a CSV file with the contents of the age_avg table. The only column in this table is age . Note that a set_command may change the value of ode_step_size or age_avg_split but it will not write out the new age_avg table.

hes_fixed.csv

If the asymptotic sample command was executed, the contents of the hes_fixed_table are written to the CSV file dir / hes_fixed.csv . The columns in this table are row_var_id , col_var_id , hes_fixed_value .

hes_random.csv

If a fit both , fit random , or sample asymptotic command was executed, the contents of the hes_random_table are written to the CSV file dir / hes_random.csv . The columns in this table are row_var_id , col_var_id , hes_random_value .

trace_fixed.csv

If the fit fixed or fit both command has completed, the contents of the trace_fixed_table are written to the CSV file dir / trace_fixed.csv . The columns in this table have the same name as in the corresponding table with the exception that the column regularization_size is called reg_size .

mixed_info.csv

If the fit_command completed the contents of the mixed_info_table are written to the CSV file dir / mixed_info.csv .

variable.csv

The file dir / variable.csv is written by this command. It is a CSV file with one row for each of the model_variables and has the following columns:

var_id

is the var_id .

var_type

is the var_type .

s_id

is the smooth_id for this variable. If the variable is a smoothing standard deviation multiplier, this is the smoothing that this multiplier affects. Otherwise, it is the smoothing where the prior for this variable comes from.

m_id

If this variable is a covariate multiplier, this is the corresponding mulcov_id .

m_diff

If this variable is a covariate multiplier, this is the corresponding max_cov_diff .

bound

If the upper and lower value limits in the value prior for this variable are not equal, this is a bound for the absolute value of this variable; see max_mulcov and bound_random .

age

is the age .

time

is the time .

rate

is the rate_name .

integrand

is the integrand_name .

covariate

is the covariate_name .

node

is the node_name .

group

This field is non-empty for Group Covariate Multipliers .

subgroup

This field is non-empty for Subgroup Covariate Multipliers .

fixed

is true if this variable is a fixed effect , otherwise it is false .

depend

If the depend_var_table exists, this has one of the following: none if neither the data nor the prior depends on this variable, data if only the data depends on this variable, prior if only the prior depends on this variable, both if both the data and the prior depend on this variable.

fit_value

If the fit_command has been run, this is the fit_var_value .

start

is the start_var_value for this variable.

scale

is the scale_var_value for this variable.

truth

If the truth_var table exists, this is the truth_var_value for this variable.

sam_avg

If the sample table exists, for each var_id this is the average with respect to sample_index of the var_value corresponding to this var_id .

sam_std

If the sample table exists, for each fixed var_id this is the estimated standard deviation with respect to sample_index of the var_value corresponding to this var_id . If there is only one sample_index in the sample table, this column is empty because the standard deviation cannot be estimated from one sample.
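To make the two sample columns concrete, the sketch below computes an average and a standard deviation for one var_id from its sampled values. It assumes the sample standard deviation (divisor n - 1), which is consistent with the single-sample case being empty but is not stated on this page; the function name is hypothetical.

```python
import statistics

def sample_summary(values):
    """values holds var_value for one var_id, one entry per sample_index.

    Returns (average, standard deviation); the deviation is None when
    there is only one sample, matching the empty sam_std column.
    """
    average = statistics.mean(values)
    deviation = statistics.stdev(values) if len(values) > 1 else None
    return average, deviation

three_samples = sample_summary([1.0, 2.0, 3.0])
one_sample = sample_summary([0.5])
```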

res_value

If the fit_command has been run, this is the residual_value .

res_dage

If the fit_command has been run, this is the residual_dage ; see fit_var above.

res_dtime

If the fit_command has been run, this is the residual_dtime ; see fit_var above.

lag_value

If the fit_command has been run, this is the lagrange_value ; see fit_var above.

lag_dage

If the fit_command has been run, this is the lagrange_dage ; see fit_var above.

lag_dtime

If the fit_command has been run, this is the lagrange_dtime ; see fit_var above.

sim_v, sim_a, sim_t

If the simulate_command has been run, these are the values of prior_sim_value , prior_sim_dage , and prior_sim_dtime , for the simulate_index .

prior_info

There is a column named field_character for character equal to v , a , and t and for field equal to mean , lower , upper , std , eta , nu , and density .

  1. The character v denotes this is the prior information for a value, a the prior information for an age difference, and t the prior information for a time difference.

  2. The density has been mapped to the corresponding density_name .

  3. If the corresponding value_prior_id is null , the const_value prior is displayed.

  4. If field is null , or has no effect, it is displayed as empty. Note that the field eta_v is always displayed for fixed effects because it has a scaling effect.
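The naming rule for the prior columns can be spelled out as below. This is only a sketch: the order of the generated names is arbitrary and may differ from the column order in variable.csv, and the helper names are hypothetical.

```python
FIELDS = ["mean", "lower", "upper", "std", "eta", "nu", "density"]
CHARACTERS = ["v", "a", "t"]  # value, age difference, time difference

def prior_columns():
    """Return every field_character column name."""
    return [
        f"{field}_{character}"
        for character in CHARACTERS
        for field in FIELDS
    ]

def prior_for(row, character):
    """Collect one prior (v, a, or t) from a variable.csv row dict."""
    return {field: row.get(f"{field}_{character}") for field in FIELDS}

names = prior_columns()
# Hypothetical row fragment; the value is invented for this sketch.
value_prior = prior_for({"mean_v": "0.1"}, "v")
```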

data.csv

The file dir / data.csv is written by this command. It is a CSV file with one row for each row in the data_subset_table and has the following columns:

data_id

is the data table data_id .

data_extra_columns

Each column specified by the data_extra_columns option is included in the data.csv file.

child

If this data row is associated with a child, this is the name of the child. Otherwise, this data is associated with the Parent Node .

node

is the node_name for this data row. This will correspond directly to the data table node_id .

group

is the group_name corresponding to the subgroup for this data row.

subgroup

is the subgroup_name for this data row. This will correspond directly to the data table subgroup_id .

integrand

is the integrand table integrand_name .

weight

is the weight_name .

age_lo

is the lower age used in the fits; i.e., the data table age_lower modified by the age compression interval in the compress_interval option.

age_up

is the upper age used in the fits; i.e., the data table age_upper modified by the age compression interval.

time_lo

is the lower time used in the fits; i.e., the data table time_lower modified by the time compression interval.

time_up

is the upper time used in the fits; i.e., the data table time_upper modified by the time compression interval.

d_out

is the value of hold_out in the data table.

s_out

is the value of hold_out in the data_subset table.

density

is the density_name for data_subset table density_id for this row.

eta

is the data_subset table eta for this row.

nu

is the data_subset table nu for this row.

ss

is the data_subset table sample_size for this row.

meas_std

is the data table meas_std , except in the binomial case, where it is an approximation for the standard deviation of the binomial counts divided by the sample size.

meas_stdcv

is the minimum cv standard deviation used to define the likelihood; see Delta . In the binomial case it is equal to meas_std.

meas_sigma

If the previous fit command had a simulate_index , this column is empty. Otherwise, we use sigma to denote the adjusted standard deviation for this row.

The transformed standard deviation delta is computed by dividing by the residual. The result is plus infinity, and hence not valid, when the residual is zero. If the computed delta is greater than the maximum Python float value, meas_sigma is reported as empty . Otherwise, the transformation is inverted to get the value of sigma .

meas_value

is the data table meas_value .

avgint

If the fit_command has been run, this is the avg_integrand for this row.

residual

If the fit_command has been run, this is the weighted_residual for this row; see fit_data_subset above.
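One common use of this column is to inspect residuals per integrand. The sketch below groups them while skipping empty entries, which occur when no fit has been run. The function name and the example rows, including the integrand names, are made up for illustration.

```python
import csv
from collections import defaultdict

def residuals_by_integrand(lines):
    """Return {integrand: [residual, ...]} from data.csv lines."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        if row["residual"] != "":  # empty means null in the database
            groups[row["integrand"]].append(float(row["residual"]))
    return dict(groups)

# Hypothetical excerpt; the rows are invented for this sketch.
example = [
    "data_id,integrand,residual",
    "0,Sincidence,0.5",
    "1,prevalence,2.0",
    "2,Sincidence,-1.5",
    "3,prevalence,",
]
grouped = residuals_by_integrand(example)
```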

sim_value

If the simulate_command has been run, this is the data_sim_value for this data_id and simulate_index in the previous fit command. If there is no simulate_index in the previous fit command, the value zero is used for the simulate_index .

Covariates

For each covariate in the covariate_table there is a column with the corresponding covariate_name . For each covariate column and measurement row, the value in the covariate column is the covariate value for this measurement minus the reference value for this covariate, i.e., the corresponding covariate difference x_ij in the model for the average integrand.
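As a small worked example of the difference stored in these columns, and of recovering the original covariate value from it, consider the sketch below. The function names and numbers are invented for illustration.

```python
def covariate_difference(value, reference):
    """Value written to the covariate column: value minus reference."""
    return value - reference

def covariate_value(difference, reference):
    """Recover the original covariate value from a data.csv entry."""
    return difference + reference

# Made-up numbers: a measurement with covariate value 2.5, reference 2.0.
difference = covariate_difference(2.5, 2.0)
original = covariate_value(difference, 2.0)
```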

predict.csv

If the predict_command has been executed, the CSV file dir / predict.csv is written. For each row of the predict_table there is a corresponding row in predict.csv .

avgint_id

is the avgint table avgint_id .

avgint_extra_columns

Each column specified by the avgint_extra_columns option is included in the predict.csv file.

s_index

This identifies the set of model variables corresponding to the last predict_command executed. If the source for the predict command was sample , the model variables correspond to the rows of the sample table with sample_index equal to s_index . Otherwise, s_index is empty and the model variables correspond to the fit_var or truth_var table, depending on the source for the last predict command executed.

avgint

is the average integrand \(A_i(u, \theta)\). The model variables \((u, \theta)\) correspond to the s_index , and the measurement subscript \(i\) denotes the avgint_table information for this row of predict.csv ; i.e., age_lo , age_up , …

age_lo

is the avgint table age_lower .

age_up

is the avgint table age_upper .

time_lo

is the avgint table time_lower .

time_up

is the avgint table time_upper .

integrand

is the avgint table integrand_name .

weight

is the weight_name for this row.

node

is the node_name for this row.

group

is the group_name corresponding to the subgroup for this row.

subgroup

is the subgroup_name for this row. This will correspond directly to the avgint table subgroup_id .

Covariates

For each covariate in the covariate_table there is a column with the corresponding covariate_name . For each covariate column and measurement row, the value in the covariate column is the covariate value in the avgint_table minus the reference value for this covariate, i.e., the corresponding covariate difference x_ij in the model for the average integrand.
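When the source for the predict command was sample , predict.csv holds one row per avgint_id and s_index . One possible way to summarize such a file is to average avgint over s_index for each avgint_id , as in this sketch. The function name and the example rows are made up for illustration.

```python
import csv
import statistics
from collections import defaultdict

def mean_avgint(lines):
    """Return {avgint_id: mean of avgint over all rows with that id}."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[int(row["avgint_id"])].append(float(row["avgint"]))
    return {key: statistics.mean(values) for key, values in groups.items()}

# Hypothetical excerpt; the rows are invented for this sketch.
example = [
    "avgint_id,s_index,avgint",
    "0,0,0.25",
    "0,1,0.75",
    "1,0,1.0",
    "1,1,2.0",
]
means = mean_avgint(example)
```

If s_index is empty, there is a single row per avgint_id and the mean is simply that row's avgint .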

Example

The file db2csv_command.py contains an example and test using this command.