db2csv_command#

Create Csv Files that Summarize The Database#

Syntax#

As Program#

dismodat.py database db2csv

As Python Function#

dismod_at.db2csv_command ( database )

Convention#

The null value in the database corresponds to an empty string in the csv files.
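
As a minimal sketch of this convention, a reader of these files might map empty fields back to `None` when loading them. The helper name `read_csv_with_null` and the sample data are hypothetical; only the Python standard library is used.

```python
import csv
import io

def read_csv_with_null(stream):
    """Return the rows of a csv stream as dicts, mapping '' back to None."""
    reader = csv.DictReader(stream)
    return [
        {key: (None if value == '' else value) for key, value in row.items()}
        for row in reader
    ]

# Illustrative data only; the second row has a null fit_value.
example = io.StringIO("var_id,fit_value\n0,0.01\n1,\n")
rows = read_csv_with_null(example)
```

Values that are not empty are left as strings here, so numeric columns still need an explicit conversion.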

database#

is the path from the current directory to the database. This must be a dismod_at database and the init_command must have been run on it.

dir#

We use the notation dir for the directory where database is located.
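
The following sketch shows one way to compute dir and the paths of the csv files described on this page. The function name `csv_paths` and the database path are hypothetical, and some of these files are only written under the conditions stated in their sections.

```python
from pathlib import Path

# File names taken from the section headings on this page.
CSV_NAMES = [
    'option.csv', 'log.csv', 'age_avg.csv', 'hes_fixed.csv',
    'hes_random.csv', 'trace_fixed.csv', 'mixed_info.csv',
    'variable.csv', 'data.csv', 'predict.csv',
]

def csv_paths(database):
    """Map each csv file name to its path in the directory of database."""
    directory = Path(database).parent
    return {name: directory / name for name in CSV_NAMES}

paths = csv_paths('build/example.db')  # hypothetical database path
```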

fit_var, fit_data_subset#

The log_table is used to determine if the previous fit command had a simulate_index. If so, the fit_var_table and fit_data_subset_table correspond to simulated data. Otherwise, if they exist, they correspond to the measured data.

simulate_index#

If the previous fit command had a simulate_index, that value is used for simulate_index below. Otherwise, zero is used for simulate_index below.

option.csv#

The file dir / option.csv is written by this command. It is a CSV file with one row for each possible row in the option_table . The columns in option.csv are option_name and option_value . If a row does not appear in the option table, the corresponding default value is written to option.csv . If the parent_node_id appears in the option table, the parent_node_name row of option.csv is filled in with the corresponding node name.
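
As a sketch, option.csv can be loaded into a dictionary keyed by option_name. The helper name `read_options` and the sample values (for example the node name `world`) are hypothetical.

```python
import csv
import io

def read_options(stream):
    """Return a dict mapping option_name to option_value ('' becomes None)."""
    return {
        row['option_name']: (row['option_value'] or None)
        for row in csv.DictReader(stream)
    }

# Illustrative contents only; a real file has one row per possible option.
example = io.StringIO(
    "option_name,option_value\n"
    "parent_node_id,0\n"
    "parent_node_name,world\n"
    "data_extra_columns,\n"
)
options = read_options(example)
```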

log.csv#

The file dir / log.csv is written by this command. It is a CSV file with one row for each message in the log_table . The columns in this table are message_type , table_name , row_id , unix_time , and message .
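
A possible way to read log.csv is sketched below. It assumes that unix_time holds whole seconds since the epoch; the helper name `read_log`, the added key `time_utc`, and the sample row are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

def read_log(stream):
    """Return the log rows with unix_time also given as a UTC datetime."""
    rows = []
    for row in csv.DictReader(stream):
        # Assumption: unix_time is an integer number of seconds.
        row['time_utc'] = datetime.fromtimestamp(
            int(row['unix_time']), tz=timezone.utc
        )
        rows.append(row)
    return rows

# Illustrative row only; table_name and row_id are null (empty) here.
example = io.StringIO(
    "message_type,table_name,row_id,unix_time,message\n"
    "command,,,0,begin init\n"
)
log = read_log(example)
```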

age_avg.csv#

The file dir / age_avg.csv is written by this command. It is a CSV file with the contents of the age_avg table. The only column in this table is age . Note that a set_command may change the value of ode_step_size or age_avg_split but it will not write out the new age_avg table.

hes_fixed.csv#

If the sample asymptotic command was executed, the contents of the hes_fixed_table are written to the CSV file dir / hes_fixed.csv . The columns in this table are row_var_id , col_var_id , hes_fixed_value .

hes_random.csv#

If a fit both , fit random , or sample asymptotic command was executed, the contents of the hes_random_table are written to the CSV file dir / hes_random.csv . The columns in this table are row_var_id , col_var_id , hes_random_value .
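
Both Hessian files share the same three-column layout, so one reader can serve for either. The sketch below stores only the entries that appear in the file and makes no assumption about which entries are present; the helper name `read_hessian` and the sample values are hypothetical.

```python
import csv
import io

def read_hessian(stream, value_column):
    """Return {(row_var_id, col_var_id): value} for the entries in the file."""
    return {
        (int(row['row_var_id']), int(row['col_var_id'])): float(row[value_column])
        for row in csv.DictReader(stream)
    }

# Illustrative contents only.
example = io.StringIO(
    "row_var_id,col_var_id,hes_fixed_value\n"
    "0,0,2.0\n"
    "1,0,-0.5\n"
    "1,1,3.0\n"
)
hes = read_hessian(example, 'hes_fixed_value')
```

For hes_random.csv the same function would be called with `'hes_random_value'` as the value column.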

trace_fixed.csv#

If the fit fixed or fit both command has completed, the contents of the trace_fixed_table are written to the CSV file dir / trace_fixed.csv . The columns in this table have the same name as in the corresponding table with the exception that the column regularization_size is called reg_size .

mixed_info.csv#

If the fit_command has completed, the contents of the mixed_info_table are written to the CSV file dir / mixed_info.csv .

variable.csv#

The file dir / variable.csv is written by this command. It is a CSV file with one row for each of the model_variables and has the following columns:

var_id#

is the var_id .

var_type#

is the var_type .

s_id#

is the smooth_id for this variable. If the variable is a smoothing standard deviation multiplier, this is the smoothing that this multiplier affects. Otherwise, it is the smoothing where the prior for this variable comes from.

m_id#

If this variable is a covariate multiplier, this is the corresponding mulcov_id .

m_diff#

If this variable is a covariate multiplier, this is the corresponding max_cov_diff .

bound#

If the upper and lower value limits in the value prior for this variable are not equal, this is a bound for the absolute value of this variable; see max_mulcov and bound_random .

age#

is the age .

time#

is the time .

rate#

is the rate_name .

integrand#

is the integrand_name .

covariate#

is the covariate_name .

node#

is the node_name .

group#

This field is non-empty for Group Covariate Multipliers .

subgroup#

This field is non-empty for Subgroup Covariate Multipliers .

fixed#

is true if this variable is a fixed effect , otherwise it is false .
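
Because the value is written as the text true or false, selecting the fixed effects is a string comparison. This is a sketch only; the helper name `fixed_effect_rows` and the sample rows are hypothetical.

```python
import csv
import io

def fixed_effect_rows(stream):
    """Return the variable.csv rows whose fixed column is 'true'."""
    return [row for row in csv.DictReader(stream) if row['fixed'] == 'true']

# Illustrative contents only, with most columns omitted.
example = io.StringIO(
    "var_id,var_type,fixed\n"
    "0,rate,true\n"
    "1,rate,false\n"
)
fixed_rows = fixed_effect_rows(example)
```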

depend#

If the depend_var_table exists, this has one of the following: none if neither the data nor the prior depends on this variable, data if only the data depends on this variable, prior if only the prior depends on this variable, both if both the data and the prior depend on this variable.

fit_value#

If the fit_command has been run, this is the fit_var_value .

start#

is the start_var_value for this variable.

scale#

is the scale_var_value for this variable.

truth#

If the truth_var table exists, this is the truth_var_value for this variable.

sam_avg#

If the sample table exists, for each var_id this is the average with respect to sample_index of the var_value corresponding to this var_id.

sam_std#

If the sample table exists, for each fixed var_id this is the estimated standard deviation with respect to sample_index of the var_value corresponding to this var_id. If there is only one sample_index in the sample table, this column is empty because the standard deviation cannot be estimated from one sample.
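
The two sample summaries can be sketched as follows for the values of one var_id. This assumes the standard deviation uses the usual n - 1 denominator, which this page does not state; the function name `sample_summary` is hypothetical.

```python
import statistics

def sample_summary(values):
    """Return (sam_avg, sam_std) for the var_value samples of one var_id."""
    avg = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    # With a single sample it cannot be estimated, so '' (empty) is used.
    std = statistics.stdev(values) if len(values) > 1 else ''
    return avg, std

summary = sample_summary([1.0, 2.0, 3.0])
single = sample_summary([4.0])
```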

res_value#

If the fit_command has been run, this is the residual_value .

res_dage#

If the fit_command has been run, this is the residual_dage ; see fit_var above.

res_dtime#

If the fit_command has been run, this is the residual_dtime ; see fit_var above.

lag_value#

If the fit_command has been run, this is the lagrange_value ; see fit_var above.

lag_dage#

If the fit_command has been run, this is the lagrange_dage ; see fit_var above.

lag_dtime#

If the fit_command has been run, this is the lagrange_dtime ; see fit_var above.

sim_v, sim_a, sim_t#

If the simulate_command has been run, these are the values of prior_sim_value , prior_sim_dage , and prior_sim_dtime , for the simulate_index .

prior_info#

There is a column named

field_character

for character equal to v , a and t and for field equal to mean , lower , upper , std , eta , nu and density .

  1. The character v denotes this is the prior information for a value, a the prior information for an age difference, and t the prior information for a time difference.

  2. The density has been mapped to the corresponding density_name .

  3. If the corresponding value_prior_id is null , the const_value prior is displayed.

  4. If a field is null, or has no effect, it is displayed as empty. Note that the eta_v fields are always displayed for fixed effects because they have a scaling effect.
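
The prior column names described above can be enumerated as in this sketch. The order produced here is arbitrary and is not claimed to match the column order in variable.csv; the function name `prior_column_names` is hypothetical.

```python
FIELDS = ['mean', 'lower', 'upper', 'std', 'eta', 'nu', 'density']
CHARACTERS = ['v', 'a', 't']  # value, age difference, time difference

def prior_column_names():
    """Return every field_character combination, e.g. 'mean_v'."""
    return [
        f'{field}_{character}'
        for character in CHARACTERS
        for field in FIELDS
    ]

names = prior_column_names()
```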

data.csv#

The file dir / data.csv is written by this command. It is a CSV file with one row for each row in the data_subset_table and has the following columns:

data_id#

is the data table data_id .

data_extra_columns#

Each column specified by the data_extra_columns option is included in the data.csv file.

child#

If this data row is associated with a child, this is the name of the child. Otherwise, this data is associated with the Parent Node .

node#

is the node_name for this data row. This will correspond directly to the data table node_id .

group#

is the group_name corresponding to the subgroup for this data row.

subgroup#

is the subgroup_name for this data row. This will correspond directly to the data table subgroup_id .

integrand#

is the integrand table integrand_name .

weight#

is the weight_name .

age_lo#

is the lower age used in the fits; i.e., the data table age_lower modified by the age compression interval in the compress_interval option.

age_up#

is the upper age used in the fits; i.e., the data table age_upper modified by the age compression interval.

time_lo#

is the lower time used in the fits; i.e., the data table time_lower modified by the time compression interval.

time_up#

is the upper time used in the fits; i.e., the data table time_upper modified by the time compression interval.

d_out#

is the value of hold_out in the data table.

s_out#

is the value of hold_out in the data_subset table.

density#

is the density_name for the data_subset table density_id for this row.

eta#

is the data_subset table eta for this row.

nu#

is the data_subset table nu for this row.

ss#

is the data_subset table sample_size for this row.

meas_std#

is the data table meas_std, except in the binomial case, where it is an approximation for the standard deviation of the binomial counts divided by the sample size.

meas_stdcv#

is the minimum cv standard deviation used to define the likelihood; see Delta . In the binomial case it is equal to meas_std.

meas_sigma#

If the previous fit command had a simulate_index , this column is empty. We use sigma to denote the adjusted standard deviation sigma for this row.

The transformed standard deviation delta is computed by dividing by the residual. This yields plus infinity, which is not valid, when the residual is zero. If the computed value of delta is greater than the maximum Python float value, meas_sigma is reported as empty. Otherwise the transformation is inverted to get the value of sigma.

meas_value#

is the data table meas_value .

avgint#

If the fit_command has been run, this is the avg_integrand for this row.

residual#

If the fit_command has been run, this is the weighted_residual for this row; see fit_data_subset above.
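
One common use of this column is to look for the data rows that fit worst. The sketch below lists the rows with the largest absolute residual and skips rows where the column is empty; the function name `largest_residuals` and the sample data are hypothetical.

```python
import csv
import io

def largest_residuals(stream, count):
    """Return (data_id, residual) pairs with the largest absolute residual."""
    pairs = [
        (int(row['data_id']), float(row['residual']))
        for row in csv.DictReader(stream)
        if row['residual'] != ''  # empty means null: no residual available
    ]
    pairs.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return pairs[:count]

# Illustrative contents only, with most columns omitted.
example = io.StringIO(
    "data_id,residual\n"
    "0,0.5\n"
    "1,-2.0\n"
    "2,\n"
    "3,1.0\n"
)
worst = largest_residuals(example, 2)
```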

sim_value#

If the simulate_command has been run, this is the data_sim_value for this data_id and simulate_index in the previous fit command. If there is no simulate_index in the previous fit command, the value zero is used for the simulate_index .

Covariates#

For each covariate in the covariate_table there is a column with the corresponding covariate_name. For each covariate column and measurement row, the value in the covariate column is the covariate value for this measurement minus the reference value for this covariate; i.e., the corresponding covariate difference x_ij in the model for the average integrand.

predict.csv#

If the predict_command was executed, the CSV file dir / predict.csv is written. For each row of the predict_table there is a corresponding row in predict.csv .

avgint_id#

is the avgint table avgint_id .

avgint_extra_columns#

Each column specified by the avgint_extra_columns option is included in the predict.csv file.

s_index#

This identifies the set of model variables corresponding to the last predict_command executed. If the source for the predict command was sample, the model variables correspond to the rows of the sample table with sample_index equal to s_index. Otherwise, s_index is empty and the model variables correspond to the fit_var or truth_var table, depending on the source for the last predict command executed.

avgint#

is the average integrand \(A_i(u, \theta)\). The model variables \((u, \theta)\) correspond to the s_index, and the measurement subscript \(i\) denotes the avgint_table information for this row of predict.csv; i.e., age_lo, age_up, …
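
When the predictions come from the sample table, each avgint_id appears once per s_index. As a sketch, the predictions can be averaged over s_index for each avgint_id; the function name `mean_prediction` and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

def mean_prediction(stream):
    """Return {avgint_id: mean of avgint over the rows for that avgint_id}."""
    values = defaultdict(list)
    for row in csv.DictReader(stream):
        values[int(row['avgint_id'])].append(float(row['avgint']))
    return {avgint_id: mean(v) for avgint_id, v in values.items()}

# Illustrative contents only: two avgint rows, two samples each.
example = io.StringIO(
    "avgint_id,s_index,avgint\n"
    "0,0,0.25\n"
    "1,0,1.0\n"
    "0,1,0.75\n"
    "1,1,3.0\n"
)
prediction = mean_prediction(example)
```

If s_index is empty there is one row per avgint_id and the mean is just that row's avgint.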

age_lo#

is the avgint table age_lower .

age_up#

is the avgint table age_upper .

time_lo#

is the avgint table time_lower .

time_up#

is the avgint table time_upper .

integrand#

is the avgint table integrand_name .

weight#

is the weight_name for this row.

node#

is the node_name for this row.

group#

is the group_name corresponding to the subgroup for this row.

subgroup#

is the subgroup_name for this row. This will correspond directly to the avgint table subgroup_id .

Covariates#

For each covariate in the covariate_table there is a column with the corresponding covariate_name. For each covariate column and measurement row, the value in the covariate column is the covariate value in the avgint_table minus the reference value for this covariate; i.e., the corresponding covariate difference x_ij in the model for the average integrand.

Example#

The file db2csv_command.py contains an example and test using this command.

ihme_db.sh#

The script ihme_db.sh can be used to run db2csv for a dismod_at database on the IHME cluster.