sim_mod <- DAG.empty() %>%
add.nodes(
node("name",
t = 0,
distr = "",
...
)
) %>%
add.nodes(
node("name",
t = 1:2,
distr = "",
...
)
)1 Simulation
Simulations are an important part of the pre-data step in statistical analyses. The basic steps are:
- Create simulation modules
- Test simulation modules
- Run simulation studies
- Document simulation modules and simulation studies in quarto project report
Simulation modules are recipes for creating data that is analysed in simulation studies. They therefore represent the data-generating process and contain important assumptions about the causal model. Specifying detailed simulations can therefore also help to think about and communicate your causal model.
Simulation studies help in validating the following domains (Betancourt 2020):
- Domain Expertise Consistency: Is our model consistent with our domain expertise?
- Computational Faithfulness: Will our computational tools be sufficient to accurately fit our model?
- Inferential Adequacy: Will our inference provide enough information to answer our questions?
- Model Adequacy: Is our model rich enough to capture the relevant structure of the true data generating process?
The following packages are used:
- Creating simulation modules:
simcausal(Sofrygin, Laan, and Neugebauer 2017) - Running simulation studies:
simChef(Tang and Duncan 2025)
1.1 Creating simulation modules with simcausal
Simulation modules are created with the following basic code.
DAG.empty()creates an empty DAG- Pipe into
add.nodes()to add a single variable withnode() - Inside
node(), specify a name, a distributiondistr, and the parameters of this distribution - Specify timepoint
t, if you simulate longitudinal data. For background variables, uset = 0. Once, you usedt, all subsequent variables also need it. For cross-sectional data,tcan be omitted. - Repeat piping into
add.nodes()to add a single variable withnode()
Here are some distributions and there parameters:
rconst:constrbern:probrnorm:mean,sdtruncnorm::rtruncnorm:a,b,mean,sdsupfuns::rdiscunif:min,maxrpois:lambda
If the value of a parameter depends on the value of another variable, make sure to add the depending variable after the variable it depends on. Include the variable by its name and add the time in square brackets. Use one or more ifelse() to code the dependencies.
sim_mod <- DAG.empty() %>%
add.nodes(
node("asthma",
t = 1,
distr = "rbern",
prob = p_asthma_1
)
) %>%
add.nodes(
node("asthma",
t = 2:5,
distr = "rbern",
prob = ifelse(asthma[t - 1] == 1,
1 - p_remission,
p_relapse
)
)
)In the code above, note how you can re-use the name ("asthma") if t differs. asthma[1] needs to be defined first, because its prob parameter is different from the t = 2:5 as it does not depend on a previous asthma status. Note also that t can be used when addressing a simulated variable. prob of asthma[2] depends on asthma[1], prob of asthma[3] depends on asthma[2], etc. Also note that in addition to variable asthma, there are parameters p_asthma_1, p_remission, p_relapse. Parameters have a single value that is the same for all simulated observations. However, since we want to change parameter values in simulation studies, we use a variable, i.e., placeholder. In addition to the code, the R script of a simulation module should include a list and description of variables and parameters.
# Title -------------------------------------------------------------------
# Project Title: ...
# Script Title: ...
# Script Description: Creating simulation module "..."
## Variables:
# asthma[1:5] (binary asthma indicator)
## Parameters:
# p_asthma_1 (probability of asthma at time 1)
# p_remission (probability of asthma remission)
# p_relapse (probability of asthma relapse)
# Module ------------------------------------------------------------------
sim_mod_asthma <- DAG.empty() %>%
add.nodes(
node("asthma",
t = 1,
distr = "rbern",
prob = p_asthma_1
)
) %>%
add.nodes(
node("asthma",
t = 2:5,
distr = "rbern",
prob = ifelse(asthma[t - 1] == 1,
1 - p_remission,
p_relapse
)
)
)Multiple simulation modules can be combined with each other and variables from earlier modules can be used in later modules. The complete DAG, i.e., after combining simulation modules, cannot include the same variables multiple times, and therefore, when you create modules that are always combined, don’t include variables from earlier modules in later modules. Also, when using supfuns::run_sim(), make sure that the modules are listed in the correct order in the list passed to argument dags.
1.2 Testing simulation modules
Before conducting simulation studies, test if the module has been implemented correctly. For this, simulate some data based on pre-specified parameter values and evaluate if they can be recovered. The workflow for running simulation studies is also used for testing simulation modules. See an example below:
Example of testing a simulation module
# Title -------------------------------------------------------------------
# Project Title: greenstress
# Script Title: pre_2_2_test_outc_miscl_2
# Script Description: Testing simulation module "Outcome misclassification
# with predictor"
# Prepare experiment ------------------------------------------------------
## Create functions ----
test_2_dgp <- function(n, pX, pAlpha, ORx, se, sp) {
dags <- list(sim_mod_2)
args <- list(
pX = pX,
pAlpha = pAlpha,
ORx = ORx,
se = se,
sp = sp
)
df <- run_sim(
dags = dags,
args = args,
n = n
)
out <- list(
df = df,
se = se,
sp = sp
)
return(out)
}
test_2_method <- function(df, se, sp) {
out <- list(
p_x = mean(df$x),
p_yTrue_x0 = mean(df$yTrue[df$x == 0]),
or_true = exp(qlogis(mean(df$yTrue[df$x == 1])) - qlogis(mean(df$yTrue[df$x == 0]))),
se_true = mean(df$yObs[df$yTrue == 1] == 1),
sp_true = mean(df$yObs[df$yTrue == 0] == 0)
)
return(out)
}
## Create experiment ----
dgp2 <- create_dgp(
.dgp_fun = test_2_dgp,
.name = "Test Module 2: DGP",
n = 1000
)
meth2 <- create_method(
.method_fun = test_2_method,
.name = "Test Module 2: Method"
)
test_2_exp <- create_experiment(
name = "Test Module 2"
) %>%
add_dgp(dgp2) %>%
add_method(meth2) %>%
add_vary_across(
.dgp = dgp2,
pX = list(0.5),
pAlpha = list(0.005, 0.05, 0.5),
ORx = list(1, 1.2, 2),
se = list(0.6),
sp = list(0.8)
)
## Clean
rm(test_2_dgp, test_2_method, dgp2, meth2)
# Run experiment ----------------------------------------------------------
if (cache$exists("test_2_result")) {
test_2_result <- cache$get("test_2_result")
} else {
test_2_result <- test_2_exp %>%
run_experiment(n_reps = 100, save = FALSE)
cache$set("test_2_result", test_2_result)
}
rm(test_2_exp)
# Visualise ---------------------------------------------------------------
df_s_no <- test_2_result$fit_results %>%
filter(pAlpha == 0.005, ORx == 1)
df_s_low <- test_2_result$fit_results %>%
filter(pAlpha == 0.005, ORx == 1.2)
df_s_high <- test_2_result$fit_results %>%
filter(pAlpha == 0.005, ORx == 2)
df_m_no <- test_2_result$fit_results %>%
filter(pAlpha == 0.05, ORx == 1)
df_m_low <- test_2_result$fit_results %>%
filter(pAlpha == 0.05, ORx == 1.2)
df_m_high <- test_2_result$fit_results %>%
filter(pAlpha == 0.05, ORx == 2)
df_l_no <- test_2_result$fit_results %>%
filter(pAlpha == 0.5, ORx == 1)
df_l_low <- test_2_result$fit_results %>%
filter(pAlpha == 0.5, ORx == 1.2)
df_l_high <- test_2_result$fit_results %>%
filter(pAlpha == 0.5, ORx == 2)
## 0.005, 1 ----
# 0.005, 1, pX
p_pX_s_no <- df_s_no %>%
ggplot(aes(x = p_x)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(X)", y = "Density", title = "OR = 1") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pX_s_no)$y$range$range[2]
p_pX_s_no <- p_pX_s_no +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pX = 0.5"
)
# 0.005, 1, pAlpha
p_pAlpha_s_no <- df_s_no %>%
ggplot(aes(x = p_yTrue_x0)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.005)) +
theme_bw() +
labs(x = "p(yTrue, X=0)", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pAlpha_s_no)$y$range$range[2]
p_pAlpha_s_no <- p_pAlpha_s_no +
annotate("label",
x = 0.005,
y = y_max / 2,
label = "pAlpha = 0.005",
hjust = "left"
)
# 0.005, 1, ORx
p_ORx_s_no <- df_s_no %>%
ggplot(aes(x = or_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 1)) +
theme_bw() +
labs(x = "OR", y = "Density") +
xlim(0, 4)
y_max <- ggplot2::layer_scales(p_ORx_s_no)$y$range$range[2]
p_ORx_s_no <- p_ORx_s_no +
annotate("label",
x = 1,
y = y_max / 2,
label = "ORx = 1"
)
# 0.005, 1, se
p_se_s_no <- df_s_no %>%
ggplot(aes(x = se_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.6)) +
theme_bw() +
labs(x = "Sensitivity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_se_s_no)$y$range$range[2]
p_se_s_no <- p_se_s_no +
annotate("label",
x = 0.6,
y = y_max / 2,
label = "se = 0.6"
)
# 0.005, 1, sp
p_sp_s_no <- df_s_no %>%
ggplot(aes(x = sp_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.8)) +
theme_bw() +
labs(x = "Specificity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_sp_s_no)$y$range$range[2]
p_sp_s_no <- p_sp_s_no +
annotate("label",
x = 0.8,
y = y_max / 2,
label = "sp = 0.8",
hjust = "right"
)
## 0.005, 1.2 ----
# 0.005, 1.2, pX
p_pX_s_low <- df_s_low %>%
ggplot(aes(x = p_x)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(X)", y = "Density", title = "OR = 1.2") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pX_s_low)$y$range$range[2]
p_pX_s_low <- p_pX_s_low +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pX = 0.5"
)
# 0.005, 1.2, pAlpha
p_pAlpha_s_low <- df_s_low %>%
ggplot(aes(x = p_yTrue_x0)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.005)) +
theme_bw() +
labs(x = "p(yTrue, X=0)", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pAlpha_s_low)$y$range$range[2]
p_pAlpha_s_low <- p_pAlpha_s_low +
annotate("label",
x = 0.005,
y = y_max / 2,
label = "pAlpha = 0.005",
hjust = "left"
)
# 0.005, 1.2, ORx
p_ORx_s_low <- df_s_low %>%
ggplot(aes(x = or_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 1.2)) +
theme_bw() +
labs(x = "OR", y = "Density") +
xlim(0, 4)
y_max <- ggplot2::layer_scales(p_ORx_s_low)$y$range$range[2]
p_ORx_s_low <- p_ORx_s_low +
annotate("label",
x = 1.2,
y = y_max / 2,
label = "ORx = 1.2"
)
# 0.005, 1.2, se
p_se_s_low <- df_s_low %>%
ggplot(aes(x = se_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.6)) +
theme_bw() +
labs(x = "Sensitivity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_se_s_low)$y$range$range[2]
p_se_s_low <- p_se_s_low +
annotate("label",
x = 0.6,
y = y_max / 2,
label = "se = 0.6"
)
# 0.005, 1.2, sp
p_sp_s_low <- df_s_low %>%
ggplot(aes(x = sp_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.8)) +
theme_bw() +
labs(x = "Specificity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_sp_s_low)$y$range$range[2]
p_sp_s_low <- p_sp_s_low +
annotate("label",
x = 0.8,
y = y_max / 2,
label = "sp = 0.8",
hjust = "right"
)
## 0.005, 2 ----
# 0.005, 2, pX
p_pX_s_high <- df_s_high %>%
ggplot(aes(x = p_x)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(X)", y = "Density", title = "OR = 2") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pX_s_high)$y$range$range[2]
p_pX_s_high <- p_pX_s_high +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pX = 0.5"
)
# 0.005, 2, pAlpha
p_pAlpha_s_high <- df_s_high %>%
ggplot(aes(x = p_yTrue_x0)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.005)) +
theme_bw() +
labs(x = "p(yTrue, X=0)", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pAlpha_s_high)$y$range$range[2]
p_pAlpha_s_high <- p_pAlpha_s_high +
annotate("label",
x = 0.005,
y = y_max / 2,
label = "pAlpha = 0.005",
hjust = "left"
)
# 0.005, 2, ORx
p_ORx_s_high <- df_s_high %>%
ggplot(aes(x = or_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 2)) +
theme_bw() +
labs(x = "OR", y = "Density") +
xlim(0, 4)
y_max <- ggplot2::layer_scales(p_ORx_s_high)$y$range$range[2]
p_ORx_s_high <- p_ORx_s_high +
annotate("label",
x = 2,
y = y_max / 2,
label = "ORx = 2"
)
# 0.005, 2, se
p_se_s_high <- df_s_high %>%
ggplot(aes(x = se_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.6)) +
theme_bw() +
labs(x = "Sensitivity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_se_s_high)$y$range$range[2]
p_se_s_high <- p_se_s_high +
annotate("label",
x = 0.6,
y = y_max / 2,
label = "se = 0.6"
)
# 0.005, 2, sp
p_sp_s_high <- df_s_high %>%
ggplot(aes(x = sp_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.8)) +
theme_bw() +
labs(x = "Specificity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_sp_s_high)$y$range$range[2]
p_sp_s_high <- p_sp_s_high +
annotate("label",
x = 0.8,
y = y_max / 2,
label = "sp = 0.8",
hjust = "right"
)
p_s <- wrap_plots(
p_pX_s_no, p_pX_s_low, p_pX_s_high,
p_pAlpha_s_no, p_pAlpha_s_low, p_pAlpha_s_high,
p_ORx_s_no, p_ORx_s_low, p_ORx_s_high,
p_se_s_no, p_se_s_low, p_se_s_high,
p_sp_s_no, p_sp_s_low, p_sp_s_high,
nrow = 5, ncol = 3,
axes = "collect"
) +
plot_annotation(tag_levels = "A")
rm(
p_pX_s_no, p_pX_s_low, p_pX_s_high,
p_pAlpha_s_no, p_pAlpha_s_low, p_pAlpha_s_high,
p_ORx_s_no, p_ORx_s_low, p_ORx_s_high,
p_se_s_no, p_se_s_low, p_se_s_high,
p_sp_s_no, p_sp_s_low, p_sp_s_high
)
## 0.05, 1 ----
# 0.05, 1, pX
p_pX_m_no <- df_m_no %>%
ggplot(aes(x = p_x)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(X)", y = "Density", title = "OR = 1") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pX_m_no)$y$range$range[2]
p_pX_m_no <- p_pX_m_no +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pX = 0.5"
)
# 0.05, 1, pAlpha
p_pAlpha_m_no <- df_m_no %>%
ggplot(aes(x = p_yTrue_x0)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.05)) +
theme_bw() +
labs(x = "p(yTrue, X=0)", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pAlpha_m_no)$y$range$range[2]
p_pAlpha_m_no <- p_pAlpha_m_no +
annotate("label",
x = 0.05,
y = y_max / 2,
label = "pAlpha = 0.05",
hjust = "left"
)
# 0.05, 1, ORx
p_ORx_m_no <- df_m_no %>%
ggplot(aes(x = or_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 1)) +
theme_bw() +
labs(x = "OR", y = "Density") +
xlim(0, 4)
y_max <- ggplot2::layer_scales(p_ORx_m_no)$y$range$range[2]
p_ORx_m_no <- p_ORx_m_no +
annotate("label",
x = 1,
y = y_max / 2,
label = "ORx = 1"
)
# 0.05, 1, se
p_se_m_no <- df_m_no %>%
ggplot(aes(x = se_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.6)) +
theme_bw() +
labs(x = "Sensitivity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_se_m_no)$y$range$range[2]
p_se_m_no <- p_se_m_no +
annotate("label",
x = 0.6,
y = y_max / 2,
label = "se = 0.6"
)
# 0.05, 1, sp
p_sp_m_no <- df_m_no %>%
ggplot(aes(x = sp_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.8)) +
theme_bw() +
labs(x = "Specificity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_sp_m_no)$y$range$range[2]
p_sp_m_no <- p_sp_m_no +
annotate("label",
x = 0.8,
y = y_max / 2,
label = "sp = 0.8",
hjust = "right"
)
## 0.05, 1.2 ----
# 0.05, 1.2, pX
p_pX_m_low <- df_m_low %>%
ggplot(aes(x = p_x)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(X)", y = "Density", title = "OR = 1.2") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pX_m_low)$y$range$range[2]
p_pX_m_low <- p_pX_m_low +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pX = 0.5"
)
# 0.05, 1.2, pAlpha
p_pAlpha_m_low <- df_m_low %>%
ggplot(aes(x = p_yTrue_x0)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.05)) +
theme_bw() +
labs(x = "p(yTrue, X=0)", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pAlpha_m_low)$y$range$range[2]
p_pAlpha_m_low <- p_pAlpha_m_low +
annotate("label",
x = 0.05,
y = y_max / 2,
label = "pAlpha = 0.05",
hjust = "left"
)
# 0.05, 1.2, ORx
p_ORx_m_low <- df_m_low %>%
ggplot(aes(x = or_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 1.2)) +
theme_bw() +
labs(x = "OR", y = "Density") +
xlim(0, 4)
y_max <- ggplot2::layer_scales(p_ORx_m_low)$y$range$range[2]
p_ORx_m_low <- p_ORx_m_low +
annotate("label",
x = 1.2,
y = y_max / 2,
label = "ORx = 1.2"
)
# 0.05, 1.2, se
p_se_m_low <- df_m_low %>%
ggplot(aes(x = se_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.6)) +
theme_bw() +
labs(x = "Sensitivity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_se_m_low)$y$range$range[2]
p_se_m_low <- p_se_m_low +
annotate("label",
x = 0.6,
y = y_max / 2,
label = "se = 0.6"
)
# 0.05, 1.2, sp
p_sp_m_low <- df_m_low %>%
ggplot(aes(x = sp_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.8)) +
theme_bw() +
labs(x = "Specificity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_sp_m_low)$y$range$range[2]
p_sp_m_low <- p_sp_m_low +
annotate("label",
x = 0.8,
y = y_max / 2,
label = "sp = 0.8",
hjust = "right"
)
## 0.05, 2 ----
# 0.05, 2, pX
p_pX_m_high <- df_m_high %>%
ggplot(aes(x = p_x)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(X)", y = "Density", title = "OR = 2") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pX_m_high)$y$range$range[2]
p_pX_m_high <- p_pX_m_high +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pX = 0.5"
)
# 0.05, 2, pAlpha
p_pAlpha_m_high <- df_m_high %>%
ggplot(aes(x = p_yTrue_x0)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.05)) +
theme_bw() +
labs(x = "p(yTrue, X=0)", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pAlpha_m_high)$y$range$range[2]
p_pAlpha_m_high <- p_pAlpha_m_high +
annotate("label",
x = 0.05,
y = y_max / 2,
label = "pAlpha = 0.05",
hjust = "left"
)
# 0.05, 2, ORx
p_ORx_m_high <- df_m_high %>%
ggplot(aes(x = or_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 2)) +
theme_bw() +
labs(x = "OR", y = "Density") +
xlim(0, 4)
y_max <- ggplot2::layer_scales(p_ORx_m_high)$y$range$range[2]
p_ORx_m_high <- p_ORx_m_high +
annotate("label",
x = 2,
y = y_max / 2,
label = "ORx = 2"
)
# 0.05, 2, se
p_se_m_high <- df_m_high %>%
ggplot(aes(x = se_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.6)) +
theme_bw() +
labs(x = "Sensitivity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_se_m_high)$y$range$range[2]
p_se_m_high <- p_se_m_high +
annotate("label",
x = 0.6,
y = y_max / 2,
label = "se = 0.6"
)
# 0.05, 2, sp
p_sp_m_high <- df_m_high %>%
ggplot(aes(x = sp_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.8)) +
theme_bw() +
labs(x = "Specificity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_sp_m_high)$y$range$range[2]
p_sp_m_high <- p_sp_m_high +
annotate("label",
x = 0.8,
y = y_max / 2,
label = "sp = 0.8",
hjust = "right"
)
p_m <- wrap_plots(
p_pX_m_no, p_pX_m_low, p_pX_m_high,
p_pAlpha_m_no, p_pAlpha_m_low, p_pAlpha_m_high,
p_ORx_m_no, p_ORx_m_low, p_ORx_m_high,
p_se_m_no, p_se_m_low, p_se_m_high,
p_sp_m_no, p_sp_m_low, p_sp_m_high,
nrow = 5, ncol = 3,
axes = "collect"
) +
plot_annotation(tag_levels = "A")
rm(
p_pX_m_no, p_pX_m_low, p_pX_m_high,
p_pAlpha_m_no, p_pAlpha_m_low, p_pAlpha_m_high,
p_ORx_m_no, p_ORx_m_low, p_ORx_m_high,
p_se_m_no, p_se_m_low, p_se_m_high,
p_sp_m_no, p_sp_m_low, p_sp_m_high
)
## 0.5, 1 ----
# 0.5, 1, pX
p_pX_l_no <- df_l_no %>%
ggplot(aes(x = p_x)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(X)", y = "Density", title = "OR = 1") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pX_l_no)$y$range$range[2]
p_pX_l_no <- p_pX_l_no +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pX = 0.5"
)
# 0.5, 1, pAlpha
p_pAlpha_l_no <- df_l_no %>%
ggplot(aes(x = p_yTrue_x0)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(yTrue, X=0)", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pAlpha_l_no)$y$range$range[2]
p_pAlpha_l_no <- p_pAlpha_l_no +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pAlpha = 0.5"
)
# 0.5, 1, ORx
p_ORx_l_no <- df_l_no %>%
ggplot(aes(x = or_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 1)) +
theme_bw() +
labs(x = "OR", y = "Density") +
xlim(0, 4)
y_max <- ggplot2::layer_scales(p_ORx_l_no)$y$range$range[2]
p_ORx_l_no <- p_ORx_l_no +
annotate("label",
x = 1,
y = y_max / 2,
label = "ORx = 1"
)
# 0.5, 1, se
p_se_l_no <- df_l_no %>%
ggplot(aes(x = se_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.6)) +
theme_bw() +
labs(x = "Sensitivity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_se_l_no)$y$range$range[2]
p_se_l_no <- p_se_l_no +
annotate("label",
x = 0.6,
y = y_max / 2,
label = "se = 0.6"
)
# 0.5, 1, sp
p_sp_l_no <- df_l_no %>%
ggplot(aes(x = sp_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.8)) +
theme_bw() +
labs(x = "Specificity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_sp_l_no)$y$range$range[2]
p_sp_l_no <- p_sp_l_no +
annotate("label",
x = 0.8,
y = y_max / 2,
label = "sp = 0.8",
hjust = "right"
)
## 0.5, 1.2 ----
# 0.5, 1.2, pX
p_pX_l_low <- df_l_low %>%
ggplot(aes(x = p_x)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(X)", y = "Density", title = "OR = 1.2") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pX_l_low)$y$range$range[2]
p_pX_l_low <- p_pX_l_low +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pX = 0.5"
)
# 0.5, 1.2, pAlpha
p_pAlpha_l_low <- df_l_low %>%
ggplot(aes(x = p_yTrue_x0)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(yTrue, X=0)", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pAlpha_l_low)$y$range$range[2]
p_pAlpha_l_low <- p_pAlpha_l_low +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pAlpha = 0.5"
)
# 0.5, 1.2, ORx
p_ORx_l_low <- df_l_low %>%
ggplot(aes(x = or_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 1.2)) +
theme_bw() +
labs(x = "OR", y = "Density") +
xlim(0, 4)
y_max <- ggplot2::layer_scales(p_ORx_l_low)$y$range$range[2]
p_ORx_l_low <- p_ORx_l_low +
annotate("label",
x = 1.2,
y = y_max / 2,
label = "ORx = 1.2"
)
# 0.5, 1.2, se
p_se_l_low <- df_l_low %>%
ggplot(aes(x = se_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.6)) +
theme_bw() +
labs(x = "Sensitivity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_se_l_low)$y$range$range[2]
p_se_l_low <- p_se_l_low +
annotate("label",
x = 0.6,
y = y_max / 2,
label = "se = 0.6"
)
# 0.5, 1.2, sp
p_sp_l_low <- df_l_low %>%
ggplot(aes(x = sp_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.8)) +
theme_bw() +
labs(x = "Specificity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_sp_l_low)$y$range$range[2]
p_sp_l_low <- p_sp_l_low +
annotate("label",
x = 0.8,
y = y_max / 2,
label = "sp = 0.8",
hjust = "right"
)
## 0.5, 2 ----
# 0.5, 2, pX
p_pX_l_high <- df_l_high %>%
ggplot(aes(x = p_x)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(X)", y = "Density", title = "OR = 2") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pX_l_high)$y$range$range[2]
p_pX_l_high <- p_pX_l_high +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pX = 0.5"
)
# 0.5, 2, pAlpha
p_pAlpha_l_high <- df_l_high %>%
ggplot(aes(x = p_yTrue_x0)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.5)) +
theme_bw() +
labs(x = "p(yTrue, X=0)", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_pAlpha_l_high)$y$range$range[2]
p_pAlpha_l_high <- p_pAlpha_l_high +
annotate("label",
x = 0.5,
y = y_max / 2,
label = "pAlpha = 0.5"
)
# 0.5, 2, ORx
p_ORx_l_high <- df_l_high %>%
ggplot(aes(x = or_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 2)) +
theme_bw() +
labs(x = "OR", y = "Density") +
xlim(0, 4)
y_max <- ggplot2::layer_scales(p_ORx_l_high)$y$range$range[2]
p_ORx_l_high <- p_ORx_l_high +
annotate("label",
x = 2,
y = y_max / 2,
label = "ORx = 2"
)
# 0.5, 2, se
p_se_l_high <- df_l_high %>%
ggplot(aes(x = se_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.6)) +
theme_bw() +
labs(x = "Sensitivity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_se_l_high)$y$range$range[2]
p_se_l_high <- p_se_l_high +
annotate("label",
x = 0.6,
y = y_max / 2,
label = "se = 0.6"
)
# 0.5, 2, sp
p_sp_l_high <- df_l_high %>%
ggplot(aes(x = sp_true)) +
geom_density(fill = "lightblue") +
geom_vline(aes(xintercept = 0.8)) +
theme_bw() +
labs(x = "Specificity", y = "Density") +
xlim(0, 1)
y_max <- ggplot2::layer_scales(p_sp_l_high)$y$range$range[2]
p_sp_l_high <- p_sp_l_high +
annotate("label",
x = 0.8,
y = y_max / 2,
label = "sp = 0.8",
hjust = "right"
)
p_l <- wrap_plots(
p_pX_l_no, p_pX_l_low, p_pX_l_high,
p_pAlpha_l_no, p_pAlpha_l_low, p_pAlpha_l_high,
p_ORx_l_no, p_ORx_l_low, p_ORx_l_high,
p_se_l_no, p_se_l_low, p_se_l_high,
p_sp_l_no, p_sp_l_low, p_sp_l_high,
nrow = 5, ncol = 3,
axes = "collect"
) +
plot_annotation(tag_levels = "A")
rm(
p_pX_l_no, p_pX_l_low, p_pX_l_high,
p_pAlpha_l_no, p_pAlpha_l_low, p_pAlpha_l_high,
p_ORx_l_no, p_ORx_l_low, p_ORx_l_high,
p_se_l_no, p_se_l_low, p_se_l_high,
p_sp_l_no, p_sp_l_low, p_sp_l_high
)
rm(
df_l_high, df_l_low, df_l_no, df_m_high, df_m_low, df_m_no, df_s_high,
df_s_low, df_s_no, y_max, test_2_result
)1.3 Running simulation studies with simChef
Simulation studies need the following steps:
- Prepare experiment
- Create functions
- Create experiment
- Run experiment
- Visualise
All steps are written in a single R script, which is included in the quarto report using #| file:.... See an example in Section 5.2.2.5.
1.3.1 Prepare experiment
1.3.1.1 Create functions
First, functions used in the simulation experiments need to be created. simChef knows the following types:
- Data-generating process (DGP): Provide exactly one DGP function
- Method: Describes analysis methods. Provide one or more.
- Evaluator: Post-processing. Not necessary but a single evaluator can be provided.
- Visualiser: Not used. Create plots after completing the simulation.
1.3.1.1.1 DGP
Arguments of the DGP function are the simulation modules’ parameters as well as n, which describes the number of simulated observations.
If any parameters are drawn from a distribution, start the function with drawing a single value.
Next, create the list of simulation module objects that is passed to argument dags of supfuns::run_sim(). Make sure to list them in the correct order.
Next, create the list of parameters that is passed to argument args of supfuns::run_sim() from the arguments of the DGP function and the values drawn from distributions.
Next, run function supfuns::run_sim().
Finally, return values. Return at least the data.frame created by supfuns::run_sim() and add parameter values in case they are needed by method or evaluator functions.
sim_dgp <- function(n, par_a, par_b) {
par_c <- rbeta(1, 5, 10)
dags <- list(sim_mod_1, sim_mod_2)
args <- list(
par_a = par_a,
par_b = par_b,
par_c = par_c
)
df <- run_sim(
dags = dags,
args = args,
n = n
)
out <- list(
df = df,
par_a = par_a
)
return(out)
}1.3.1.1.2 Methods
Method functions have exactly the arguments that are returned by the DGP function, i.e., at least df and maybe additional simulation module parameters. Note that all method functions need to have all returned values as arguments, i.e., even if an additional parameter is only needed by one of the method functions, this additional parameter nonetheless needs to be an argument of all method functions.
Method functions analyse the simulated data.frame and return analysis results. You can either conduct summaries here or move them to the evaluator function. Always name the elements in the list returned by the function.
sim_method <- function(df, par_a) {
m <- glm(y ~ x, data = df)
out <- list(
model = m,
nobs = nobs(m)
)
return(out)
}1.3.1.1.3 Evaluator
A function that usually has the single argument fit_results and also returns the object fit_results. The function processes the method functions’ results.
sim_eval <- function(fit_results) {
fit_results %<>%
...
return(fit_results)
}1.3.1.2 Create experiment
Next, based on the functions, simulation study elements must be created using simChef::create_dgp(), simChef::create_method(), and simChef::create_evaluator().
dgp <- create_dgp(
.dgp_fun = sim_dgp,
.name = "dgp"
)
meth <- create_method(
.method_fun = sim_method,
.name = "regression"
)
ev <- create_evaluator(
.eval_fun = sim_eval,
.name = "eval"
)Next, one or more experiments need to be created using simChef::create_experiment(). Provide a name to the experiment, as well as future.packages and future.globals, if you want to use parallelisation. future.packages contains packages used in DGP, method, and evaluator functions. It contains at least supfuns as run_sim() is always used. future.global contains the simulation module objects.
Then, add DGP, method, and evaluator simulation study elements to the experiment using simChef::add_dgp(), simChef::add_method(), and simChef::add_evaluator().
Finally, use simChef::add_vary_across() to specify parameter values. Specify your DGP simulation study element for argument .dgp and include all arguments of the DGP function as arguments of add_vary_across(). Pass a list to each argument that contains one or more parameter values. All possible combinations of provided values are simulated. Then remove functions and simulation study elements.
sim_exp <- create_experiment(
name = "My simulation study",
future.packages = "supfuns",
future.globals = c("sim_mod_1", "sim_mod_2")
) %>%
add_dgp(dgp) %>%
add_method(meth) %>%
add_evaluator(ev) %>%
add_vary_across(
.dgp = dgp,
n = list(1000),
par_a = list(0.5),
par_b = list(0.3, 0.4, 0.8)
)
rm(sim_dgp, sim_method, sim_eval, dgp, meth, ev)1.3.2 Run experiment
After the experiment has been created, use simChef::run_experiment() to run the simulation study. Use caching as described below.
if (cache$exists("sim_result")) {
sim_result <- cache$get("sim_result")
} else {
sim_result <- sim_exp %>%
run_experiment(n_reps = 100, save = FALSE)
cache$set("sim_result", sim_result)
}
rm(sim_exp)1.3.3 Visualise
Finally, create plots from the simulated data.
1.4 Documenting simulation modules in quarto
Simulation modules are documented in the “Pre-data” chapter of a project report. Sub-chapter “Causal model” contains one sub-chapter for each simulation module. See the template below and examples in Section 5.2.1 and Section 5.2.2.
1.4.1 Module <number>: <name>
<Short module description>
1.4.1.1 Assumptions
| Assumption | Justification |
|---|---|
| <Assumption 1> | <Justification 1> |
| … | … |
1.4.1.2 Variables
| Variable | Description |
|---|---|
| <Variable 1 using \(Inline Math\)> | <Variable description 1> |
| … | … |
1.4.1.3 Parameters
| Parameter | Description |
|---|---|
| <Parameter 1 using \(Inline Math\)> | <Parameter description 1> |
| … | … |
1.4.1.4 Technical details
1.4.1.4.1 Equation block
- The complete equation is inside double
$$, i.e., double$$as both the first and the last line - Use align to align all equations at
=or~- Second line, i.e., directly after
$$, is\begin{align} - Second to last line, i.e., directly before
$$, is\end{align} - Add
&in front of the symbols to align, i.e., use&=for \(=\) and&\simfor \(\sim\)
- Second line, i.e., directly after
- Use
\text{some text}to writesome textin more of a text format than an equation format - Use
_for subscript and include the subcript in curly brackets if it is longer than 1 character, e.g.:Y_{True}orp_X - End your equations with
\\to start a new equation in a new line - Cases
- Start with
\begin{cases}and end with\end{cases} - To specify, in which cases a equation is applied, end all equations in the cases block with
, &\quad\text{for} x = 1 &in front of\quadis used for aligning them
- Start with
1.4.1.4.2 Graphs
- Type
/dotuntilGraphViz Code Chunkis suggested, then clicktab - YAML options in the codeblock start with
//|//| label: fig-graph-module<number>//| fig-cap: 'Graph of simulation module "<name>" (Ellipse: Variable, Diamond: Parameter, Box: Variable from other module)'//| fig-height: 2; fig-height is often needed to delete too much white space above and below the graph. Large graphs do not need it. Sometimes, other values give better results, e.g., 3.
- Start with
digraphkeyword, graph name, e.g.,sim_mod_1, and curly brackets:digraph sim_mod_number {}; All code goes inside the curly brackets. - Options:
rankdir = LRbgcolor = "transparent"
- Variable specification: Start with the name and add options in brackets
node_name []label =; Put label inside< >; put subscript inside<SUB> </SUB>shape =; no shape for variables (ellipses),boxfor variables from other modules,diamondfor parameters
- Edges:
node_name1 -> node_name2
digraph m_a {
rankdir = LR
bgcolor = "transparent"
from_other_module [label = <From other<SUB>module</SUB>>, shape = box]
var [label = <Var<SUB>1</SUB>>]
par [label = <A parameter>, shape = diamond]
from_other_module -> var
par -> var
}1.4.1.4.3 Code
- Create an R code chunk
- Add three YAML options:
code-fold: truecode-summary: "Create module '<name>' in simcausal"file: R/<file name>.R
1.5 Documenting simulation studies in quarto
Simulation studies are documented in the “Pre-data” chapter of a project report. Sub-chapter “Simulation” contains one sub-chapter for each simulation module. In addition, the first sub-chapter of “Simulation” is called “Testing” and contains the evaluation, if the simulation modules have been implemented correctly (see Section 1.2). See the template below.
1.5.1 Simulation <number>: <name>
<Short description of simulation study and included simulation modules>
| Parameter | Input values |
|---|---|
| <Parameter 1 using \(Inline Math\)> | <Parameter 1 input value> |
| <Parameter 2 with different scenarios using \(Inline Math\)> | Scenario A: <Parameter 2 input value for scenario A> Scenario B: <Parameter 2 input value for scenario B> |
| … | … |
<Description of simuation procedure, i.e. data-generating processes, including number of replications, number of simulated observation, descriptions and justifications for input values and different input value scenarios>
<Description of analysis methods>
<Include code chunk with the simulation script
code-fold: truecode-summary: "Running simulation <number>"file: R/pre_3_<number>_sim<number>.R>
<Description of simulation results>
<End of template>
1.5.1.1 How to include Stan models?
- Create an R code block
- Use options:
#| class-output: stan#| echo: false
- Input code
- For a stan file use:
cat(readLines("Stan/file_name.stan"), sep = "\n") - For a few lines use:
cat(c("Line 1", "Line 2", "..."), sep = "\n") - Example for b.:
cat(c("// Priors", "a ~ normal(-3.4, 0.6);", "b ~ normal(0, 0.7);"), sep = "\n")
- For a stan file use: