Input for SCC model creation
Sufficient-component cause (SCC) models consist of component causes, which are grouped into sufficient causes. Sufficient causes are minimally sufficient for the outcome of interest to occur, i.e., if even one of the included component causes is missing, the outcome does not occur (at least not via this sufficient cause).
The main modeling task when creating SCC models is therefore to specify which component causes belong to the same sufficient cause. Since SCC models are causal models, these specifications need to be based on knowledge about the mechanisms of outcome occurrence rather than on statistical knowledge, i.e., the modeler needs to specify how component causes are connected with each other and with the outcome.
Because sufficient causes are minimally sufficient, it is necessary to also describe minimally sufficient mechanisms that connect them. For this reason, the complex net of mechanisms leading to outcome occurrence needs to be split into small parts. These small parts are called steps in epicmodel. Component causes are linked with each other and the outcome of interest by chaining these steps together.
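To make the idea of minimal sufficiency concrete, here is a minimal sketch in base R. It does not use epicmodel, and the outcome rule as well as the helper names `outcome_occurs()` and `is_minimally_sufficient()` are assumptions made up for illustration.

```r
# Toy outcome rule (an assumption for illustration): the outcome occurs
# if A and B are both present, or if C is present.
outcome_occurs <- function(present) {
  ("A" %in% present && "B" %in% present) || "C" %in% present
}

# A set is minimally sufficient if it is sufficient and stops being
# sufficient as soon as any single component cause is removed.
is_minimally_sufficient <- function(set, outcome) {
  if (!outcome(set)) {
    return(FALSE)
  }
  for (cause in set) {
    if (outcome(setdiff(set, cause))) {
      return(FALSE)
    }
  }
  TRUE
}

is_minimally_sufficient(c("A", "B"), outcome_occurs)       # TRUE
is_minimally_sufficient(c("A", "B", "C"), outcome_occurs)  # FALSE: C alone suffices
is_minimally_sufficient("A", outcome_occurs)               # FALSE: not sufficient
```

In this toy rule, the two sufficient causes are {A, B} and {C}. Any larger set still triggers the outcome but contains a redundant component cause.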
Step structure
Steps have a pre-defined structure to facilitate chaining. The structure tries to strike a balance between flexible, user-friendly step specification and automated SCC model creation. The structure used in epicmodel is basically an IF-THEN structure. Steps consist of 3 parts:
- IF condition
- IFNOT condition
- THEN statement
THEN statements
THEN statements are the core building block of steps. They describe what happens, e.g., Cell A releases cytokine B, Exposure to C, D moves to E, etc. In order to facilitate chaining of steps, THEN statements also follow a pre-defined structure. This makes it possible to automatically create IDs and descriptions for THEN statements. The IDs are then used for chaining. As steps are built from THEN statements, this structure also enables automated creation of step IDs and descriptions.
THEN statements contain up to 4 segments:
- WHAT segment (Subject)
- DOES segment
- WHAT segment or THEN statement (Object)
- WHERE segment
THEN statements follow a WHAT DOES WHAT WHERE structure and therefore generally consist of WHAT, DOES, and WHERE segments. WHAT segments can appear twice: the first is called “subject”, the second is called “object”. Here you can see how the examples above fit into the WHAT DOES WHAT WHERE structure:
Cell A releases cytokine B
- WHAT: Cell A
- DOES: releases
- WHAT: cytokine B
- WHERE: -
Exposure to C
- WHAT: -
- DOES: Exposure to
- WHAT: C
- WHERE: -
D moves to E
- WHAT: D
- DOES: moves
- WHAT: -
- WHERE: to E
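The three examples above can be reproduced with a small base R sketch that joins the segments in WHAT DOES WHAT WHERE order. The helper `describe_then()` is hypothetical and only illustrates the idea. It is not how epicmodel builds IDs and descriptions internally.

```r
# Hypothetical helper: join the segments of a THEN statement into a
# description, skipping segments that are not used (NA).
describe_then <- function(subject = NA, does = NA, object = NA, where = NA) {
  segments <- c(subject, does, object, where)
  paste(segments[!is.na(segments)], collapse = " ")
}

describe_then(subject = "Cell A", does = "releases", object = "cytokine B")
#> [1] "Cell A releases cytokine B"
describe_then(does = "Exposure to", object = "C")
#> [1] "Exposure to C"
describe_then(subject = "D", does = "moves", where = "to E")
#> [1] "D moves to E"
```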
Some DOES segments require the object to be a THEN statement instead of a WHAT segment. Imagine, e.g., the DOES segment “inhibition”. In general, inhibition can be modeled by specifying IFNOT conditions (see below). However, if inhibition only occurs under certain conditions, it might need to be specified as its own step. Since only a certain process, i.e., a specific step, is inhibited, the object in these cases needs to be this exact step. There might be other DOES segments for which THEN objects are necessary. The option to use THEN objects offers more flexibility to model the mechanisms found in nature within the WHAT DOES WHAT WHERE structure. Note that a given DOES segment has either WHAT objects or THEN objects in all its steps, never a mix. Please also note that THEN objects can be “stacked” by using as object a THEN statement that itself has a THEN object. In general, WHAT objects should be far more prevalent, though. (That’s why we generally refer to the structure as the WHAT DOES WHAT WHERE structure.)
A THEN statement can contain up to 4 segments, but it does not have to. In general, all combinations of segments are possible, and you, the modeler, need to decide how to model the process of interest. The WHAT DOES WHAT WHERE structure is supposed to facilitate automation of naming and SCC creation, but also to grant as much flexibility as needed to model all necessary processes. Remember that the goal is to connect component causes with each other and the outcome of interest in order to enable grouping of component causes into sufficient causes. The structure is designed with this goal in mind. So far, in our projects, we were able to model all processes within this structure, but if you encounter something you cannot model, let us know on GitHub and we will adjust the structure accordingly. Please also note that, although all segment combinations are possible in theory, three of them do not make much sense in our experience: only DOES, only WHERE, and WHAT-WHAT.
IF and IFNOT conditions
IF and IFNOT describe the conditions for the THEN statement to occur. The IF condition must be fulfilled in order for the THEN statement to occur. The IFNOT condition must not be fulfilled in order for the THEN statement to occur. IF and IFNOT conditions are themselves built from THEN statements combined with AND/OR logic. Because IF and IFNOT conditions refer to THEN statements, individual steps can be chained together.
When creating a step, specification of the THEN statement is mandatory while IF and IFNOT conditions are optional. If the IF condition is missing, there is no pre-condition and the THEN statement “just happens”. Steps of this type are therefore “starting steps” and their description begins with “Start:”. They form the start of the chains that connect the component causes and the outcome of interest. In our context, “starting steps” usually represent component causes. In reality, component causes are, of course, caused by other factors not otherwise involved in causing the outcome, e.g., an occupational exposure that causes occupational asthma is itself caused by socio-economic factors that influence job choice. But in the context of SCC model creation, where our task is to group the component causes into sufficient causes, component causes are the starting point and are therefore represented by steps without IF condition. However, please note that component causes can have IFNOT conditions.
Here are the step descriptions from the built-in steplist steplist_rain as an example:
- Start: IFNOT take vacation THEN no vacation
- Start: weekday
- Start: rain
- Start: get groceries
- Start: take umbrella
- Start: work from home
- Start: take vacation
- IF no vacation and weekday and IFNOT work from home THEN walk to work
- IF get groceries or walk to work THEN go outside
- End: IF go outside and rain and IFNOT take umbrella THEN you get wet
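To illustrate how these steps chain together, the following base R sketch hand-codes the step descriptions listed above as logical expressions. It is only an illustration of the IF/IFNOT logic. The function `gets_wet()` is hypothetical and is not part of epicmodel, which derives this logic from the steplist itself.

```r
# `present` is the set of starting steps selected for one scenario.
# A step occurs if its IF condition holds and its IFNOT condition does not.
gets_wet <- function(present) {
  has <- function(step) step %in% present

  # Start: IFNOT take vacation THEN no vacation
  no_vacation <- has("no vacation") && !has("take vacation")
  # IF no vacation and weekday and IFNOT work from home THEN walk to work
  walk_to_work <- no_vacation && has("weekday") && !has("work from home")
  # IF get groceries or walk to work THEN go outside
  go_outside <- has("get groceries") || walk_to_work
  # End: IF go outside and rain and IFNOT take umbrella THEN you get wet
  go_outside && has("rain") && !has("take umbrella")
}

gets_wet(c("get groceries", "rain"))                             # TRUE
gets_wet(c("no vacation", "weekday", "rain"))                    # TRUE
gets_wet(c("get groceries", "rain", "take umbrella"))            # FALSE
gets_wet(c("no vacation", "weekday", "rain", "work from home"))  # FALSE
```

The last two calls show how steps that only appear in IFNOT conditions (taking an umbrella, working from home) block a chain that would otherwise lead to the outcome.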
Step types
There are, as mentioned, different types of steps that play different roles during SCC model creation.
Based on the presence of an IF condition, we define:
- Starting steps: Steps without IF condition
- Non-starting steps: Steps with IF condition
Starting steps can also be separated into two types:
- Component causes: Starting steps which appear in IF conditions of other steps (and maybe additionally in IFNOT conditions of other steps)
- Interventions: Starting steps which do not appear in IF conditions of other steps but only in IFNOT conditions of other steps
Therefore, component causes, interventions, and non-starting steps are mutually exclusive and together form the complete list of steps.
In addition, we define:
- IFNOT steps: Steps with IFNOT condition, including starting steps with IFNOT condition
- End steps: Steps that appear in outcome definitions
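The definitions above can be applied to the steps of steplist_rain with a short base R sketch. The list representation, the helper `step()`, and the function `classify_step()` are assumptions for illustration and do not reflect how epicmodel stores or classifies steps.

```r
# Hypothetical representation: for each step, the steps referenced in its
# IF and IFNOT conditions (character(0) if the condition is missing).
step <- function(if_steps = character(0), ifnot_steps = character(0)) {
  list(if_steps = if_steps, ifnot_steps = ifnot_steps)
}

steps <- list(
  "no vacation"    = step(ifnot_steps = "take vacation"),
  "weekday"        = step(),
  "rain"           = step(),
  "get groceries"  = step(),
  "take umbrella"  = step(),
  "work from home" = step(),
  "take vacation"  = step(),
  "walk to work"   = step(c("no vacation", "weekday"), "work from home"),
  "go outside"     = step(c("get groceries", "walk to work")),
  "you get wet"    = step(c("go outside", "rain"), "take umbrella")
)

used_in_if    <- unique(unlist(lapply(steps, `[[`, "if_steps")))
used_in_ifnot <- unique(unlist(lapply(steps, `[[`, "ifnot_steps")))

classify_step <- function(name) {
  if (length(steps[[name]]$if_steps) > 0) {
    "non-starting step"
  } else if (name %in% used_in_if) {
    "component cause"
  } else if (name %in% used_in_ifnot) {
    "intervention"
  } else {
    "unclassified"  # fallback of this sketch, not a type defined above
  }
}

step_types <- vapply(names(steps), classify_step, character(1))
split(names(step_types), step_types)
```

In this sketch, "take umbrella", "work from home", and "take vacation" come out as interventions, because they only appear in IFNOT conditions of other steps.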
Steplist
The steplist is the structure that contains all specified steps. It is the only input for function create_scc(), which creates the SCC model. See ?new_steplist for a detailed description of the structure of steplists from an R perspective.
Additional step attributes
Steps contain additional attributes:
- Module: Modules are groups to which steps are assigned. If modules are used, every step is in exactly one module.
- End step indicator: Indicates if a certain step is part of the outcome definition
- Note: Additional notes, e.g., regarding the level of evidence, etc.
- Reference: Since we model our steps based on the literature, every step should have at least one reference
Steplist elements
In R, steplists are defined as an S3 class and contain 8 data.frames. Here’s a short overview, but also see ?new_steplist for details.
- WHAT: Contains WHAT segments
- DOES: Contains DOES segments
- WHERE: Contains WHERE segments
- THEN: Contains THEN statements
- Module: Contains the list of modules
- Step: Contains the list of steps
- ICC: Short for incompatible component causes. It records combinations of component causes that cannot appear in practice. Sets of component causes which contain ICCs are not considered during SCC model creation.
- Outc: Short for outcome definition, which is a list of conditions under which the outcome is assumed to have occurred. In practice, the outcome definition consists of end steps combined by AND/OR logic.
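As a rough illustration of how incompatible component causes exclude sets from consideration, here is a base R sketch. The column names `id1` and `id2` follow the naming of data.frame `icc`, but the values "A" to "D", the candidate sets, and the helper `contains_icc()` are made up for illustration and do not show epicmodel's internal implementation.

```r
# Toy ICC table: each row is a pair of component causes that cannot
# occur together (values are illustrative stand-ins for step IDs).
icc <- data.frame(id1 = c("A", "C"), id2 = c("B", "D"))

# A set contains an ICC if both members of any pair are in the set.
contains_icc <- function(set, icc) {
  any(icc$id1 %in% set & icc$id2 %in% set)
}

candidate_sets <- list(
  c("A", "C"),
  c("A", "B", "C"),
  c("C", "D"),
  c("B", "D")
)

# Keep only the sets that are free of incompatible pairs.
keep <- !vapply(candidate_sets, contains_icc, logical(1), icc = icc)
candidate_sets[keep]
```

Here, the sets {A, B, C} and {C, D} are dropped because they contain the incompatible pairs A/B and C/D, respectively.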
Steplist creation
Steplists are created with the built-in Steplist Creator Shiny App. It can be launched with:
We made a little tutorial that shows how to use the shiny app, see vignette("articles/steplist_creator_tutorial"). It contains screenshots and an example for you to click along. Please note that the tutorial is not shipped with the package and can only be accessed on the homepage.
Processing steplists in R
The steplist created in the shiny app can be downloaded as an .rds file and loaded into R using readRDS(). Additionally, there are some options to process the steplist in R, which might be easier for some standard tasks than clicking through the app. These functions accompany check_steplist(), so let’s talk about this function first.
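Loading an .rds file with readRDS() works the same way for any R object. The following base R sketch shows the round trip with a stand-in object. The object, its column names, and the temporary file path are assumptions for illustration, and a real steplist would come from the app download instead.

```r
# Stand-in object (hypothetical content); a real steplist would be the
# file downloaded from the Steplist Creator.
stand_in <- list(what = data.frame(id_what = "a1", desc_what = "rain"))

# Write the object to a temporary .rds file.
path <- tempfile(fileext = ".rds")
saveRDS(stand_in, path)

# Load it back into R, as you would do with a downloaded steplist file.
restored <- readRDS(path)
all.equal(stand_in, restored)
```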
Checked and unchecked steplists
A steplist needs to fulfill additional structural requirements in order to be used in create_scc(). These requirements are checked with check_steplist(). The function documentation contains a detailed description of the conducted checks. Some violations will result in errors, which means that checking was not successful and you need to make changes. Other violations will only result in warnings, which suggest some non-mandatory changes. If check_steplist() only results in warnings, you will still get a checked steplist. Let’s look at the built-in steplist_party as an example.
steplist <- steplist_party
Now, let’s run check_steplist().
steplist_checked <- check_steplist(steplist)
#> ── Checking epicmodel_steplist steplist ──────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✔ Checking WHAT IDs was successful.
#> ✔ Checking DOES IDs was successful.
#> ✔ Checking WHERE IDs was successful.
#> ✔ Checking Module IDs was successful.
#> ✔ Checking ICC IDs was successful.
#> ✔ Checking WHAT keywords was successful.
#> ✔ Checking DOES keywords was successful.
#> ✔ Checking WHERE keywords was successful.
#> ✔ Checking Module keywords was successful.
#> ✔ Checking Modules was successful.
#> ✖ Checking ICC entries failed!
#> Caused by error in `check_steps_in_icc()`:
#> ! All IDs in data.frame `icc`, i.e. in variables `id1` and `id2`, must be in `id_step` of data.frame `step`!
#> ℹ Data.frame `icc` contains 4 elements with two step IDs each.
#> ✖ In total, 1 ID is not in data.frame `step`: NA
#> ℹ If only `NA` is not in data.frame `step`, use `steplist <- remove_na(steplist)`.
#> ✔ Checking WHAT segments was successful.
#> ! Checking DOES segments resulted in warnings!
#> Caused by warning:
#> ! Not all DOES segments have been used in data.frame `step`!
#> ℹ Data.frame `does` contains 6 elements.
#> ℹ In total, 1 DOES segment is not being used in data.frame `step`: d4
#> ✔ Checking WHERE segments was successful.
#> ! Checking references resulted in warnings!
#> Caused by warning:
#> ! For some steps no references have been provided!
#> ℹ In total, 16 steps have no references.
#> ✔ Checking start/end steps was successful.
#> ✔ Checking THEN statements was successful.
#> ✔ Checking THEN/IF/IFNOT equality was successful.
#> ✔ Checking outcome definitions was successful.
#> ── Summary ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✖ Checking failed! Please correct errors and repeat.
You can see that many checks have been successful but also that some of them resulted in errors or warnings. The first error tells you that “Checking ICC entries failed!”. As a reminder, ICC is short for incompatible component causes. The error tells you that in the ICC data.frame of the steplist, you used step IDs that have not been defined, i.e., cannot be found in data.frame step of the steplist. It gives more details: “In total, 1 ID is not in data.frame step: NA”. This clarifies a lot: there is still an entry in the table that contains NA, which is counted as an empty element. Luckily, in this case, the message offers a solution: using remove_na(). Next, we have the warning “Not all DOES segments have been used in data.frame step!” As it tells us two lines below, we specified d4 in data.frame does but did not use it when creating steps. Even though this won’t break SCC model creation, let’s also address it. Finally, check_steplist() warns us that “For some steps no references have been provided!”. Because steps are usually based on the literature, epicmodel will not get tired of telling you to specify references in the steplist! In summary, checking failed, which we can verify by printing steplist_checked.
print(steplist_checked)
#> ✖ unchecked (please run `check_steplist()` before continuing)
#> WHAT: 9 WHAT segments
#> DOES: 6 DOES segments
#> WHERE: 6 WHERE segments
#> MODULE: 3 modules
#> STEP: 16 STEPs
#> ICC: 4 incompatible component-cause pairs
#> OUTCOME: 1 outcome definition
The first line shows us that steplist_checked is still “unchecked”. So, let’s work on those errors and warnings. First, to remove the NAs from data.frame icc, let’s run remove_na(). It removes rows that only consist of NAs from data.frames icc as well as outc, which contains the outcome definition. Next, we can delete DOES segment d4 with function remove_segment(), which allows you to delete a single entry from data.frames what, does, where, module, and icc by specifying its ID.
steplist <- remove_na(steplist)
steplist <- remove_segment(steplist, "d4")
This time, check_steplist() is successful. Since our example is about a party, we ignore the warning about references.
steplist_checked <- check_steplist(steplist)
#> ── Checking epicmodel_steplist steplist ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✔ Checking WHAT IDs was successful.
#> ✔ Checking DOES IDs was successful.
#> ✔ Checking WHERE IDs was successful.
#> ✔ Checking Module IDs was successful.
#> ✔ Checking ICC IDs was successful.
#> ✔ Checking WHAT keywords was successful.
#> ✔ Checking DOES keywords was successful.
#> ✔ Checking WHERE keywords was successful.
#> ✔ Checking Module keywords was successful.
#> ✔ Checking Modules was successful.
#> ✔ Checking ICC entries was successful.
#> ✔ Checking WHAT segments was successful.
#> ✔ Checking DOES segments was successful.
#> ✔ Checking WHERE segments was successful.
#> ! Checking references resulted in warnings!
#> Caused by warning:
#> ! For some steps no references have been provided!
#> ℹ In total, 16 steps have no references.
#> ✔ Checking start/end steps was successful.
#> ✔ Checking THEN statements was successful.
#> ✔ Checking THEN/IF/IFNOT equality was successful.
#> ✔ Checking outcome definitions was successful.
#> ── Summary ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✔ Checking successful!
Printing steplist_checked confirms that checking was successful.
print(steplist_checked)
#> ✔ checked successfully
#> WHAT: 9 WHAT segments
#> DOES: 5 DOES segments
#> WHERE: 6 WHERE segments
#> MODULE: 3 modules
#> STEP: 16 STEPs
#> ICC: 3 incompatible component-cause pairs
#> OUTCOME: 1 outcome definition
There is another function available for processing steplists in R. It’s called remove_all_modules() and, as the name implies, it removes all modules from data.frame module and deletes assigned modules in data.frame step. epicmodel expects that either all steps or none of them have a module specified. With remove_all_modules(), you have an easy tool for choosing the second option.
steplist_checked <- remove_all_modules(steplist_checked)
#> ! Changing the steplist makes it necessary to repeat `check_steplist()`!
We already get a warning that we need to repeat check_steplist(). Let’s print steplist_checked to investigate.
print(steplist_checked)
#> ✖ unchecked (please run `check_steplist()` before continuing)
#> WHAT: 9 WHAT segments
#> DOES: 5 DOES segments
#> WHERE: 6 WHERE segments
#> MODULE: 0 modules
#> STEP: 16 STEPs
#> ICC: 3 incompatible component-cause pairs
#> OUTCOME: 1 outcome definition
Indeed, we see that steplist_checked has 0 modules now but is “unchecked” again. In fact, remove_na() and remove_segment() also “uncheck” a previously checked steplist. Additionally, all steplists you download from the shiny app are “unchecked” as well, even if you uploaded a “checked” steplist.