Hierarchical Modeling¶
This tutorial demonstrates how to take advantage of HSSM's hierarchical modeling capabilities. We will cover the following:
- How to define a mixed-effect regression
- How to define a hierarchical HSSM model
- How to apply prior and link function settings to ensure successful sampling
Colab Instructions¶
If you would like to run this tutorial on Google Colab, please click this link.
Once you are in Colab, follow the installation instructions below and then restart your runtime.
Just uncomment the code in the next code cell and run it!
NOTE:
You may want to switch your runtime to have a GPU or TPU. To do so, go to Runtime > Change runtime type and select the desired hardware accelerator.
Note that if you switch your runtime you have to follow the installation instructions again.
# !pip install hssm
Import Modules¶
import hssm
%matplotlib inline
%config InlineBackend.figure_format='retina'
Setting the global float type¶
Note: Using the analytical DDM (Drift Diffusion Model) likelihood in PyMC without setting the float type in PyTensor
may result in warning messages during sampling, which is a known bug in PyMC v5.6.0 and earlier versions. To avoid these warnings, we provide a convenience function:
hssm.set_floatX("float32")
Setting PyTensor floatX type to float32. Setting "jax_enable_x64" to False. If this is not intended, please set `jax` to False.
1. Defining Regressions¶
Under the hood, HSSM uses bambi
for model creation. bambi
takes inspiration from the lme4
package in R and supports the definition of generalized linear mixed-effect models through
R-like formulas and concepts such as link functions. This makes it possible to create arbitrary mixed-effect regressions in HSSM, which is one advantage of HSSM over HDDM. Now let's walk through the ways to define a parameter with a regression in HSSM.
Specifying fixed- and random-effect terms¶
Suppose that we want to define a parameter v
with a regression. There are two ways to define such a parameter: either through a dictionary
or through a hssm.Param
object:
# The following two definitions are equivalent,
# including the formulas: "x + y + x:y" expands to the same
# terms as "1 + x*y" (the intercept is implicit).
# The dictionary way:
param_v = {
"name": "v",
"formula": "v ~ x + y + x:y + (1|participant_id)",
"link": "identity",
"prior": {
"Intercept": {"name": "Normal", "mu": 0.0, "sigma": 0.25},
"1|participant_id": {
"name": "Normal",
"mu": 0.0,
"sigma": {"name": "HalfNormal", "sigma": 0.2}, # this is a hyperprior
},
"x": {"name": "Normal", "mu": 0.0, "sigma": 0.25},
},
}
# The object-oriented way
param_v = hssm.Param(
"v",
formula="v ~ 1 + x*y + (1|participant_id)",
link="identity",
prior={
"Intercept": hssm.Prior("Normal", mu=0.0, sigma=0.25),
"1|participant_id": hssm.Prior(
"Normal",
mu=0.0,
sigma=hssm.Prior("HalfNormal", sigma=0.2), # this is a hyperprior
),
"x": hssm.Prior("Normal", mu=0.0, sigma=0.25),
},
)
The formula "v ~ x + y + x:y + (1|participant_id)"
defines a random-intercept model. As in R, a fixed-effect intercept term is added to the formula by default unless otherwise specified. You can make this explicit by adding a 1
to the formula. Or, if your regression does not have an intercept, you can explicitly remove the intercept term by using a 0
in place of the 1
: "v ~ 0 + x * y + (1|participant_id)"
. We recommend specifying random-effect terms after the fixed-effect terms.
Other fixed effect covariates are x
, y
, and the interaction term x:y
. When all three terms are present, you can use the shortcut x * y
in place of the three terms.
The only random effect term in this model is 1|participant_id
. It is a random-intercept term with participant_id
indicating the grouping variable. You can add another random-effect term in a similar way: "v ~ x + y + x:y + (1|participant_id) + (x|participant_id)"
, or more briefly, "v ~ x + y + x:y + (1 + x|participant_id)"
.
Specifying priors for fixed- and random-effect terms:¶
As demonstrated in the code above, you can specify the prior of each term through a dictionary, with the key being the name of the term and the value being the prior specification, either as a dictionary or as a hssm.Prior
object. There are a few things to note:
- The prior of the fixed-effect intercept is specified with "Intercept"
, capitalized.
- For random effects, you can specify hyperpriors for the parameters of their priors.
Specifying the link functions:¶
Link functions are another concept from generalized linear models: a link function defines a transformation between the linear combination of the covariates and the response variable. This is especially helpful when the response variable is not normally distributed, e.g. in logistic regression. In HSSM, the link function is the identity by default. However, since some parameters of SSMs are defined on (0, inf)
or (0, 1)
, link functions can help ensure that the result of the regression is valid for these parameters. We will come back to this later.
2. Defining a hierarchical HSSM model¶
In fact, HSSM does not differentiate between hierarchical and non-hierarchical models. A hierarchical model in HSSM is simply a model with one or more parameters defined as regressions. That said, HSSM does provide some useful functionality for creating hierarchical models.
BREAKING CHANGES: Use global_formula
instead of hierarchical
parameter when creating an HSSM model¶
In HSSM v0.2.5, we removed the hierarchical
parameter from the hssm.HSSM
class. In older versions, hierarchical
was a bool
that served as a convenient switch to add a random-intercept regression to any parameter not explicitly defined by the user, using participant_id
as the grouping variable.
However, this hierarchical
parameter caused much confusion: many believed that it would magically turn the model into a hierarchical model, while in reality it did nothing more than add y ~ 1 + (1|participant_id)
to every parameter, where y
stands for the name of that parameter. That is why we removed this confusing parameter in favor of the new global_formula
parameter, which offers users more convenient and transparent control over the models they want to create.
When specified, global_formula
applies the given formula to all parameters. Therefore, setting it to y ~ 1 + (1|participant_id)
is equivalent to setting hierarchical=True
in older versions of HSSM. However, users can set it to any formula they want applied to all parameters. HSSM is agnostic to the parameter name to the left of the ~
sign, though using y
is customary.
Note
In HSSM, the default grouping variable is now `participant_id`, which is different from `subj_idx` in HDDM.
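If your dataset still uses HDDM's subj_idx column, you can rename it with pandas before handing it to HSSM. A minimal sketch (the data values below are made up for illustration):

```python
import pandas as pd

# Hypothetical HDDM-style data that still uses the old `subj_idx` name
hddm_style = pd.DataFrame(
    {
        "rt": [0.52, 0.71, 0.64],
        "response": [1.0, -1.0, 1.0],
        "subj_idx": [0, 0, 1],
    }
)

# Rename the grouping column to the name HSSM expects
hssm_ready = hddm_style.rename(columns={"subj_idx": "participant_id"})
print(list(hssm_ready.columns))  # ['rt', 'response', 'participant_id']
```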
# Load a package-supplied dataset
cav_data = hssm.load_data("cavanagh_theta")
# Define a basic non-hierarchical model
model_non_hierarchical = hssm.HSSM(data=cav_data)
model_non_hierarchical
Model initialized successfully.
Hierarchical Sequential Sampling Model Model: ddm Response variable: rt,response Likelihood: analytical Observations: 3988 Parameters: v: Prior: Normal(mu: 0.0, sigma: 2.0) Explicit bounds: (-inf, inf) a: Prior: HalfNormal(sigma: 2.0) Explicit bounds: (0.0, inf) z: Prior: Uniform(lower: 0.0, upper: 1.0) Explicit bounds: (0.0, 1.0) t: Prior: HalfNormal(sigma: 2.0) Explicit bounds: (0.0, inf) Lapse probability: 0.05 Lapse distribution: Uniform(lower: 0.0, upper: 20.0)
# Specifying a global formula
# This is equivalent to setting `hierarchical` to True
model_hierarchical = hssm.HSSM(
data=cav_data, global_formula="y ~ 1 + (1|participant_id)", prior_settings="safe"
)
model_hierarchical
Model initialized successfully.
Hierarchical Sequential Sampling Model Model: ddm Response variable: rt,response Likelihood: analytical Observations: 3988 Parameters: v: Formula: v ~ 1 + (1|participant_id) Priors: v_Intercept ~ Normal(mu: 2.0, sigma: 3.0) v_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: identity Explicit bounds: (-inf, inf) a: Formula: a ~ 1 + (1|participant_id) Priors: a_Intercept ~ Gamma(mu: 1.5, sigma: 0.75) a_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: identity Explicit bounds: (0.0, inf) z: Formula: z ~ 1 + (1|participant_id) Priors: z_Intercept ~ Beta(alpha: 10.0, beta: 10.0) z_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: identity Explicit bounds: (0.0, 1.0) t: Formula: t ~ 1 + (1|participant_id) Priors: t_Intercept ~ Gamma(mu: 0.20000000298023224, sigma: 0.20000000298023224) t_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: identity Explicit bounds: (0.0, inf) Lapse probability: 0.05 Lapse distribution: Uniform(lower: 0.0, upper: 20.0)
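A side note on the summary above: constants like 0.30000001192092896 are simply 0.3 stored in float32 (recall hssm.set_floatX("float32") earlier), since 0.3 has no exact float32 representation. A quick numpy illustration:

```python
import numpy as np

# 0.3 rounded to the nearest float32, then read back as a double,
# reproduces the value printed in the model summary
print(float(np.float32(0.3)))  # 0.30000001192092896
```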
3. Intelligent defaults for complex hierarchical models¶
bambi
is not designed with HSSM in mind. Therefore, in cases where priors for certain parameters are not defined, the default priors supplied by bambi
are sometimes not optimal. The same goes for link functions: "identity"
link functions tend not to work well for parameters that are not defined on (-inf, inf)
. Therefore, we provide some default settings that users can experiment with to ensure that sampling is successful.
prior_settings
¶
Currently we provide a "safe"
strategy that uses HSSM default priors; it is turned on by default for parameters that are targets of regressions. Compare the two models below, with the safe
strategy turned on and off:
model_safe = hssm.HSSM(
data=cav_data,
global_formula="y ~ 1 + (1|participant_id)",
prior_settings="safe",
loglik_kind="approx_differentiable",
)
model_safe
Model initialized successfully.
Hierarchical Sequential Sampling Model Model: ddm Response variable: rt,response Likelihood: approx_differentiable Observations: 3988 Parameters: v: Formula: v ~ 1 + (1|participant_id) Priors: v_Intercept ~ Normal(mu: 0.0, sigma: 0.25) v_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: identity Explicit bounds: (-3.0, 3.0) a: Formula: a ~ 1 + (1|participant_id) Priors: a_Intercept ~ Normal(mu: 1.399999976158142, sigma: 0.25) a_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: identity Explicit bounds: (0.3, 2.5) z: Formula: z ~ 1 + (1|participant_id) Priors: z_Intercept ~ Normal(mu: 0.5, sigma: 0.25) z_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: identity Explicit bounds: (0.0, 1.0) t: Formula: t ~ 1 + (1|participant_id) Priors: t_Intercept ~ Normal(mu: 1.0, sigma: 0.25) t_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: identity Explicit bounds: (0.0, 2.0) Lapse probability: 0.05 Lapse distribution: Uniform(lower: 0.0, upper: 20.0)
model_safe_off = hssm.HSSM(
data=cav_data,
global_formula="y ~ 1 + (1|participant_id)",
prior_settings=None,
loglik_kind="approx_differentiable",
)
model_safe_off
Model initialized successfully.
Hierarchical Sequential Sampling Model Model: ddm Response variable: rt,response Likelihood: approx_differentiable Observations: 3988 Parameters: v: Formula: v ~ 1 + (1|participant_id) Priors: v_Intercept ~ Normal(mu: 0.0, sigma: 2.5) v_1|participant_id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 2.5)) Link: identity Explicit bounds: (-3.0, 3.0) a: Formula: a ~ 1 + (1|participant_id) Priors: a_Intercept ~ Normal(mu: 0.0, sigma: 1.0) a_1|participant_id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 1.0)) Link: identity Explicit bounds: (0.3, 2.5) z: Formula: z ~ 1 + (1|participant_id) Priors: z_Intercept ~ Normal(mu: 0.0, sigma: 1.0) z_1|participant_id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 1.0)) Link: identity Explicit bounds: (0.0, 1.0) t: Formula: t ~ 1 + (1|participant_id) Priors: t_Intercept ~ Normal(mu: 0.0, sigma: 1.0) t_1|participant_id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 1.0)) Link: identity Explicit bounds: (0.0, 2.0) Lapse probability: 0.05 Lapse distribution: Uniform(lower: 0.0, upper: 20.0)
link_settings
¶
We also provide a link_settings
switch, which changes the default link functions of parameters according to their explicit bounds. See the model below with link_settings
set to "log_logit"
:
model_log_logit = hssm.HSSM(
data=cav_data,
global_formula="y ~ 1 + (1|participant_id)",
prior_settings=None,
link_settings="log_logit",
)
model_log_logit
Model initialized successfully.
Hierarchical Sequential Sampling Model Model: ddm Response variable: rt,response Likelihood: analytical Observations: 3988 Parameters: v: Formula: v ~ 1 + (1|participant_id) Priors: v_Intercept ~ Normal(mu: 0.0, sigma: 2.5) v_1|participant_id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 2.5)) Link: identity Explicit bounds: (-inf, inf) (ignored due to link function) a: Formula: a ~ 1 + (1|participant_id) Priors: a_Intercept ~ Normal(mu: 0.0, sigma: 1.0) a_1|participant_id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 1.0)) Link: log Explicit bounds: (0.0, inf) (ignored due to link function) z: Formula: z ~ 1 + (1|participant_id) Priors: z_Intercept ~ Normal(mu: 0.0, sigma: 1.0) z_1|participant_id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 1.0)) Link: Generalized logit link function with bounds (0.0, 1.0) Explicit bounds: (0.0, 1.0) (ignored due to link function) t: Formula: t ~ 1 + (1|participant_id) Priors: t_Intercept ~ Normal(mu: 0.0, sigma: 1.0) t_1|participant_id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 1.0)) Link: log Explicit bounds: (0.0, inf) (ignored due to link function) Lapse probability: 0.05 Lapse distribution: Uniform(lower: 0.0, upper: 20.0)
Mixing strategies:¶
It is possible to turn on both prior_settings
and link_settings
:
model_safe_loglogit = hssm.HSSM(
data=cav_data,
global_formula="y ~ 1 + (1|participant_id)",
prior_settings="safe",
link_settings="log_logit",
)
model_safe_loglogit
Model initialized successfully.
Hierarchical Sequential Sampling Model Model: ddm Response variable: rt,response Likelihood: analytical Observations: 3988 Parameters: v: Formula: v ~ 1 + (1|participant_id) Priors: v_Intercept ~ Normal(mu: 0.0, sigma: 0.25) v_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: identity Explicit bounds: (-inf, inf) (ignored due to link function) a: Formula: a ~ 1 + (1|participant_id) Priors: a_Intercept ~ Normal(mu: 0.0, sigma: 0.25) a_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: log Explicit bounds: (0.0, inf) (ignored due to link function) z: Formula: z ~ 1 + (1|participant_id) Priors: z_Intercept ~ Normal(mu: 0.0, sigma: 0.25) z_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: Generalized logit link function with bounds (0.0, 1.0) Explicit bounds: (0.0, 1.0) (ignored due to link function) t: Formula: t ~ 1 + (1|participant_id) Priors: t_Intercept ~ Normal(mu: 0.0, sigma: 0.25) t_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.30000001192092896)) Link: log Explicit bounds: (0.0, inf) (ignored due to link function) Lapse probability: 0.05 Lapse distribution: Uniform(lower: 0.0, upper: 20.0)