Initial Values¶
import warnings
import numpy as np
import hssm
warnings.filterwarnings("ignore")
Folklore suggests that the initial values of our sampler shouldn't matter for the outcome of the analysis, since MCMC will find the relevant region of the parameter space eventually.
Well, we can't trust the elders blindly... you will sometimes find that initial value settings need to be corrected for good results.
There are multiple reasons, but chief among them is that we are routinely dealing with constrained parameter spaces. At the edges of these parameter spaces, likelihoods can become less well behaved (this is true e.g. for our approx_differentiable likelihoods based on LANs).
These edges may be unlikely a priori, but if your sampler takes a path through the parameter space that passes through these regions (even if just on the way to a distant mode), you may get undesirable results.
There are two ways in which you can control the bounds of a parameter:
- For an individual parameter, you can specify a Truncated distribution, or simply choose a prior distribution that naturally lives in the desired space (e.g. a Gamma distribution when dealing with positivity constraints); see the sketch after this list.
- To constrain the outcome of a regression, you can always use a Link function that will target the desired output space.
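As a rough sketch of the first option: we can hand a constraint-respecting prior to a parameter via its include spec. This assumes that the include dictionary accepts a "prior" entry and that hssm.Prior forwards the distribution name and arguments to PyMC; the concrete numbers are purely illustrative.
import hssm

data = hssm.load_data("cavanagh_theta")

model_gamma_t = hssm.HSSM(
    data=data,
    model="ddm",
    include=[
        {
            "name": "t",
            # A Gamma prior lives on the positive reals, so the positivity
            # constraint on t holds by construction, without truncation.
            "prior": hssm.Prior("Gamma", mu=0.2, sigma=0.2),
        }
    ],
)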
Option 2, while computationally kosher in principle, can produce some downstream headaches, since it changes the interpretation of parameter values from straightforward to e.g. log-odds (as in logistic regression with a logit link). This demands careful thinking about priors and can sometimes make reporting of results more difficult.
Therefore you may find yourself in the situation that you do not want to use link functions or other a priori constraints, while still needing to steer clear of pathological regions of the parameter space.
At this point, the wisdom of the ancients aside, practical considerations will force you to think about initial values.
We are not here to prescribe how to deal with this, but we try to provide you with options. This short tutorial illustrates how to inspect and adjust the initial value settings of an HSSM model.
Load up some data¶
cav_data = hssm.load_data("cavanagh_theta")
Simple model¶
model = hssm.HSSM(
    data=cav_data,
    model="ddm",
    loglik_kind="approx_differentiable",
    model_config={"backend": "pytensor"},
)
Model initialized successfully.
Now we can inspect the initial value setting.
model.initvals
{'a': array(1.5), 'z': array(0.5), 't': array(0.025), 'v': array(0.)}
Route 1¶
Pass your initial values directly to the sampler via the initvals argument.
from copy import deepcopy
my_initvals = deepcopy(model.initvals)
my_initvals["z"] = np.array(0.6, dtype="float32")
idata = model.sample(
    draws=100,
    tune=100,
    sampler="mcmc",
    initvals=my_initvals,
)
Initializing NUTS using adapt_diag... Multiprocess sampling (4 chains in 4 jobs) NUTS: [a, z, t, v]
Output()
Sampling 4 chains for 100 tune and 100 draw iterations (400 + 400 draws total) took 30 seconds. The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details 100%|██████████| 400/400 [00:20<00:00, 19.74it/s]
idata.posterior
<xarray.Dataset> Size: 14kB Dimensions: (chain: 4, draw: 100) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 800B 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99 Data variables: a (chain, draw) float64 3kB 1.046 1.022 1.055 ... 1.02 1.033 1.028 z (chain, draw) float64 3kB 0.4977 0.4876 0.5079 ... 0.4903 0.4865 t (chain, draw) float64 3kB 0.3546 0.3686 0.3514 ... 0.3684 0.3635 v (chain, draw) float64 3kB 0.4031 0.4036 0.3817 ... 0.4121 0.3948 Attributes: created_at: 2025-07-13T13:47:42.792184+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 sampling_time: 30.432159900665283 tuning_steps: 100 modeling_interface: bambi modeling_interface_version: 0.15.0
We allowed the sampler very little tuning, and therefore our initial values are still apparent in the chains.
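To see this pull of the start values for yourself, a quick trace plot helps; a minimal sketch using ArviZ (not imported above):
# Sketch: eyeball the chains for z; with only 100 tuning steps, the influence
# of the chosen start value (0.6) can still be visible early in the chains.
import arviz as az

az.plot_trace(idata, var_names=["z"])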
Route 2¶
Adjust the ._initvals attribute.
model.initvals is a class property, which serves as an accessor to the underlying ._initvals attribute.
We can adjust this directly, and it will be used as the default for our sampler.
model._initvals
{'a': array(1.5), 'z': array(0.5), 't': array(0.025), 'v': array(0.)}
model._initvals["z"] = np.array(0.6, dtype="float32")
idata2 = model.sample(draws=10, tune=1)
Using default initvals. The model has already been sampled. Overwriting the previous inference object. Any previous reference to the inference object will still point to the old object.
Only 10 samples per chain. Reliable r-hat and ESS diagnostics require longer chains for accurate estimate. Initializing NUTS using adapt_diag... Multiprocess sampling (4 chains in 4 jobs) NUTS: [a, z, t, v]
Output()
Sampling 4 chains for 1 tune and 10 draw iterations (4 + 40 draws total) took 1 seconds. There were 40 divergences after tuning. Increase `target_accept` or reparameterize. The number of samples is too small to check convergence reliably. 100%|██████████| 40/40 [00:02<00:00, 14.48it/s]
idata2
- posterior
<xarray.Dataset> Size: 1kB Dimensions: (chain: 4, draw: 10) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 80B 0 1 2 3 4 5 6 7 8 9 Data variables: a (chain, draw) float64 320B 1.5 1.5 1.5 1.5 1.5 ... 1.5 1.5 1.5 1.5 z (chain, draw) float64 320B 0.6 0.6 0.6 0.6 0.6 ... 0.6 0.6 0.6 0.6 t (chain, draw) float64 320B 0.025 0.025 0.025 ... 0.025 0.025 0.025 v (chain, draw) float64 320B 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 Attributes: created_at: 2025-07-13T13:48:07.647544+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 sampling_time: 0.9140770435333252 tuning_steps: 1 modeling_interface: bambi modeling_interface_version: 0.15.0
- log_likelihood
<xarray.Dataset> Size: 1MB Dimensions: (chain: 4, draw: 10, __obs__: 3988) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 80B 0 1 2 3 4 5 6 7 8 9 * __obs__ (__obs__) int64 32kB 0 1 2 3 4 5 ... 3983 3984 3985 3986 3987 Data variables: rt,response (chain, draw, __obs__) float64 1MB -1.634 -1.914 ... -1.662 Attributes: modeling_interface: bambi modeling_interface_version: 0.15.0
- sample_stats
<xarray.Dataset> Size: 5kB Dimensions: (chain: 4, draw: 10) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 80B 0 1 2 3 4 5 6 7 8 9 Data variables: (12/17) energy_error (chain, draw) float64 320B 0.0 0.0 0.0 ... 0.0 0.0 perf_counter_diff (chain, draw) float64 320B 0.122 0.0377 ... 0.04596 energy (chain, draw) float64 320B 7.481e+03 ... 7.483e+03 acceptance_rate (chain, draw) float64 320B 0.0 0.0 0.0 ... 0.0 0.0 largest_eigval (chain, draw) float64 320B nan nan nan ... nan nan index_in_trajectory (chain, draw) int64 320B 0 0 0 0 0 0 ... 0 0 0 0 0 0 ... ... tree_depth (chain, draw) int64 320B 1 1 1 1 1 1 ... 1 1 1 1 1 1 smallest_eigval (chain, draw) float64 320B nan nan nan ... nan nan reached_max_treedepth (chain, draw) bool 40B False False ... False False step_size_bar (chain, draw) float64 320B 0.4128 0.4128 ... 0.4128 diverging (chain, draw) bool 40B True True True ... True True step_size (chain, draw) float64 320B 0.4128 0.4128 ... 0.4128 Attributes: created_at: 2025-07-13T13:48:07.662161+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 sampling_time: 0.9140770435333252 tuning_steps: 1 modeling_interface: bambi modeling_interface_version: 0.15.0
- observed_data
<xarray.Dataset> Size: 96kB Dimensions: (__obs__: 3988, rt,response_extra_dim_0: 2) Coordinates: * __obs__ (__obs__) int64 32kB 0 1 2 3 ... 3985 3986 3987 * rt,response_extra_dim_0 (rt,response_extra_dim_0) int64 16B 0 1 Data variables: rt,response (__obs__, rt,response_extra_dim_0) float64 64kB ... Attributes: created_at: 2025-07-13T13:48:07.666270+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 modeling_interface: bambi modeling_interface_version: 0.15.0
While models can become a lot more complicated, you will be able to adjust initial values consistently via this process.
Link function¶
Setting initial values for parameters when working with link functions becomes a little more tricky. Below is an example.
Let's define a simple regression with logit link functions.
model = hssm.HSSM(
    data=cav_data,
    model="ddm",
    link_settings="log_logit",
    loglik_kind="approx_differentiable",
    include=[{"name": "a", "formula": "a ~ 1 + theta"}],
)
Model initialized successfully.
model.initvals
{'z': array(0.5), 't': array(0.025), 'v': array(0.), 'a_Intercept': array(0.), 'a_theta': array(0.)}
Well, a_Intercept gets a default initial value of 0?
a is the boundary separation parameter in the drift diffusion model; a setting of 0 seems like an extremely bad choice... It would lead to a point mass of rt at 0 s!
What's going on here? Let's inspect our model...
model
Hierarchical Sequential Sampling Model Model: ddm Response variable: rt,response Likelihood: approx_differentiable Observations: 3988 Parameters: v: Prior: Uniform(lower: -3.0, upper: 3.0) Explicit bounds: (-3.0, 3.0) (ignored due to link function) a: Formula: a ~ 1 + theta Priors: a_Intercept ~ Normal(mu: 0.0, sigma: 0.25) a_theta ~ Normal(mu: 0.0, sigma: 0.25) Link: Generalized logit link function with bounds (0.3, 2.5) Explicit bounds: (0.3, 2.5) (ignored due to link function) z: Prior: Uniform(lower: 0.0, upper: 1.0) Explicit bounds: (0.0, 1.0) (ignored due to link function) t: Prior: HalfNormal(sigma: 2.0) Explicit bounds: (0.0, 2.0) (ignored due to link function) Lapse probability: 0.05 Lapse distribution: Uniform(lower: 0.0, upper: 20.0)
Ah, we apply the Generalized Logit link to the a parameter.
The initial value of the actual parameter a that the likelihood will receive is the output of the transformation.
Our hssm model class includes everything we need to inspect this further.
What would we expect here?
The generalized logit transformation (link) has the associated generalized sigmoid transformation (inverse link) as the forward transform.
We expect that evaluating this transformation at 0 should give back the midpoint between the explicit bounds we set for the parameter.
So we expect: $(0.3 + 2.5) / 2 = 1.4$... let's check this.
Note:
The forward link (from parameter to the function output that the likelihood receives) is called the inverse link function in GLM lingo.
We follow this nomenclature in HSSM.
model.params["a"].link.linkinv(model._initvals["a_Intercept"])
1.4000000000000001
Voila! This checks out! We note that the linkinv() function can come in handy if you want to play around with initial value settings in the context of link functions yourself.
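To make the arithmetic concrete, here is a stand-in for the generalized sigmoid under the bounds (0.3, 2.5) from the model above (our own sketch, not HSSM's implementation):
import numpy as np

def generalized_sigmoid(x, lower=0.3, upper=2.5):
    # Squashes the real line into (lower, upper); at x = 0 it returns the midpoint.
    return lower + (upper - lower) / (1.0 + np.exp(-x))

print(generalized_sigmoid(0.0))  # (0.3 + 2.5) / 2 = 1.4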
HSSM's initial value defaults logic¶
We try to guide the initial value settings in HSSM with reasonable defaults that hopefully work in many cases without needing further adjustments. It is, however, difficult to find settings that work blindly.
We follow the guidelines below:
1. Avoid known issues near parameter boundaries (especially important for approx_differentiable likelihoods) --> whenever possible, initialize near the center of bounded parameter spaces.
2. Starting values for the t parameter should be low, to avoid known pathologies of the analytic DDM likelihood when the smallest reaction time (rt) values come close to t.
3. In a regression setting, minimize the spread of offset parameters to avoid inadvertently running into parameter limits, and initialize all but the Intercept parameter to 0.
NOTE:
Guideline 3 in particular is a very conservative setting, focused solely on avoiding boundary behavior. This will NOT always be a smart idea, and it may sometimes collaterally have a negative impact on convergence.
For now, we encourage users to actively experiment with initial value settings.
Example¶
cav_data
| | participant_id | stim | rt | response | theta | dbs | conf |
|---|---|---|---|---|---|---|---|
| 0 | 0 | LL | 1.210 | 1.0 | 0.656275 | 1 | HC |
| 1 | 0 | WL | 1.630 | 1.0 | -0.327889 | 1 | LC |
| 2 | 0 | WW | 1.030 | 1.0 | -0.480285 | 1 | HC |
| 3 | 0 | WL | 2.770 | 1.0 | 1.927427 | 1 | LC |
| 4 | 0 | WW | 1.140 | -1.0 | -0.213236 | 1 | HC |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3983 | 13 | LL | 1.450 | -1.0 | -1.237166 | 0 | HC |
| 3984 | 13 | WL | 0.711 | 1.0 | -0.377450 | 0 | LC |
| 3985 | 13 | WL | 0.784 | 1.0 | -0.694194 | 0 | LC |
| 3986 | 13 | LL | 2.350 | -1.0 | -0.546536 | 0 | HC |
| 3987 | 13 | WW | 1.250 | 1.0 | 0.752388 | 0 | HC |

3988 rows × 7 columns
model_reg = hssm.HSSM(
    data=cav_data,
    model="ddm",
    prior_settings="safe",
    loglik_kind="approx_differentiable",
    include=[{"name": "a", "formula": "a ~ 1 + theta + (1|participant_id)"}],
)
Model initialized successfully.
model_reg.initvals
{'z': array(0.5), 't': array(0.025), 'v': array(0.), 'a_Intercept': array(1.5), 'a_theta': array(0.), 'a_1|participant_id_sigma': array(0.27082359), 'a_1|participant_id_offset': array([-0.00037851, -0.00733303, -0.00145185, -0.0062298 , 0.00852909, 0.00403854, 0.00332884, -0.00735726, -0.00606049, 0.00591678, 0.00637818, -0.00133482, -0.00313939, -0.00643042])}
Let's discuss what we see here:
- We do NOT apply a link function in this case, so guideline 1 applies without further cognitive effort. In contrast to the previous example, check how a_Intercept is now directly initialized as 1.5, near the middle of the parameter range.
- Covariate betas, in this case a_theta, are initialized to 0.
- The offset parameters associated with individual parameters in group hierarchies are initialized close to 0.
- t is initialized close to 0, as per guideline 2.
- The remaining parameters are set close to or at the middle of the allowed parameter space (if bounded).
For parameters that we do not actively manipulate, PyMC (Bambi) defaults are applied unchanged (e.g. here: a_1|participant_id_sigma). These settings may sometimes be sub-optimal for applications in HSSM, hence we again caution the user to take an active approach towards investigating initial values in case of convergence issues.
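If such a default worries you, the Route 2 pattern from above applies just as well here; a sketch (the value 0.1 is purely illustrative, not a recommendation):
# Sketch: override a start value that HSSM leaves at the Bambi/PyMC default.
model_reg._initvals["a_1|participant_id_sigma"] = np.array(0.1)
model_reg.initvals["a_1|participant_id_sigma"]  # the accessor reflects the change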
Parameters¶
There are two keyword arguments (kwargs) that we can set in the base HSSM class:
- process_initvals: bool turns processing of initial values on and off
- initval_jitter: float applies a uniform jitter around vector-valued initial values
model_no_initval = hssm.HSSM(
    data=cav_data,
    model="ddm",
    loglik_kind="approx_differentiable",
    include=[{"name": "a", "formula": "a ~ 1 + theta + (1|participant_id)"}],
    process_initvals=False,
)
Model initialized successfully.
model_no_initval.initvals
{'z': array(0.5), 't': array(2.), 'v': array(0.), 'a_Intercept': array(1.4), 'a_theta': array(0.), 'a_1|participant_id_sigma': array(0.27082359), 'a_1|participant_id_offset': array([ 2.42653745e-03, 3.65731912e-03, -4.10165032e-03, -1.00915984e-03, -4.91469027e-03, -8.85937922e-03, -1.81710875e-05, -1.28249545e-03, -2.10299902e-03, 5.82275214e-03, 2.88577774e-03, 5.36489440e-03, -2.40398804e-04, -7.74519145e-03])}
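For a quick side-by-side, we can contrast these raw defaults with the processed start values of model_reg from above (same parameterization, apart from prior_settings):
# Sketch: raw (unprocessed) defaults vs. HSSM's processed start values.
# Note e.g. t (2.0 raw vs. 0.025 processed) and a_Intercept (1.4 vs. 1.5).
for name, raw in model_no_initval.initvals.items():
    print(f"{name}: raw={raw}, processed={model_reg.initvals[name]}")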
If you want to change the magnitude of the default jitter, you can manipulate it via the initval_jitter argument.
model_jitter = hssm.HSSM(
    data=cav_data,
    model="ddm",
    loglik_kind="approx_differentiable",
    include=[{"name": "a", "formula": "a ~ 1 + theta + (1|participant_id)"}],
    process_initvals=False,
    initval_jitter=0.5,
)
Model initialized successfully.
model_jitter.initvals
{'z': array(0.5), 't': array(2.), 'v': array(0.), 'a_Intercept': array(1.4), 'a_theta': array(0.), 'a_1|participant_id_sigma': array(0.27082359), 'a_1|participant_id_offset': array([ 0.45326236, -0.14712635, 0.14375621, -0.07680948, -0.26954079, 0.46372104, 0.31306332, -0.21974367, -0.02417173, -0.45023316, -0.44458553, -0.48113295, 0.12384263, 0.25271717])}
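As a quick sanity check on the jitter magnitude (assuming the jitter is drawn uniformly from (-initval_jitter, initval_jitter)):
# Sketch: with initval_jitter=0.5, the jittered offsets should land within
# roughly +/-0.5 of their base value of 0, as in the output above.
offsets = model_jitter.initvals["a_1|participant_id_offset"]
print(offsets.min(), offsets.max())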