Initial Values¶
import warnings
import numpy as np
import hssm
warnings.filterwarnings("ignore")
Folklore suggests that the initial values of our sampler shouldn't matter for the outcome of the analysis, since MCMC will find the relevant region of the parameter space eventually.
Well, we can't trust the elders blindly... you will sometimes find that initial value settings need to be corrected for good results.
There are multiple reasons, but chief among them is that we are routinely dealing with constrained parameter spaces. At the edges of these parameter spaces, likelihoods can become less well behaved (this is true e.g. for our approx_differentiable likelihoods based on LANs).
These edges may be unlikely a priori, but if your sampler takes a path through the parameter space that passes through these regions (even if just on the way to a distant mode), you may get undesirable results.
There are two ways in which you can control the bounds of a parameter:
- For an individual parameter, you can specify a Truncated distribution, or simply choose a prior distribution that naturally lives in the desired space (e.g. a Gamma distribution when dealing with positivity constraints); see the sketch after this list.
- To constrain the outcome of a regression, you can always use a Link function that will target the desired output space.
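As a rough sketch of the first option: we can hand a constraint-respecting prior to a parameter via its include spec. This assumes that the include dictionary accepts a "prior" entry and that hssm.Prior forwards the distribution name and arguments to PyMC; the concrete numbers are purely illustrative.
import hssm

data = hssm.load_data("cavanagh_theta")

model_gamma_t = hssm.HSSM(
    data=data,
    model="ddm",
    include=[
        {
            "name": "t",
            # A Gamma prior lives on the positive reals, so the positivity
            # constraint on t holds by construction, without truncation.
            "prior": hssm.Prior("Gamma", mu=0.2, sigma=0.2),
        }
    ],
)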
Option 2, while computationally kosher in principle, can produce some downstream headaches, since it changes the interpretation of parameter values from straightforward to e.g. log-odds (as in logistic regression with a logit link). This demands careful thinking about priors and can sometimes make reporting of results more difficult.
Therefore you may find yourself in the situation that you do not want to use link functions or other a priori constraints, while still needing to steer clear of pathological regions of the parameter space.
At this point, the wisdom of the ancients aside, practical considerations will force you to think about initial values.
We are not here to prescribe how to deal with this, but we try to provide you with options. This short tutorial illustrates how to inspect and adjust the initial value settings of an HSSM model.
Load up some data¶
cav_data = hssm.load_data("cavanagh_theta")
Simple model¶
model = hssm.HSSM(
    data=cav_data,
    model="ddm",
    loglik_kind="approx_differentiable",
    model_config={"backend": "pytensor"},
)
Model initialized successfully.
Now we can inspect the initial value setting.
model.initvals
{'a': array(1.5), 'z': array(0.5), 't': array(0.025), 'v': array(0.)}
Route 1¶
Pass your initial values directly to the sampler via the initvals argument.
from copy import deepcopy
my_initvals = deepcopy(model.initvals)
my_initvals["z"] = np.array(0.6, dtype="float32")
idata = model.sample(
    draws=100,
    tune=100,
    sampler="mcmc",
    initvals=my_initvals,
)
Initializing NUTS using adapt_diag... Multiprocess sampling (4 chains in 4 jobs) NUTS: [a, z, t, v]
Output()
Sampling 4 chains for 100 tune and 100 draw iterations (400 + 400 draws total) took 30 seconds. The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details 100%|██████████| 400/400 [00:20<00:00, 19.74it/s]
idata.posterior
<xarray.Dataset> Size: 14kB Dimensions: (chain: 4, draw: 100) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 800B 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99 Data variables: a (chain, draw) float64 3kB 1.046 1.022 1.055 ... 1.02 1.033 1.028 z (chain, draw) float64 3kB 0.4977 0.4876 0.5079 ... 0.4903 0.4865 t (chain, draw) float64 3kB 0.3546 0.3686 0.3514 ... 0.3684 0.3635 v (chain, draw) float64 3kB 0.4031 0.4036 0.3817 ... 0.4121 0.3948 Attributes: created_at: 2025-07-13T13:47:42.792184+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 sampling_time: 30.432159900665283 tuning_steps: 100 modeling_interface: bambi modeling_interface_version: 0.15.0
We allowed the sampler very little tuning, and therefore our initial values are still apparent in the chains.
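To see this pull of the start values for yourself, a quick trace plot helps; a minimal sketch using ArviZ (not imported above):
# Sketch: eyeball the chains for z; with only 100 tuning steps, the influence
# of the chosen start value (0.6) can still be visible early in the chains.
import arviz as az

az.plot_trace(idata, var_names=["z"])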
Route 2¶
Adjust the ._initvals attribute.
model.initvals is a class property, which serves as an accessor to the underlying ._initvals attribute.
We can adjust this directly, and it will be used as the default for our sampler.
model._initvals
{'a': array(1.5), 'z': array(0.5), 't': array(0.025), 'v': array(0.)}
model._initvals["z"] = np.array(0.6, dtype="float32")
idata2 = model.sample(draws=10, tune=1)
Using default initvals. The model has already been sampled. Overwriting the previous inference object. Any previous reference to the inference object will still point to the old object.
Only 10 samples per chain. Reliable r-hat and ESS diagnostics require longer chains for accurate estimate. Initializing NUTS using adapt_diag... Multiprocess sampling (4 chains in 4 jobs) NUTS: [a, z, t, v]
Output()
Sampling 4 chains for 1 tune and 10 draw iterations (4 + 40 draws total) took 1 seconds. There were 40 divergences after tuning. Increase `target_accept` or reparameterize. The number of samples is too small to check convergence reliably. 100%|██████████| 40/40 [00:02<00:00, 14.48it/s]
idata2
- posterior
<xarray.Dataset> Size: 1kB Dimensions: (chain: 4, draw: 10) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 80B 0 1 2 3 4 5 6 7 8 9 Data variables: a (chain, draw) float64 320B 1.5 1.5 1.5 1.5 1.5 ... 1.5 1.5 1.5 1.5 z (chain, draw) float64 320B 0.6 0.6 0.6 0.6 0.6 ... 0.6 0.6 0.6 0.6 t (chain, draw) float64 320B 0.025 0.025 0.025 ... 0.025 0.025 0.025 v (chain, draw) float64 320B 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 Attributes: created_at: 2025-07-13T13:48:07.647544+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 sampling_time: 0.9140770435333252 tuning_steps: 1 modeling_interface: bambi modeling_interface_version: 0.15.0
- log_likelihood
<xarray.Dataset> Size: 1MB Dimensions: (chain: 4, draw: 10, __obs__: 3988) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 80B 0 1 2 3 4 5 6 7 8 9 * __obs__ (__obs__) int64 32kB 0 1 2 3 4 5 ... 3983 3984 3985 3986 3987 Data variables: rt,response (chain, draw, __obs__) float64 1MB -1.634 -1.914 ... -1.662 Attributes: modeling_interface: bambi modeling_interface_version: 0.15.0
- sample_stats
<xarray.Dataset> Size: 5kB Dimensions: (chain: 4, draw: 10) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 80B 0 1 2 3 4 5 6 7 8 9 Data variables: (12/17) energy_error (chain, draw) float64 320B 0.0 0.0 0.0 ... 0.0 0.0 perf_counter_diff (chain, draw) float64 320B 0.122 0.0377 ... 0.04596 energy (chain, draw) float64 320B 7.481e+03 ... 7.483e+03 acceptance_rate (chain, draw) float64 320B 0.0 0.0 0.0 ... 0.0 0.0 largest_eigval (chain, draw) float64 320B nan nan nan ... nan nan index_in_trajectory (chain, draw) int64 320B 0 0 0 0 0 0 ... 0 0 0 0 0 0 ... ... tree_depth (chain, draw) int64 320B 1 1 1 1 1 1 ... 1 1 1 1 1 1 smallest_eigval (chain, draw) float64 320B nan nan nan ... nan nan reached_max_treedepth (chain, draw) bool 40B False False ... False False step_size_bar (chain, draw) float64 320B 0.4128 0.4128 ... 0.4128 diverging (chain, draw) bool 40B True True True ... True True step_size (chain, draw) float64 320B 0.4128 0.4128 ... 0.4128 Attributes: created_at: 2025-07-13T13:48:07.662161+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 sampling_time: 0.9140770435333252 tuning_steps: 1 modeling_interface: bambi modeling_interface_version: 0.15.0
- observed_data
<xarray.Dataset> Size: 96kB Dimensions: (__obs__: 3988, rt,response_extra_dim_0: 2) Coordinates: * __obs__ (__obs__) int64 32kB 0 1 2 3 ... 3985 3986 3987 * rt,response_extra_dim_0 (rt,response_extra_dim_0) int64 16B 0 1 Data variables: rt,response (__obs__, rt,response_extra_dim_0) float64 64kB ... Attributes: created_at: 2025-07-13T13:48:07.666270+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 modeling_interface: bambi modeling_interface_version: 0.15.0
While models can become a lot more complicated, you will be able to adjust initial values consistently via this process.
Link function¶
Setting initial values for parameters when working with link functions becomes a little more tricky. Below is an example.
Let's define a simple regression with logit link functions.
model = hssm.HSSM(
    data=cav_data,
    model="ddm",
    link_settings="log_logit",
    loglik_kind="approx_differentiable",
    include=[{"name": "a", "formula": "a ~ 1 + theta"}],
)
Model initialized successfully.
model.initvals
{'z': array(0.5), 't': array(0.025), 'v': array(0.), 'a_Intercept': array(0.), 'a_theta': array(0.)}
Well, a_Intercept gets a default initial value of 0?
a is the boundary separation parameter in the drift diffusion model; a setting of 0 seems like an extremely bad choice... It would lead to a point mass of rt at 0 s!
What's going on here? Let's inspect our model...
model
Hierarchical Sequential Sampling Model Model: ddm Response variable: rt,response Likelihood: approx_differentiable Observations: 3988 Parameters: v: Prior: Uniform(lower: -3.0, upper: 3.0) Explicit bounds: (-3.0, 3.0) (ignored due to link function) a: Formula: a ~ 1 + theta Priors: a_Intercept ~ Normal(mu: 0.0, sigma: 0.25) a_theta ~ Normal(mu: 0.0, sigma: 0.25) Link: Generalized logit link function with bounds (0.3, 2.5) Explicit bounds: (0.3, 2.5) (ignored due to link function) z: Prior: Uniform(lower: 0.0, upper: 1.0) Explicit bounds: (0.0, 1.0) (ignored due to link function) t: Prior: HalfNormal(sigma: 2.0) Explicit bounds: (0.0, 2.0) (ignored due to link function) Lapse probability: 0.05 Lapse distribution: Uniform(lower: 0.0, upper: 20.0)
Ah, we apply the Generalized Logit link to the a parameter.
The initial value of the actual parameter a that the likelihood will receive is the output of the transformation.
Our hssm model class includes everything we need to inspect this further.
What would we expect here?
The generalized logit transformation (link) has the associated generalized sigmoid transformation (inverse link) as the forward transform.
We expect that evaluating this transformation at 0 should give back the midpoint between the explicit bounds we set for the parameter.
So we expect: $(0.3 + 2.5) / 2 = 1.4$... let's check this.
Note:
The forward link (from parameter to the function output that the likelihood receives) is called the inverse link function in GLM lingo.
We follow this nomenclature in HSSM.
model.params["a"].link.linkinv(model._initvals["a_Intercept"])
1.4000000000000001
Voila! This checks out! We note that the linkinv() function can come in handy if you want to play around with initial value settings in the context of link functions yourself.
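To make the arithmetic concrete, here is a stand-in for the generalized sigmoid under the bounds (0.3, 2.5) from the model above (our own sketch, not HSSM's implementation):
import numpy as np

def generalized_sigmoid(x, lower=0.3, upper=2.5):
    # Squashes the real line into (lower, upper); at x = 0 it returns the midpoint.
    return lower + (upper - lower) / (1.0 + np.exp(-x))

print(generalized_sigmoid(0.0))  # (0.3 + 2.5) / 2 = 1.4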
HSSM's initial value defaults logic¶
We try to guide the initial value settings in HSSM with reasonable defaults that hopefully work in many cases without needing further adjustments. It is, however, difficult to find settings that work blindly.
We follow the guidelines below:
1. Avoid known issues near parameter boundaries (especially important for approx_differentiable likelihoods) --> whenever possible, initialize near the center of bounded parameter spaces.
2. Starting values for the t parameter should be low, to avoid known pathologies of the analytic DDM likelihood when the smallest reaction time (rt) values come close to t.
3. In a regression setting, minimize the spread of offset parameters to avoid inadvertently running into parameter limits, and initialize all but the Intercept parameter to 0.
NOTE:
Guideline 3 in particular is a very conservative setting, focused solely on avoiding boundary behavior. This will NOT always be a smart idea, and it may sometimes collaterally have a negative impact on convergence.
For now, we encourage users to actively experiment with initial value settings.
Example¶
cav_data
| | participant_id | stim | rt | response | theta | dbs | conf |
|---|---|---|---|---|---|---|---|
| 0 | 0 | LL | 1.210 | 1.0 | 0.656275 | 1 | HC |
| 1 | 0 | WL | 1.630 | 1.0 | -0.327889 | 1 | LC |
| 2 | 0 | WW | 1.030 | 1.0 | -0.480285 | 1 | HC |
| 3 | 0 | WL | 2.770 | 1.0 | 1.927427 | 1 | LC |
| 4 | 0 | WW | 1.140 | -1.0 | -0.213236 | 1 | HC |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3983 | 13 | LL | 1.450 | -1.0 | -1.237166 | 0 | HC |
| 3984 | 13 | WL | 0.711 | 1.0 | -0.377450 | 0 | LC |
| 3985 | 13 | WL | 0.784 | 1.0 | -0.694194 | 0 | LC |
| 3986 | 13 | LL | 2.350 | -1.0 | -0.546536 | 0 | HC |
| 3987 | 13 | WW | 1.250 | 1.0 | 0.752388 | 0 | HC |

3988 rows × 7 columns
model_reg = hssm.HSSM(
    data=cav_data,
    model="ddm",
    prior_settings="safe",
    loglik_kind="approx_differentiable",
    include=[{"name": "a", "formula": "a ~ 1 + theta + (1|participant_id)"}],
)
Model initialized successfully.
model_reg.initvals
{'z': array(0.5), 't': array(0.025), 'v': array(0.), 'a_Intercept': array(1.5), 'a_theta': array(0.), 'a_1|participant_id_sigma': array(0.27082359), 'a_1|participant_id_offset': array([-0.00037851, -0.00733303, -0.00145185, -0.0062298 , 0.00852909, 0.00403854, 0.00332884, -0.00735726, -0.00606049, 0.00591678, 0.00637818, -0.00133482, -0.00313939, -0.00643042])}
Let's discuss what we see here:
- We do NOT apply a link function in this case, so guideline 1 applies without further cognitive effort. In contrast to the previous example, check how a_Intercept is now directly initialized as 1.5, near the middle of the parameter range.
- Covariate betas, in this case a_theta, are initialized to 0.
- The offset parameters associated with individual parameters in group hierarchies are initialized close to 0.
- t is initialized close to 0, as per guideline 2.
- The remaining parameters are set close to or at the middle of the allowed parameter space (if bounded).
For parameters that we do not actively manipulate, PyMC (Bambi) defaults are applied unchanged (e.g. here: a_1|participant_id_sigma). These settings may sometimes be sub-optimal for applications in HSSM, hence we again caution the user to take an active approach towards investigating initial values in case of convergence issues.
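If such a default worries you, the Route 2 pattern from above applies just as well here; a sketch (the value 0.1 is purely illustrative, not a recommendation):
# Sketch: override a start value that HSSM leaves at the Bambi/PyMC default.
model_reg._initvals["a_1|participant_id_sigma"] = np.array(0.1)
model_reg.initvals["a_1|participant_id_sigma"]  # the accessor reflects the change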
Parameters¶
There are two keyword arguments (kwargs) that we can set in the base HSSM class:
- process_initvals: bool turns processing of initial values on and off
- initval_jitter: float applies a uniform jitter around vector-valued initial values
model_no_initval = hssm.HSSM(
    data=cav_data,
    model="ddm",
    loglik_kind="approx_differentiable",
    include=[{"name": "a", "formula": "a ~ 1 + theta + (1|participant_id)"}],
    process_initvals=False,
)
Model initialized successfully.
model_no_initval.initvals
{'z': array(0.5), 't': array(2.), 'v': array(0.), 'a_Intercept': array(1.4), 'a_theta': array(0.), 'a_1|participant_id_sigma': array(0.27082359), 'a_1|participant_id_offset': array([ 2.42653745e-03, 3.65731912e-03, -4.10165032e-03, -1.00915984e-03, -4.91469027e-03, -8.85937922e-03, -1.81710875e-05, -1.28249545e-03, -2.10299902e-03, 5.82275214e-03, 2.88577774e-03, 5.36489440e-03, -2.40398804e-04, -7.74519145e-03])}
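For a quick side-by-side, we can contrast these raw defaults with the processed start values of model_reg from above (same parameterization, apart from prior_settings):
# Sketch: raw (unprocessed) defaults vs. HSSM's processed start values.
# Note e.g. t (2.0 raw vs. 0.025 processed) and a_Intercept (1.4 vs. 1.5).
for name, raw in model_no_initval.initvals.items():
    print(f"{name}: raw={raw}, processed={model_reg.initvals[name]}")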
If you want to change the magnitude of the default jitter, you can manipulate it via the initval_jitter argument.
model_jitter = hssm.HSSM(
    data=cav_data,
    model="ddm",
    loglik_kind="approx_differentiable",
    include=[{"name": "a", "formula": "a ~ 1 + theta + (1|participant_id)"}],
    process_initvals=False,
    initval_jitter=0.5,
)
Model initialized successfully.
model_jitter.initvals
{'z': array(0.5), 't': array(2.), 'v': array(0.), 'a_Intercept': array(1.4), 'a_theta': array(0.), 'a_1|participant_id_sigma': array(0.27082359), 'a_1|participant_id_offset': array([ 0.45326236, -0.14712635, 0.14375621, -0.07680948, -0.26954079, 0.46372104, 0.31306332, -0.21974367, -0.02417173, -0.45023316, -0.44458553, -0.48113295, 0.12384263, 0.25271717])}
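As a quick sanity check on the jitter magnitude (assuming the jitter is drawn uniformly from (-initval_jitter, initval_jitter)):
# Sketch: with initval_jitter=0.5, the jittered offsets should land within
# roughly +/-0.5 of their base value of 0, as in the output above.
offsets = model_jitter.initvals["a_1|participant_id_offset"]
print(offsets.min(), offsets.max())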