hssm.HSSM
Use the hssm.HSSM class to construct an HSSM model.
hssm.HSSM ¶
HSSM(
data: pd.DataFrame,
model: SupportedModels | str = "ddm",
choices: list[int] | None = None,
include: list[dict[str, Any] | Param] | None = None,
model_config: ModelConfig | dict | None = None,
loglik: (
str
| PathLike
| Callable
| pytensor.graph.Op
| type[pm.Distribution]
| None
) = None,
loglik_kind: LoglikKind | None = None,
p_outlier: float | dict | bmb.Prior | None = 0.05,
lapse: dict | bmb.Prior | None = bmb.Prior(
"Uniform", lower=0.0, upper=20.0
),
global_formula: str | None = None,
link_settings: Literal["log_logit"] | None = None,
prior_settings: Literal["safe"] | None = "safe",
extra_namespace: dict[str, Any] | None = None,
missing_data: bool | float = False,
deadline: bool | str = False,
loglik_missing_data: (
str | PathLike | Callable | pytensor.graph.Op | None
) = None,
process_initvals: bool = True,
initval_jitter: float = INITVAL_JITTER_SETTINGS["jitter_epsilon"],
**kwargs
)
The basic Hierarchical Sequential Sampling Model (HSSM) class.
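A minimal sketch of how the constructor might be called, using only arguments that appear in the signature above. The column name "difficulty" is hypothetical, and the "name" and "formula" keys of the include dictionary are an assumption about its layout, which this page does not spell out. The hssm import is deferred into the function so the specification can be inspected without the package installed.

```python
from typing import Any

# Parameter specifications for `include`. The "name" and "formula" keys are an
# assumed layout; "difficulty" is a hypothetical predictor column.
include_spec: list[dict[str, Any]] = [
    {"name": "v", "formula": "v ~ 1 + difficulty"},
]


def build_model(data):
    """Construct a DDM from a DataFrame with "rt", "response" and "difficulty"."""
    import hssm  # imported lazily; requires the hssm package

    return hssm.HSSM(
        data=data,
        model="ddm",            # one of the supported model names
        include=include_spec,   # per-parameter specifications
        p_outlier=0.05,         # fixed lapse probability (the documented default)
        prior_settings="safe",  # documented default
    )
```

Calling build_model(df) with a suitable DataFrame would return the model object, on which the methods listed further down (sample, summary, and so on) can then be called.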
Parameters:
-
data
(DataFrame
) –A pandas DataFrame that contains at least the columns "rt" and "response".
-
model
(SupportedModels | str
, default:'ddm'
) –The name of the model to use. Currently supported models are "ddm", "ddm_sdv", "full_ddm", "angle", "levy", "ornstein", "weibull", "race_no_bias_angle_4", and "ddm_seq2_no_bias". If any other string is passed, the model is considered custom, in which case model_config, loglik, and loglik_kind must all be provided by the user.
-
choices
(optional
, default:None
) –When an int, the number of choices that the participants can make. If 2, the choices are [-1, 1] by default. If anything greater than 2, the choices are [0, 1, ..., n_choices - 1] by default. If a list is provided, it should be the list of choices that the participants can make. Defaults to 2. If any value other than the choices provided is found in the "response" column of the data, an error will be raised.
-
include
(optional
, default:None
) –A list of dictionaries specifying parameter specifications to include in the model. If left unspecified, defaults will be used for all parameter specifications. Defaults to None.
-
model_config
(optional
, default:None
) –A dictionary containing the model configuration information. If None is provided, defaults will be used if there are any. Defaults to None. Fields for this dict are usually:
- "list_params": a list of the model's parameters. The order of this list matters: values for each parameter are passed to the likelihood function in this order.
- "backend": only used when loglik_kind is approx_differentiable and an onnx file is supplied for the likelihood approximation network (LAN). Valid values are "jax" or "pytensor". It determines whether the LAN in ONNX should be converted to "jax" or "pytensor". If not provided, jax will be used for maximum performance.
- "default_priors": a dict indicating the default priors for each parameter.
- "bounds": a dict indicating the boundaries for each parameter. In the case of LAN, these bounds are training boundaries.
- "rv": optional. Can be a RandomVariable class containing the user's own rng_fn function for sampling from the distribution that the user is supplying. If not supplied, HSSM will automatically generate a RandomVariable using the simulator identified by model from the ssm_simulators package. If model is not supported in ssm_simulators, a warning will be raised letting the user know that sampling from the RandomVariable will result in errors.
- "extra_fields": optional. A list of strings indicating the additional columns in data that will be passed to the likelihood function for calculation. This is helpful if the likelihood function depends on data other than the observed data and the parameter values.
-
loglik
(optional
, default:None
) –A likelihood function. Defaults to None. Requirements are:
- If loglik_kind is "analytical" or "blackbox", a pm.Distribution, a pytensor Op, or a Python callable can be used. Signatures are:
  - pm.Distribution: needs to have parameters specified exactly as listed in list_params
  - pytensor.graph.Op and Callable: need to accept the parameters specified exactly as listed in list_params
- If loglik_kind is "approx_differentiable", then in addition to the specifications above, a str or PathLike can also be used to specify a path to an onnx file. If a str is provided, HSSM will first look locally for an onnx file. If that is not successful, HSSM will try to download that onnx file from the Hugging Face Hub.
- It can also be None, in which case a default likelihood function will be used.
-
loglik_kind
(optional
, default:None
) –A string that specifies the kind of log-likelihood function specified with loglik. Defaults to None. Can be one of the following:
- "analytical": an analytical (approximation) likelihood function. It is differentiable and can be used with samplers that require differentiation.
- "approx_differentiable": a likelihood approximation network (LAN) likelihood function. It is differentiable and can be used with samplers that require differentiation.
- "blackbox": a black box likelihood function. It is typically NOT differentiable.
- None, in which case a default will be used. For ddm type of models, the default will be analytical. For other supported models, it will be approx_differentiable. If the model is a custom one, a ValueError will be raised.
-
p_outlier
(optional
, default:0.05
) –The fixed lapse probability or the prior distribution of the lapse probability. Defaults to a fixed value of 0.05. When None, the lapse probability will not be included in estimation.
-
lapse
(optional
, default:Prior('Uniform', lower=0.0, upper=20.0)
) –The lapse distribution. This argument is required only if p_outlier is not None. Defaults to Uniform(0.0, 20.0).
-
global_formula
(optional
, default:None
) –A string that specifies a regression formula to use for all model parameters. If you also specify parameter-wise regressions, these override the global regression for the respective parameters.
-
link_settings
(optional
, default:None
) –An optional string literal that indicates the link functions to use for each parameter. Helpful for hierarchical models where sampling might get stuck or become very slow. Can be one of the following:
- "log_logit": applies log link functions to positive parameters and generalized logit link functions to parameters that have explicit bounds.
- None: unless otherwise specified, the "identity" link functions will be used.
The default value is None.
-
prior_settings
(optional
, default:'safe'
) –An optional string literal that indicates the prior distributions to use for each parameter. Helpful for hierarchical models where sampling might get stuck or become very slow. Can be one of the following:
- "safe": HSSM will scan all parameters in the model and apply safe priors to all parameters that do not have explicit bounds.
- None: HSSM will use bambi to provide default priors for all parameters. Not recommended when you are using hierarchical models.
The default value is "safe".
-
extra_namespace
(optional
, default:None
) –Additional user supplied variables with transformations or data to include in the environment where the formula is evaluated. Defaults to
None
. -
missing_data
(optional
, default:False
) –Specifies whether the model should handle missing data. Can be a bool or a float. If False and the rt column of the data contains -999.0, the model will drop these rows and produce a warning. If True, the model will treat the code -999.0 as missing data. If a float is provided, the model will treat this value as the missing data value. Defaults to False.
-
deadline
(optional
, default:False
) –Specifies whether the model should handle deadline data. Can be a bool or a str. If False, the model will do nothing even if a deadline column is provided. If True, the model will treat the deadline column as deadline data. If a str is provided, the model will treat this value as the name of the deadline column. Defaults to False.
-
loglik_missing_data
(optional
, default:None
) –A likelihood function for missing data. See the loglik parameter for how to specify the likelihood function for this parameter. If nothing is provided, a default likelihood function will be used. This parameter is required only if either missing_data or deadline is not False. Defaults to None.
-
process_initvals
(optional
, default:True
) –If
True
, the model will process the initial values. Defaults toTrue
. -
initval_jitter
(optional
, default:INITVAL_JITTER_SETTINGS['jitter_epsilon']
) –The jitter value for the initial values. Defaults to
0.01
. -
**kwargs
–Additional arguments passed to the
bmb.Model
object.
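The choices and missing_data rules described in the parameter list above can be sketched in plain Python. This is an illustration of the documented behavior under stated assumptions, not HSSM's own implementation: the helper names resolve_choices, check_responses and split_missing are hypothetical, and plain lists stand in for the DataFrame columns.

```python
import warnings


def resolve_choices(choices=2):
    """Return the valid responses for an int or list `choices`."""
    if isinstance(choices, int):
        if choices < 2:
            raise ValueError("at least two choices are required")
        # Two choices map to [-1, 1]; more map to [0, 1, ..., n_choices - 1].
        return [-1, 1] if choices == 2 else list(range(choices))
    return list(choices)


def check_responses(responses, choices=2):
    """Raise ValueError if any response is not among the valid choices."""
    valid = set(resolve_choices(choices))
    unexpected = sorted(set(responses) - valid)
    if unexpected:
        raise ValueError(f"responses {unexpected} are not in {sorted(valid)}")


def split_missing(rts, missing_data=False):
    """Return (observed_rts, n_missing) following the `missing_data` rules."""
    # True/False use the code -999.0; a float supplies its own missing-data value.
    sentinel = -999.0 if isinstance(missing_data, bool) else float(missing_data)
    observed = [rt for rt in rts if rt != sentinel]
    n_flagged = len(rts) - len(observed)
    if missing_data is False:
        if n_flagged:
            warnings.warn(f"dropping {n_flagged} rows with rt == {sentinel}")
        return observed, 0  # rows are dropped, not modeled as missing
    return observed, n_flagged
```

For example, check_responses([1, -1, 0]) raises a ValueError because 0 is not a valid response when there are two choices.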
Methods:
-
sample
–Perform sampling using the
fit
method via bambi.Model. -
sample_posterior_predictive
–Perform posterior predictive sampling from the HSSM model.
-
sample_prior_predictive
–Generate samples from the prior predictive distribution.
-
vi
–Perform Variational Inference.
-
find_MAP
–Perform Maximum A Posteriori estimation.
-
log_likelihood
–Compute the log likelihood of the model.
-
summary
–Produce a summary table with ArviZ but with additional convenience features.
-
plot_trace
–Generate trace plot with ArviZ but with additional convenience features.
-
graph
–Produce a graphviz Digraph from a built HSSM model.
-
plot_posterior_predictive
–Produce a posterior predictive plot.
-
plot_quantile_probability
–Produce a quantile probability plot.
-
restore_traces
–Restore traces from an InferenceData object or a .netcdf file.
-
initial_point
–Compute the initial point of the model.
Attributes:
-
traces
(InferenceData | Approximation
) –Return the trace of the model after sampling.
-
pymc_model
(Model
) –Provide access to the PyMC model.
hssm.HSSM.traces
property
¶
Return the trace of the model after sampling.
Raises:
-
ValueError
–If the model has not been sampled yet.
Returns:
-
InferenceData | Approximation
–The trace of the model after the last call to
sample()
.
hssm.HSSM.pymc_model
property
¶
Provide access to the PyMC model.
Returns:
-
Model
–The PyMC model built by bambi
hssm.HSSM.sample ¶
sample(
sampler: (
Literal["mcmc", "nuts_numpyro", "nuts_blackjax", "laplace", "vi"] | None
) = None,
init: str | None = None,
initvals: str | dict | None = None,
include_response_params: bool = False,
**kwargs
) -> az.InferenceData | pm.Approximation
Perform sampling using the fit method via bambi.Model.
Parameters:
-
sampler
(Literal['mcmc', 'nuts_numpyro', 'nuts_blackjax', 'laplace', 'vi'] | None
, default:None
) –The sampler to use. Can be one of "mcmc", "nuts_numpyro", "nuts_blackjax", "laplace", or "vi". If using blackbox likelihoods, this cannot be "nuts_numpyro" or "nuts_blackjax". By default it is None, and the sampler is chosen automatically: when the model uses the approx_differentiable likelihood and the jax backend, "nuts_numpyro" will be used. Otherwise, "mcmc" (the default PyMC NUTS sampler) will be used.
-
init
(str | None
, default:None
) –Initialization method to use for the sampler. If any of the NUTS samplers is used, defaults to
"adapt_diag"
. Otherwise, defaults to"auto"
. -
initvals
(str | dict | None
, default:None
) –Pass initial values to the sampler. This can be a dictionary of initial values for parameters of the model, or the string "map" to initialize at the MAP estimate. If "map" is used, the MAP estimate will be computed unless it is already attached to the base class from a prior call to find_MAP.
-
include_response_params
(bool
, default:False
) –Include parameters of the response distribution in the output. These usually take more space than other parameters as there's one of them per observation. Defaults to False.
-
kwargs
–Other arguments passed to bmb.Model.fit(). Please see [here](https://bambinos.github.io/bambi/api_reference.html#bambi.models.Model.fit) for full documentation.
Returns:
-
InferenceData | Approximation
–A reference to the model.traces object, which stores the traces of the last call to model.sample(). model.traces is an ArviZ InferenceData instance if sampler is "mcmc" (default), "nuts_numpyro", "nuts_blackjax", or "laplace", or an Approximation object if sampler is "vi"
.
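The sampler, init, and return-type rules documented for sample() can be summarized as a small decision function. This is a plain-Python sketch of the rules as stated above, not the library's code; choose_sampler, default_init and result_type are hypothetical helper names.

```python
NUTS_SAMPLERS = {"mcmc", "nuts_numpyro", "nuts_blackjax"}
JAX_SAMPLERS = {"nuts_numpyro", "nuts_blackjax"}


def choose_sampler(loglik_kind, backend=None, sampler=None):
    """Resolve the sampler name the way the `sampler` parameter describes."""
    if sampler is None:
        # Automatic choice: LAN likelihood on the jax backend uses NumPyro NUTS.
        if loglik_kind == "approx_differentiable" and backend == "jax":
            return "nuts_numpyro"
        return "mcmc"
    if loglik_kind == "blackbox" and sampler in JAX_SAMPLERS:
        raise ValueError(f"{sampler!r} cannot be used with blackbox likelihoods")
    return sampler


def default_init(sampler):
    """Default `init` value: "adapt_diag" for NUTS samplers, else "auto"."""
    return "adapt_diag" if sampler in NUTS_SAMPLERS else "auto"


def result_type(sampler):
    """Name of the type stored in model.traces after sampling."""
    return "Approximation" if sampler == "vi" else "InferenceData"
```

For instance, choose_sampler("analytical") falls through to "mcmc", for which default_init gives "adapt_diag" and result_type gives "InferenceData".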
hssm.HSSM.sample_posterior_predictive ¶
sample_posterior_predictive(
idata: az.InferenceData | None = None,
data: pd.DataFrame | None = None,
inplace: bool = True,
include_group_specific: bool = True,
kind: Literal["response", "response_params"] = "response",
draws: int | float | list[int] | np.ndarray | None = None,
safe_mode: bool = True,
) -> az.InferenceData | None
Perform posterior predictive sampling from the HSSM model.
Parameters:
-
idata
(optional
, default:None
) –The
InferenceData
object returned byHSSM.sample()
. If not provided, theInferenceData
from the last timesample()
is called will be used. -
data
(optional
, default:None
) –An optional data frame with values for the predictors that are used to obtain out-of-sample predictions. If omitted, the original dataset is used.
-
inplace
(optional
, default:True
) –If True, idata is modified in place and a posterior_predictive group is appended to it. Otherwise, a copy of idata with the predictions added is returned. Defaults to True.
-
include_group_specific
(optional
, default:True
) –If True, predictions include the group-specific effects. Otherwise, predictions are made with common effects only (i.e. group-specific effects are set to zero). Defaults to True.
-
kind
(Literal['response', 'response_params']
, default:'response'
) –Indicates the type of prediction required. Can be "response_params" or "response". The first returns draws from the posterior distribution of the likelihood parameters, while the latter returns the draws from the posterior predictive distribution (i.e. the posterior probability distribution for a new observation) in addition to the posterior distribution. Defaults to "response".
-
draws
(int | float | list[int] | ndarray | None
, default:None
) –The number of samples to draw from the posterior predictive distribution from each chain. When an integer >= 1, the number of samples to be extracted from the draw dimension. If this integer is larger than the number of posterior samples in each chain, all posterior samples will be used in posterior predictive sampling. When a float between 0 and 1, the proportion of samples from the draw dimension of each chain to be used in posterior predictive sampling. If this proportion is very small, at least one sample will be used. When None, all posterior samples will be used. Defaults to None.
-
safe_mode
(bool
, default:True
) –If True, the function will split the draws into chunks of 10 to avoid memory issues. Defaults to True.
Raises:
-
ValueError
–If the model has not been sampled yet and idata is not provided.
Returns:
-
InferenceData | None
–InferenceData or None
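The draws and safe_mode descriptions above amount to a small piece of arithmetic, sketched here in plain Python. The helper names resolve_draws and chunk_draws are hypothetical, truncating the float proportion with int() is an assumption (the page only guarantees at least one sample), and the list and ndarray forms of draws are left out because the page does not describe them.

```python
def resolve_draws(draws, n_available):
    """Number of posterior draws per chain used for prediction."""
    if draws is None:
        return n_available  # use every posterior sample
    if isinstance(draws, float):
        if not 0 < draws < 1:
            raise ValueError("a float `draws` must lie between 0 and 1")
        # Proportion of the draw dimension, but never fewer than one sample.
        return max(1, int(draws * n_available))
    if isinstance(draws, int):
        if draws < 1:
            raise ValueError("an integer `draws` must be >= 1")
        # Asking for more than is available falls back to all samples.
        return min(draws, n_available)
    raise TypeError("list and ndarray forms are not covered by this sketch")


def chunk_draws(n_draws, chunk_size=10):
    """Split draw indices into chunks, as safe_mode does with chunks of 10."""
    indices = list(range(n_draws))
    return [indices[i:i + chunk_size] for i in range(0, n_draws, chunk_size)]
```

With 1000 posterior samples per chain, a draws value of 0.25 resolves to 250 draws, which safe_mode would process in 25 chunks of 10.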
hssm.HSSM.sample_prior_predictive ¶
sample_prior_predictive(
draws: int = 500,
var_names: str | list[str] | None = None,
omit_offsets: bool = True,
random_seed: np.random.Generator | None = None,
) -> az.InferenceData
Generate samples from the prior predictive distribution.
Parameters:
-
draws
(int
, default:500
) –Number of draws to sample from the prior predictive distribution. Defaults to 500.
-
var_names
(str | list[str] | None
, default:None
) –A list of names of variables for which to compute the prior predictive distribution. Defaults to
None
which means both observed and unobserved RVs. -
omit_offsets
(bool
, default:True
) –Whether to omit offset terms. Defaults to
True
. -
random_seed
(Generator | None
, default:None
) –Seed for the random number generator.
Returns:
-
InferenceData
–InferenceData
object with the groupsprior
,prior_predictive
andobserved_data
.
hssm.HSSM.vi ¶
vi(
method: str = "advi",
niter: int = 10000,
draws: int = 1000,
return_idata: bool = True,
ignore_mcmc_start_point_defaults=False,
**vi_kwargs
) -> pm.Approximation | az.InferenceData
Perform Variational Inference.
Parameters:
-
niter
(int
, default:10000
) –The number of iterations to run the VI algorithm. Defaults to 10000.
-
method
(str
, default:'advi'
) –The method to use for VI. Can be one of "advi", "fullrank_advi", "svgd", or "asvgd". Defaults to "advi".
-
draws
(int
, default:1000
) –The number of samples to draw from the posterior distribution. Defaults to 1000.
-
return_idata
(bool
, default:True
) –If True, returns an InferenceData object. Otherwise, returns the approximation object directly. Defaults to True.
Returns:
-
Approximation | InferenceData
–An InferenceData object if return_idata is True; otherwise the approximation object itself.
hssm.HSSM.find_MAP ¶
Perform Maximum A Posteriori estimation.
Returns:
-
dict
–A dictionary containing the MAP estimates of the model parameters.
hssm.HSSM.log_likelihood ¶
log_likelihood(
idata: az.InferenceData | None = None,
data: pd.DataFrame | None = None,
inplace: bool = True,
keep_likelihood_params: bool = False,
) -> az.InferenceData | None
Compute the log likelihood of the model.
Parameters:
-
idata
(optional
, default:None
) –The InferenceData object returned by HSSM.sample(). If not provided, the InferenceData from the last time sample() is called will be used.
-
data
(optional
, default:None
) –A pandas DataFrame with values for the predictors that are used to obtain out-of-sample predictions. If omitted, the original dataset is used.
-
inplace
(optional
, default:True
) –If
True
will modify idata in-place and append alog_likelihood
group toidata
. Otherwise, it will return a copy of idata with the predictions added, by default True. -
keep_likelihood_params
(optional
, default:False
) –If True, the trial-wise likelihood parameters that are computed en route to the log likelihood are kept in the idata object. Defaults to False. See also the method add_likelihood_parameters_to_idata.
Returns:
-
InferenceData | None
–InferenceData or None
hssm.HSSM.summary ¶
summary(
data: az.InferenceData | None = None,
include_deterministic: bool = False,
**kwargs
) -> pd.DataFrame | xr.Dataset
Produce a summary table with ArviZ but with additional convenience features.
This is a simple wrapper for the az.summary() function. By default, it filters out the deterministic values from the summary. Please see the [arviz documentation](https://arviz-devs.github.io/arviz/api/generated/arviz.summary.html) for additional parameters that can be specified.
Parameters:
-
data
(InferenceData | None
, default:None
) –An ArviZ InferenceData object. If None, the traces stored in the model will be used.
-
include_deterministic
(optional
, default:False
) –Whether to include deterministic variables in the summary. Defaults to False. Note that if include_deterministic is set to False and var_names is provided, the var_names provided will be modified to also exclude the deterministic values. If this is not desirable, set include_deterministic to True.
Returns:
-
DataFrame | Dataset
–A pandas DataFrame or xarray Dataset containing the summary statistics.
hssm.HSSM.plot_trace ¶
plot_trace(
data: az.InferenceData | None = None,
include_deterministic: bool = False,
tight_layout: bool = True,
**kwargs
) -> None
Generate trace plot with ArviZ but with additional convenience features.
This is a simple wrapper for the az.plot_trace() function. By default, it filters out the deterministic values from the plot. Please see the [arviz documentation](https://arviz-devs.github.io/arviz/api/generated/arviz.plot_trace.html) for additional parameters that can be specified.
Parameters:
-
data
(optional
, default:None
) –An ArviZ InferenceData object. If None, the traces stored in the model will be used.
-
include_deterministic
(optional
, default:False
) –Whether to include deterministic variables in the plot. Defaults to False. Note that if include_deterministic is set to False and var_names is provided, the var_names provided will be modified to also exclude the deterministic values. If this is not desirable, set include_deterministic to True.
-
tight_layout
(optional
, default:True
) –Whether to call plt.tight_layout() after plotting. Defaults to True.
hssm.HSSM.graph ¶
Produce a graphviz Digraph from a built HSSM model.
Requires graphviz, which may be installed most easily with conda install -c conda-forge python-graphviz. Alternatively, you may install the graphviz binaries yourself, and then pip install graphviz to get the python bindings.
See http://graphviz.readthedocs.io/en/stable/manual.html for more information.
Parameters:
-
formatting
–One of
"plain"
or"plain_with_params"
. Defaults to"plain"
. -
name
–Name of the figure to save. Defaults to
None
, no figure is saved. -
figsize
–Maximum width and height of figure in inches. Defaults to
None
, the figure size is set automatically. If defined and the drawing is larger than the given size, the drawing is uniformly scaled down so that it fits within the given size. Only works ifname
is notNone
. -
dpi
–Dots per inch of the figure to save. Defaults to 300. Only works if
name
is notNone
. -
fmt
–Format of the figure to save. Defaults to
"png"
. Only works ifname
is notNone
.
Returns:
-
Graph
–The graph
hssm.HSSM.plot_posterior_predictive ¶
Produce a posterior predictive plot.
Equivalent to calling hssm.plotting.plot_posterior_predictive() with the model. Please see that function for full documentation.
Returns:
-
Axes | FacetGrid
–The matplotlib axis or seaborn FacetGrid object containing the plot.
hssm.HSSM.plot_quantile_probability ¶
Produce a quantile probability plot.
Equivalent to calling hssm.plotting.plot_quantile_probability() with the model. Please see that function for full documentation.
Returns:
-
Axes | FacetGrid
–The matplotlib axis or seaborn FacetGrid object containing the plot.
hssm.HSSM.restore_traces ¶
hssm.HSSM.initial_point ¶
Compute the initial point of the model.
This is a slightly altered version of pm.initial_point.initial_point().
Parameters:
-
transformed
(bool
, default:False
) –If True, return the initial point in transformed space.
Returns:
-
dict
–A dictionary containing the initial point of the model parameters.