hssm.HSSM

Use hssm.HSSM class to construct an HSSM model.

hssm.HSSM

HSSM(
    data: pd.DataFrame,
    model: SupportedModels | str = "ddm",
    choices: list[int] | None = None,
    include: list[dict[str, Any] | Param] | None = None,
    model_config: ModelConfig | dict | None = None,
    loglik: (
        str
        | PathLike
        | Callable
        | pytensor.graph.Op
        | type[pm.Distribution]
        | None
    ) = None,
    loglik_kind: LoglikKind | None = None,
    p_outlier: float | dict | bmb.Prior | None = 0.05,
    lapse: dict | bmb.Prior | None = bmb.Prior(
        "Uniform", lower=0.0, upper=20.0
    ),
    global_formula: str | None = None,
    link_settings: Literal["log_logit"] | None = None,
    prior_settings: Literal["safe"] | None = "safe",
    extra_namespace: dict[str, Any] | None = None,
    missing_data: bool | float = False,
    deadline: bool | str = False,
    loglik_missing_data: (
        str | PathLike | Callable | pytensor.graph.Op | None
    ) = None,
    process_initvals: bool = True,
    initval_jitter: float = INITVAL_JITTER_SETTINGS["jitter_epsilon"],
    **kwargs
)

The basic Hierarchical Sequential Sampling Model (HSSM) class.

Parameters:

  • data (DataFrame) –

    A pandas DataFrame containing the data. At a minimum, it must include the columns "rt" and "response".

  • model (SupportedModels | str, default: 'ddm' ) –

    The name of the model to use. Currently supported models are "ddm", "ddm_sdv", "full_ddm", "angle", "levy", "ornstein", "weibull", "race_no_bias_angle_4", "ddm_seq2_no_bias". If any other string is passed, the model will be considered custom, in which case model_config, loglik, and loglik_kind must all be provided by the user.

  • choices (optional, default: None ) –

    When an int, the number of choices that the participants can make. If 2, the choices are [-1, 1] by default. If anything greater than 2, the choices are [0, 1, ..., n_choices - 1] by default. If a list is provided, it should be the list of choices that the participants can make. Defaults to 2. If any value other than the choices provided is found in the "response" column of the data, an error will be raised.

  • include (optional, default: None ) –

    A list of dictionaries specifying parameter specifications to include in the model. If left unspecified, defaults will be used for all parameter specifications. Defaults to None.

  • model_config (optional, default: None ) –

    A dictionary containing the model configuration information. If None is provided, defaults will be used if there are any. Defaults to None. Fields for this dict are usually:

    • "list_params": a list of parameters indicating the parameters of the model. The order in which the parameters are specified in this list is important. Values for each parameter will be passed to the likelihood function in this order.
    • "backend": Only used when loglik_kind is approx_differentiable and an onnx file is supplied for the likelihood approximation network (LAN). Valid values are "jax" or "pytensor". It determines whether the LAN in ONNX should be converted to "jax" or "pytensor". If not provided, jax will be used for maximum performance.
    • "default_priors": A dict indicating the default priors for each parameter.
    • "bounds": A dict indicating the boundaries for each parameter. In the case of LAN, these bounds are training boundaries.
    • "rv": Optional. Can be a RandomVariable class containing the user's own rng_fn function for sampling from the distribution that the user is supplying. If not supplied, HSSM will automatically generate a RandomVariable using the simulator identified by model from the ssm_simulators package. If model is not supported in ssm_simulators, a warning will be raised letting the user know that sampling from the RandomVariable will result in errors.
    • "extra_fields": Optional. A list of strings indicating the additional columns in data that will be passed to the likelihood function for calculation. This is helpful if the likelihood function depends on data other than the observed data and the parameter values.
  • loglik (optional, default: None ) –

    A likelihood function. Defaults to None. Requirements are:

    1. If loglik_kind is "analytical" or "blackbox", a pm.Distribution, a pytensor Op, or a Python callable can be used. Signatures are:
      • pm.Distribution: needs to have parameters specified exactly as listed in list_params
      • pytensor.graph.Op and Callable: needs to accept the parameters specified exactly as listed in list_params
    2. If loglik_kind is "approx_differentiable", then in addition to the specifications above, a str or PathLike can also be used to specify a path to an onnx file. If a str is provided, HSSM will first look locally for an onnx file. If that is not successful, HSSM will try to download that onnx file from the Hugging Face Hub.
    3. It can also be None, in which case a default likelihood function will be used.
  • loglik_kind (optional, default: None ) –

    A string that specifies the kind of log-likelihood function specified with loglik. Defaults to None. Can be one of the following:

    • "analytical": an analytical (approximation) likelihood function. It is differentiable and can be used with samplers that require differentiation.
    • "approx_differentiable": a likelihood approximation network (LAN) likelihood function. It is differentiable and can be used with samplers that require differentiation.
    • "blackbox": a black box likelihood function. It is typically NOT differentiable.
    • None, in which case a default will be used. For ddm type of models, the default will be analytical. For other supported models, it will be approx_differentiable. If the model is a custom one, a ValueError will be raised.
  • p_outlier (optional, default: 0.05 ) –

    The fixed lapse probability or the prior distribution of the lapse probability. Defaults to a fixed value of 0.05. When None, the lapse probability will not be included in estimation.

  • lapse (optional, default: Prior('Uniform', lower=0.0, upper=20.0) ) –

    The lapse distribution. This argument is required only if p_outlier is not None. Defaults to Uniform(0.0, 20.0).

  • global_formula (optional, default: None ) –

    A string that specifies a regression formula which will be used for all model parameters. If you also specify parameter-wise regressions, these will override the global regression for the respective parameters.

  • link_settings (optional, default: None ) –

    An optional string literal that indicates the link functions to use for each parameter. Helpful for hierarchical models where sampling might get stuck or become very slow. Can be one of the following:

    • "log_logit": applies log link functions to positive parameters and generalized logit link functions to parameters that have explicit bounds.
    • None: unless otherwise specified, the "identity" link functions will be used. The default value is None.
  • prior_settings (optional, default: 'safe' ) –

    An optional string literal that indicates the prior distributions to use for each parameter. Helpful for hierarchical models where sampling might get stuck or become very slow. Can be one of the following:

    • "safe": HSSM will scan all parameters in the model and apply safe priors to all parameters that do not have explicit bounds.
    • None: HSSM will use bambi to provide default priors for all parameters. Not recommended when you are using hierarchical models. The default value is "safe".
  • extra_namespace (optional, default: None ) –

    Additional user supplied variables with transformations or data to include in the environment where the formula is evaluated. Defaults to None.

  • missing_data (optional, default: False ) –

    Specifies whether the model should handle missing data. Can be a bool or a float. If False and the rt column of the data contains -999.0, the model will drop these rows and produce a warning. If True, the model will treat the code -999.0 as missing data. If a float is provided, the model will treat this value as the missing data value. Defaults to False.

  • deadline (optional, default: False ) –

    Specifies whether the model should handle deadline data. Can be a bool or a str. If False, the model will do nothing even if a deadline column is provided. If True, the model will treat the deadline column as deadline data. If a str is provided, the model will treat this value as the name of the deadline column. Defaults to False.

  • loglik_missing_data (optional, default: None ) –

    A likelihood function for missing data. Please see the loglik parameter for how to specify the likelihood function for this parameter. If nothing is provided, a default likelihood function will be used. This parameter is required only if either missing_data or deadline is not False. Defaults to None.

  • process_initvals (optional, default: True ) –

    If True, the model will process the initial values. Defaults to True.

  • initval_jitter (optional, default: INITVAL_JITTER_SETTINGS['jitter_epsilon'] ) –

    The jitter value for the initial values. Defaults to 0.01.

  • **kwargs

    Additional arguments passed to the bmb.Model object.
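The rules stated above for choices and missing_data can be sketched in plain Python. This is only an illustration of the documented behavior, not HSSM's internal code; the helper names default_choices, check_responses, and missing_code are hypothetical.

```python
def default_choices(n_choices=2):
    """Default choice labels as described for the `choices` parameter."""
    if n_choices < 2:
        raise ValueError("At least two choices are required.")
    # Two choices are coded [-1, 1]; more than two are coded 0..n-1.
    return [-1, 1] if n_choices == 2 else list(range(n_choices))


def check_responses(responses, choices):
    """Raise if the response column contains a value outside `choices`."""
    invalid = sorted(set(responses) - set(choices))
    if invalid:
        raise ValueError(f"Invalid responses found: {invalid}")


def missing_code(missing_data):
    """Map the `missing_data` argument to the rt code treated as missing.

    Returns None when missing data is not handled (missing_data=False).
    """
    if missing_data is True:
        return -999.0  # default missing-data code
    if missing_data is False:
        return None
    return float(missing_data)  # user-supplied code


print(default_choices(2))   # two-choice coding
print(default_choices(4))   # multi-choice coding
print(missing_code(True))
```

Checking `is True` and `is False` before converting to float matters because Python treats bool as a subtype of int.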
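As a rough illustration of the include and model_config parameters described above, the fragment below builds both as plain Python objects. The parameter names, bounds, priors, and the column names "coherence" and "block" are illustrative assumptions. The keys inside each include entry ("name", "formula", "prior", "link") follow the pattern commonly shown in HSSM tutorials and should be checked against the version you use.

```python
# Hypothetical configuration for a custom four-parameter model.
model_config = {
    # Order matters: values are passed to the likelihood in this order.
    "list_params": ["v", "a", "z", "t"],
    # Only relevant for approx_differentiable likelihoods backed by an onnx file.
    "backend": "jax",
    # Illustrative bounds, not values shipped with HSSM.
    "bounds": {
        "v": (-3.0, 3.0),
        "a": (0.3, 2.5),
        "z": (0.1, 0.9),
        "t": (0.0, 2.0),
    },
    # Extra data columns handed to the likelihood function (hypothetical column).
    "extra_fields": ["block"],
}

# One parameter specification: a regression on the drift rate v.
include = [
    {
        "name": "v",
        "formula": "v ~ 1 + coherence",
        "prior": {
            "Intercept": {"name": "Normal", "mu": 0.0, "sigma": 1.0},
            "coherence": {"name": "Normal", "mu": 0.0, "sigma": 0.5},
        },
        "link": "identity",
    }
]

# Simple consistency checks before handing these to the HSSM constructor.
missing_bounds = [p for p in model_config["list_params"] if p not in model_config["bounds"]]
unknown_params = [s["name"] for s in include if s["name"] not in model_config["list_params"]]
print(missing_bounds, unknown_params)
```

Both lists should be empty when every parameter has bounds and every include entry names a declared parameter.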
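The p_outlier and lapse parameters above can be read as defining a two-component mixture: with probability p_outlier a trial comes from the lapse distribution, and otherwise from the model. The sketch below assumes that standard mixture form and a Uniform(0, 20) lapse distribution. The helper names are hypothetical and this is not HSSM's internal implementation.

```python
import math


def uniform_logpdf(rt, lower=0.0, upper=20.0):
    """Log density of a Uniform(lower, upper) lapse distribution."""
    if lower <= rt <= upper:
        return -math.log(upper - lower)
    return -math.inf


def mixture_loglik(model_loglik, rt, p_outlier=0.05, lower=0.0, upper=20.0):
    """log[(1 - p_outlier) * L_model + p_outlier * L_lapse] for one trial."""
    if p_outlier is None:
        return model_loglik  # lapse component not included
    if not 0.0 < p_outlier < 1.0:
        raise ValueError("p_outlier must lie strictly between 0 and 1.")
    a = math.log1p(-p_outlier) + model_loglik
    b = math.log(p_outlier) + uniform_logpdf(rt, lower, upper)
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    # Log-sum-exp for numerical stability.
    return m + math.log(math.exp(a - m) + math.exp(b - m))


# A trial the model considers impossible still gets a finite log-likelihood.
print(mixture_loglik(-math.inf, rt=1.0))
```

This shows why a small lapse probability can stabilize sampling: trials with zero model likelihood no longer drive the total log-likelihood to negative infinity.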

Methods:

  • sample

    Perform sampling using the fit method via bambi.Model.

  • sample_posterior_predictive

    Perform posterior predictive sampling from the HSSM model.

  • sample_prior_predictive

    Generate samples from the prior predictive distribution.

  • vi

    Perform Variational Inference.

  • find_MAP

    Perform Maximum A Posteriori estimation.

  • log_likelihood

    Compute the log likelihood of the model.

  • summary

    Produce a summary table with ArviZ but with additional convenience features.

  • plot_trace

    Generate trace plot with ArviZ but with additional convenience features.

  • graph

    Produce a graphviz Digraph from a built HSSM model.

  • plot_posterior_predictive

    Produce a posterior predictive plot.

  • plot_quantile_probability

    Produce a quantile probability plot.

  • restore_traces

    Restore traces from an InferenceData object or a .netcdf file.

  • initial_point

    Compute the initial point of the model.

Attributes:

  • traces (InferenceData | Approximation) –

    Return the trace of the model after sampling.

  • pymc_model (Model) –

    Provide access to the PyMC model.

hssm.HSSM.traces property

traces: InferenceData | Approximation

Return the trace of the model after sampling.

Raises:

  • ValueError

    If the model has not been sampled yet.

Returns:

  • InferenceData | Approximation

    The trace of the model after the last call to sample().

hssm.HSSM.pymc_model property

pymc_model: Model

Provide access to the PyMC model.

Returns:

  • Model

    The PyMC model built by bambi.

hssm.HSSM.sample

sample(
    sampler: (
        Literal["mcmc", "nuts_numpyro", "nuts_blackjax", "laplace", "vi"] | None
    ) = None,
    init: str | None = None,
    initvals: str | dict | None = None,
    include_response_params: bool = False,
    **kwargs
) -> az.InferenceData | pm.Approximation

Perform sampling using the fit method via bambi.Model.

Parameters:

  • sampler (Literal['mcmc', 'nuts_numpyro', 'nuts_blackjax', 'laplace', 'vi'] | None, default: None ) –

    The sampler to use. Can be one of "mcmc", "nuts_numpyro", "nuts_blackjax", "laplace", or "vi". If using blackbox likelihoods, this cannot be "nuts_numpyro" or "nuts_blackjax". By default it is None, and sampler will automatically be chosen: when the model uses the approx_differentiable likelihood, and jax backend, "nuts_numpyro" will be used. Otherwise, "mcmc" (the default PyMC NUTS sampler) will be used.

  • init (str | None, default: None ) –

    Initialization method to use for the sampler. If any of the NUTS samplers is used, defaults to "adapt_diag". Otherwise, defaults to "auto".

  • initvals (str | dict | None, default: None ) –

    Pass initial values to the sampler. This can be a dictionary of initial values for parameters of the model, or the string "map" to initialize at the MAP estimate. If "map" is used, the MAP estimate will be computed unless it is already attached to the base class from a prior call to find_MAP.

  • include_response_params (bool, default: False ) –

    Include parameters of the response distribution in the output. These usually take more space than other parameters as there's one of them per observation. Defaults to False.

  • kwargs

    Other arguments passed to bmb.Model.fit(). Please see the bambi documentation (https://bambinos.github.io/bambi/api_reference.html#bambi.models.Model.fit) for full details.

Returns:

  • InferenceData | Approximation

    A reference to the model.traces object, which stores the traces of the last call to model.sample(). model.traces is an ArviZ InferenceData instance if sampler is "mcmc" (default), "nuts_numpyro", "nuts_blackjax" or "laplace", or an Approximation object if "vi".
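The automatic sampler choice described for the sampler parameter can be summarized in a few lines. The function below is a hypothetical sketch of that documented rule, with loglik_kind and backend passed in explicitly; it is not the code HSSM runs.

```python
VALID_SAMPLERS = {"mcmc", "nuts_numpyro", "nuts_blackjax", "laplace", "vi"}
JAX_SAMPLERS = {"nuts_numpyro", "nuts_blackjax"}


def choose_sampler(sampler, loglik_kind, backend=None):
    """Resolve the sampler name following the rule documented for sample()."""
    if sampler is None:
        # approx_differentiable likelihood on the jax backend -> NumPyro NUTS.
        if loglik_kind == "approx_differentiable" and backend == "jax":
            return "nuts_numpyro"
        return "mcmc"  # default PyMC NUTS sampler
    if sampler not in VALID_SAMPLERS:
        raise ValueError(f"Unsupported sampler: {sampler!r}")
    if loglik_kind == "blackbox" and sampler in JAX_SAMPLERS:
        raise ValueError("Blackbox likelihoods cannot use the JAX-based samplers.")
    return sampler


print(choose_sampler(None, "approx_differentiable", "jax"))
print(choose_sampler(None, "analytical"))
```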

hssm.HSSM.sample_posterior_predictive

sample_posterior_predictive(
    idata: az.InferenceData | None = None,
    data: pd.DataFrame | None = None,
    inplace: bool = True,
    include_group_specific: bool = True,
    kind: Literal["response", "response_params"] = "response",
    draws: int | float | list[int] | np.ndarray | None = None,
    safe_mode: bool = True,
) -> az.InferenceData | None

Perform posterior predictive sampling from the HSSM model.

Parameters:

  • idata (optional, default: None ) –

    The InferenceData object returned by HSSM.sample(). If not provided, the InferenceData from the last call to sample() will be used.

  • data (optional, default: None ) –

    An optional data frame with values for the predictors that are used to obtain out-of-sample predictions. If omitted, the original dataset is used.

  • inplace (optional, default: True ) –

    If True will modify idata in-place and append a posterior_predictive group to idata. Otherwise, it will return a copy of idata with the predictions added, by default True.

  • include_group_specific (optional, default: True ) –

    If True, predictions will include the group-specific effects. Otherwise, predictions are made with common effects only (i.e. group-specific effects are set to zero). Defaults to True.

  • kind (Literal['response', 'response_params'], default: 'response' ) –

    Indicates the type of prediction required. Can be "response_params" or "response". The former returns draws from the posterior distribution of the likelihood parameters, while the latter returns the draws from the posterior predictive distribution (i.e. the posterior probability distribution for a new observation) in addition to the posterior distribution. Defaults to "response".

  • draws (int | float | list[int] | ndarray | None, default: None ) –

    The number of samples to draw from the posterior predictive distribution from each chain. When an integer >= 1, it is the number of samples to be extracted from the draw dimension. If this integer is larger than the number of posterior samples in each chain, all posterior samples will be used in posterior predictive sampling. When a float between 0 and 1, it is the proportion of samples from the draw dimension of each chain to be used in posterior predictive sampling. If this proportion is very small, at least one sample will be used. When None, all posterior samples will be used. Defaults to None.

  • safe_mode (bool, default: True ) –

    If True, the function will split the draws into chunks of 10 to avoid memory issues. Defaults to True.

Raises:

  • ValueError

    If the model has not been sampled yet and idata is not provided.

Returns:

  • InferenceData | None

    InferenceData or None
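The handling of draws and safe_mode described above can be sketched as follows. The helper names are hypothetical, and truncating the fractional case with int() is an assumption; the documentation only states that at least one sample is used. The list and array forms of draws are not covered by this sketch.

```python
def resolve_n_draws(draws, n_posterior):
    """Number of posterior draws per chain used for prediction."""
    if draws is None:
        return n_posterior  # use all posterior samples
    if isinstance(draws, float):
        if not 0.0 < draws < 1.0:
            raise ValueError("A float `draws` must lie between 0 and 1.")
        # Proportion of the draw dimension, but never fewer than one sample.
        return max(1, int(draws * n_posterior))
    if isinstance(draws, int):
        if draws < 1:
            raise ValueError("An integer `draws` must be >= 1.")
        # Requests larger than the chain length fall back to all samples.
        return min(draws, n_posterior)
    raise TypeError("This sketch only handles None, int, and float.")


def chunk_draws(n_draws, chunk_size=10):
    """Split draw indices into chunks, as safe_mode does with chunks of 10."""
    return [
        range(start, min(start + chunk_size, n_draws))
        for start in range(0, n_draws, chunk_size)
    ]


n = resolve_n_draws(0.25, n_posterior=100)
print(n, [len(chunk) for chunk in chunk_draws(n)])
```

Processing the draws chunk by chunk keeps only a small slice of the per-observation predictions in memory at any one time.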

hssm.HSSM.sample_prior_predictive

sample_prior_predictive(
    draws: int = 500,
    var_names: str | list[str] | None = None,
    omit_offsets: bool = True,
    random_seed: np.random.Generator | None = None,
) -> az.InferenceData

Generate samples from the prior predictive distribution.

Parameters:

  • draws (int, default: 500 ) –

    Number of draws to sample from the prior predictive distribution. Defaults to 500.

  • var_names (str | list[str] | None, default: None ) –

    A list of names of variables for which to compute the prior predictive distribution. Defaults to None which means both observed and unobserved RVs.

  • omit_offsets (bool, default: True ) –

    Whether to omit offset terms. Defaults to True.

  • random_seed (Generator | None, default: None ) –

    Seed for the random number generator.

Returns:

  • InferenceData

    InferenceData object with the groups prior, prior_predictive and observed_data.

hssm.HSSM.vi

vi(
    method: str = "advi",
    niter: int = 10000,
    draws: int = 1000,
    return_idata: bool = True,
    ignore_mcmc_start_point_defaults=False,
    **vi_kwargs
) -> pm.Approximation | az.InferenceData

Perform Variational Inference.

Parameters:

  • niter (int, default: 10000 ) –

    The number of iterations to run the VI algorithm. Defaults to 10000.

  • method (str, default: 'advi' ) –

    The method to use for VI. Can be one of "advi", "fullrank_advi", "svgd", or "asvgd". Defaults to "advi".

  • draws (int, default: 1000 ) –

    The number of samples to draw from the posterior distribution. Defaults to 1000.

  • return_idata (bool, default: True ) –

    If True, returns an InferenceData object. Otherwise, returns the approximation object directly. Defaults to True.

Returns:

  • Approximation | InferenceData

    An InferenceData object if return_idata is True. Otherwise, the approximation object (the mean-field approximation when method is "advi").

hssm.HSSM.find_MAP

find_MAP(**kwargs)

Perform Maximum A Posteriori estimation.

Returns:

  • dict

    A dictionary containing the MAP estimates of the model parameters.

hssm.HSSM.log_likelihood

log_likelihood(
    idata: az.InferenceData | None = None,
    data: pd.DataFrame | None = None,
    inplace: bool = True,
    keep_likelihood_params: bool = False,
) -> az.InferenceData | None

Compute the log likelihood of the model.

Parameters:

  • idata (optional, default: None ) –

    The InferenceData object returned by HSSM.sample(). If not provided, the InferenceData from the last call to sample() will be used.

  • data (optional, default: None ) –

    A pandas DataFrame with values for the predictors that are used to obtain out-of-sample predictions. If omitted, the original dataset is used.

  • inplace (optional, default: True ) –

    If True will modify idata in-place and append a log_likelihood group to idata. Otherwise, it will return a copy of idata with the predictions added, by default True.

  • keep_likelihood_params (optional, default: False ) –

    If True, the trial-wise likelihood parameters that are computed en route to getting the log likelihood are kept in the idata object. Defaults to False. See also the method add_likelihood_parameters_to_idata.

Returns:

  • InferenceData | None

    InferenceData or None

hssm.HSSM.summary

summary(
    data: az.InferenceData | None = None,
    include_deterministic: bool = False,
    **kwargs
) -> pd.DataFrame | xr.Dataset

Produce a summary table with ArviZ but with additional convenience features.

This is a simple wrapper for the az.summary() function. By default, it filters out the deterministic values from the summary table. Please see the ArviZ documentation (https://arviz-devs.github.io/arviz/api/generated/arviz.summary.html) for additional parameters that can be specified.

Parameters:

  • data (InferenceData | None, default: None ) –

    An ArviZ InferenceData object. If None, the traces stored in the model will be used.

  • include_deterministic (optional, default: False ) –

    Whether to include deterministic variables in the summary. Defaults to False. Note that if include_deterministic is set to False and var_names is provided, the var_names provided will be modified to also exclude the deterministic values. If this is not desirable, set include_deterministic to True.

Returns:

  • DataFrame | Dataset

    A pandas DataFrame or xarray Dataset containing the summary statistics.

hssm.HSSM.plot_trace

plot_trace(
    data: az.InferenceData | None = None,
    include_deterministic: bool = False,
    tight_layout: bool = True,
    **kwargs
) -> None

Generate trace plot with ArviZ but with additional convenience features.

This is a simple wrapper for the az.plot_trace() function. By default, it filters out the deterministic values from the plot. Please see the ArviZ documentation (https://arviz-devs.github.io/arviz/api/generated/arviz.plot_trace.html) for additional parameters that can be specified.

Parameters:

  • data (optional, default: None ) –

    An ArviZ InferenceData object. If None, the traces stored in the model will be used.

  • include_deterministic (optional, default: False ) –

    Whether to include deterministic variables in the plot. Defaults to False. Note that if include_deterministic is set to False and var_names is provided, the var_names provided will be modified to also exclude the deterministic values. If this is not desirable, set include_deterministic to True.

  • tight_layout (optional, default: True ) –

    Whether to call plt.tight_layout() after plotting. Defaults to True.

hssm.HSSM.graph

graph(formatting='plain', name=None, figsize=None, dpi=300, fmt='png')

Produce a graphviz Digraph from a built HSSM model.

Requires graphviz, which may be installed most easily with conda install -c conda-forge python-graphviz. Alternatively, you may install the graphviz binaries yourself, and then pip install graphviz to get the python bindings. See http://graphviz.readthedocs.io/en/stable/manual.html for more information.

Parameters:

  • formatting

    One of "plain" or "plain_with_params". Defaults to "plain".

  • name

    Name of the figure to save. Defaults to None, no figure is saved.

  • figsize

    Maximum width and height of figure in inches. Defaults to None, the figure size is set automatically. If defined and the drawing is larger than the given size, the drawing is uniformly scaled down so that it fits within the given size. Only works if name is not None.

  • dpi

    Dots per inch of the figure to save. Defaults to 300. Only works if name is not None.

  • fmt

    Format of the figure to save. Defaults to "png". Only works if name is not None.

Returns:

  • Graph

    The graph

hssm.HSSM.plot_posterior_predictive

plot_posterior_predictive(**kwargs) -> mpl.axes.Axes | sns.FacetGrid

Produce a posterior predictive plot.

Equivalent to calling hssm.plotting.plot_posterior_predictive() with the model. Please see that function for full documentation.

Returns:

  • Axes | FacetGrid

    The matplotlib axis or seaborn FacetGrid object containing the plot.

hssm.HSSM.plot_quantile_probability

plot_quantile_probability(**kwargs) -> mpl.axes.Axes | sns.FacetGrid

Produce a quantile probability plot.

Equivalent to calling hssm.plotting.plot_quantile_probability() with the model. Please see that function for full documentation.

Returns:

  • Axes | FacetGrid

    The matplotlib axis or seaborn FacetGrid object containing the plot.

hssm.HSSM.restore_traces

restore_traces(
    traces: az.InferenceData | pm.Approximation | str | PathLike,
) -> None

Restore traces from an InferenceData object or a .netcdf file.

Parameters:

  • traces (InferenceData | Approximation | str | PathLike) –

    An InferenceData object or a path to a file containing the traces.

hssm.HSSM.initial_point

initial_point(transformed: bool = False) -> dict[str, np.ndarray]

Compute the initial point of the model.

This is a slightly altered version of pm.initial_point.initial_point().

Parameters:

  • transformed (bool, default: False ) –

    If True, return the initial point in transformed space.

Returns:

  • dict

    A dictionary containing the initial point of the model parameters.