hssm

Top-level entry to the HSSM package.

The hssm module is the top-level entry to the HSSM package. It exports the most important classes, including the HSSM class that handles model creation and sampling. It also exposes utility classes that users will often work with, such as hssm.Prior, hssm.ModelConfig, and hssm.Param, along with the most frequently used utility functions.

hssm.HSSM

HSSM(data: DataFrame, model: SupportedModels | str = 'ddm', include: list[dict | Param] | None = None, model_config: ModelConfig | dict | None = None, loglik: str | PathLike | Callable | Op | type[Distribution] | None = None, loglik_kind: LoglikKind | None = None, p_outlier: float | dict | Prior | None = 0.05, lapse: dict | Prior | None = bmb.Prior('Uniform', lower=0.0, upper=10.0), hierarchical: bool = False, link_settings: Literal['log_logit'] | None = None, prior_settings: Literal['safe'] | None = None, extra_namespace: dict[str, Any] | None = None, missing_data: bool | float = False, deadline: bool | str = False, loglik_missing_data: str | PathLike | Callable | Op | None = None, **kwargs)

The basic Hierarchical Sequential Sampling Model (HSSM) class.

Parameters:

  • data (DataFrame) –

    A pandas DataFrame that contains at least the columns "rt" and "response".

  • model (SupportedModels | str, default: 'ddm' ) –

    The name of the model to use. Currently supported models are "ddm", "ddm_sdv", "full_ddm", "angle", "levy", "ornstein", "weibull", "race_no_bias_angle_4", "ddm_seq2_no_bias". If any other string is passed, the model will be considered custom, in which case model_config, loglik, and loglik_kind must all be provided by the user.

  • include (optional, default: None ) –

    A list of dictionaries specifying parameter specifications to include in the model. If left unspecified, defaults will be used for all parameter specifications. Defaults to None.

  • model_config (optional, default: None ) –

    A dictionary containing the model configuration information. If None is provided, defaults will be used if there are any. Defaults to None. Fields for this dict are usually:

    • "list_params": a list of parameters indicating the parameters of the model. The order in which the parameters are specified in this list is important. Values for each parameter will be passed to the likelihood function in this order.
    • "backend": Only used when loglik_kind is approx_differentiable and an onnx file is supplied for the likelihood approximation network (LAN). Valid values are "jax" or "pytensor". It determines whether the LAN in ONNX should be converted to "jax" or "pytensor". If not provided, jax will be used for maximum performance.
    • "default_priors": A dict indicating the default priors for each parameter.
    • "bounds": A dict indicating the boundaries for each parameter. In the case of LAN, these bounds are training boundaries.
    • "rv": Optional. Can be a RandomVariable class containing the user's own rng_fn function for sampling from the distribution that the user is supplying. If not supplied, HSSM will automatically generate a RandomVariable using the simulator identified by model from the ssm_simulators package. If model is not supported in ssm_simulators, a warning will be raised letting the user know that sampling from the RandomVariable will result in errors.
    • "extra_fields": Optional. A list of strings indicating the additional columns in data that will be passed to the likelihood function for calculation. This is helpful if the likelihood function depends on data other than the observed data and the parameter values.
  • loglik (optional, default: None ) –

    A likelihood function. Defaults to None. Requirements are:

    1. If loglik_kind is "analytical" or "blackbox", a pm.Distribution, a pytensor Op, or a Python callable can be used. Signatures are:
      • pm.Distribution: needs to have parameters specified exactly as listed in list_params
      • pytensor.graph.Op and Callable: needs to accept the parameters specified exactly as listed in list_params
    2. If loglik_kind is "approx_differentiable", then in addition to the specifications above, a str or PathLike can also be used to specify a path to an onnx file. If a str is provided, HSSM will first look locally for an onnx file. If that is not successful, HSSM will try to download that onnx file from the Hugging Face Hub.
    3. It can also be None, in which case a default likelihood function will be used.
  • loglik_kind (optional, default: None ) –

    A string that specifies the kind of log-likelihood function specified with loglik. Defaults to None. Can be one of the following:

    • "analytical": an analytical (approximation) likelihood function. It is differentiable and can be used with samplers that require differentiation.
    • "approx_differentiable": a likelihood approximation network (LAN) likelihood function. It is differentiable and can be used with samplers that require differentiation.
    • "blackbox": a black box likelihood function. It is typically NOT differentiable.
    • None, in which case a default will be used. For ddm type of models, the default will be analytical. For other models supported, it will be approx_differentiable. If the model is a custom one, a ValueError will be raised.
  • p_outlier (optional, default: 0.05 ) –

    The fixed lapse probability or the prior distribution of the lapse probability. Defaults to a fixed value of 0.05. When None, the lapse probability will not be included in estimation.

  • lapse (optional, default: Prior('Uniform', lower=0.0, upper=10.0) ) –

    The lapse distribution. This argument is required only if p_outlier is not None. Defaults to Uniform(0.0, 10.0).

  • hierarchical (optional, default: False ) –

    If True, and if there is a participant_id field in data, will by default turn any unspecified parameter theta into a regression with "theta ~ 1 + (1|participant_id)" and default priors set by bambi. Also changes default values of link_settings and prior_settings. Defaults to False.

  • link_settings (optional, default: None ) –

    An optional string literal that indicates the link functions to use for each parameter. Helpful for hierarchical models where sampling might get stuck or become very slow. Can be one of the following:

    • "log_logit": applies log link functions to positive parameters and generalized logit link functions to parameters that have explicit bounds.
    • None: unless otherwise specified, the "identity" link functions will be used. The default value is None.
  • prior_settings (optional, default: None ) –

    An optional string literal that indicates the prior distributions to use for each parameter. Helpful for hierarchical models where sampling might get stuck or become very slow. Can be one of the following:

    • "safe": HSSM will scan all parameters in the model and apply safe priors to all parameters that do not have explicit bounds.
    • None: HSSM will use bambi to provide default priors for all parameters. Not recommended when you are using hierarchical models. The default value is None when hierarchical is False and "safe" when hierarchical is True.
  • extra_namespace (optional, default: None ) –

    Additional user supplied variables with transformations or data to include in the environment where the formula is evaluated. Defaults to None.

  • missing_data (optional, default: False ) –

    Specifies whether the model should handle missing data. Can be a bool or a float. If False and the rt column of the data contains -999.0, the model will drop these rows and produce a warning. If True, the model will treat the code -999.0 as missing data. If a float is provided, the model will treat this value as the missing data code. Defaults to False.

  • deadline (optional, default: False ) –

    Specifies whether the model should handle deadline data. Can be a bool or a str. If False, the model will do nothing even if a deadline column is provided. If True, the model will treat the deadline column as deadline data. If a str is provided, the model will treat this value as the name of the deadline column. Defaults to False.

  • loglik_missing_data (optional, default: None ) –

    A likelihood function for missing data. Please see the loglik parameter for how to specify the likelihood function for this parameter. If nothing is provided, a default likelihood function will be used. This parameter is required only if either missing_data or deadline is not False. Defaults to None.

  • **kwargs

    Additional arguments passed to the bmb.Model object.
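
As a rough illustration of the include and model_config arguments described above, the sketch below builds both as plain Python dictionaries. The keys of each include entry mirror the hssm.Param arguments (name, prior, formula, link, bounds), and the model_config keys are fields from the list above. The column name x, the parameter names, and every numeric value are made up for illustration, and the prior dictionaries assume the Bambi-style layout of a "name" key plus distribution keyword arguments.

```python
# Parameter specifications for `include`: one dict per parameter.
include = [
    {
        # Non-regression case: a single prior for the parameter itself.
        "name": "a",
        "prior": {"name": "Uniform", "lower": 0.3, "upper": 2.5},
        "bounds": (0.3, 2.5),
    },
    {
        # Regression case: one prior per term of the formula.
        "name": "v",
        "formula": "v ~ 1 + x",  # `x` is a hypothetical column in `data`
        "prior": {
            "Intercept": {"name": "Normal", "mu": 0.0, "sigma": 1.0},
            "x": {"name": "Normal", "mu": 0.0, "sigma": 0.5},
        },
        "link": "identity",
    },
]

# Model configuration for a hypothetical custom model.
model_config = {
    # Values are passed to the likelihood function in this order.
    "list_params": ["v", "a", "z", "t"],
    # Illustrative bounds only; not the defaults of any built-in model.
    "bounds": {
        "v": (-3.0, 3.0),
        "a": (0.3, 2.5),
        "z": (0.1, 0.9),
        "t": (0.0, 2.0),
    },
    # No additional data columns are forwarded to the likelihood.
    "extra_fields": [],
}
```

Both objects would then be handed to the constructor through the include and model_config arguments. For a custom model, loglik and loglik_kind have to be supplied as well.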

Attributes:

  • data

    A pandas DataFrame with at least two columns of "rt" and "response" indicating the response time and responses.

  • list_params

    The list of strs of parameter names.

  • model_name

    The name of the model.

  • loglik

    The likelihood function or a path to an onnx file.

  • loglik_kind

    The kind of likelihood used.

  • model_config

    A dictionary representing the model configuration.

  • model_distribution

    The likelihood function of the model in the form of a pm.Distribution subclass.

  • family

    A Bambi family object.

  • priors

    A dictionary containing the prior distribution of parameters.

  • formula

    A string representing the model formula.

  • link

    A string or a dictionary representing the link functions for all parameters.

  • params

    A list of Param objects representing model parameters.
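
The rules for the missing_data argument can be summarized in a few lines of plain Python. This is only a sketch of the documented behavior, not HSSM's actual code; the helper names resolve_missing_data and drop_missing are hypothetical.

```python
DEFAULT_MISSING_CODE = -999.0


def resolve_missing_data(missing_data):
    """Return (handle_missing, missing_code) for a `missing_data` argument."""
    # bool must be checked first, because bool is a subclass of int.
    if isinstance(missing_data, bool):
        return missing_data, DEFAULT_MISSING_CODE
    # A float both switches missing-data handling on and sets the code.
    return True, float(missing_data)


def drop_missing(rts, code=DEFAULT_MISSING_CODE):
    """Mimic the missing_data=False case: rows carrying the code are dropped."""
    return [rt for rt in rts if rt != code]


print(resolve_missing_data(False))
print(resolve_missing_data(-1))
print(drop_missing([0.52, -999.0, 1.31]))
```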

pymc_model property

pymc_model: Model

Provide access to the PyMC model.

Returns:

  • Model

    The PyMC model built by bambi

response_c property

response_c: str

Return the response variable names in c() format.

response_str property

response_str: str

Return the response variable names in string format.

traces property

traces: InferenceData | Approximation

Return the trace of the model after sampling.

Raises:

  • ValueError

    If the model has not been sampled yet.

Returns:

  • InferenceData | Approximation

    The trace of the model after sampling.

graph

graph(formatting='plain', name=None, figsize=None, dpi=300, fmt='png')

Produce a graphviz Digraph from a built HSSM model.

Requires graphviz, which may be installed most easily with conda install -c conda-forge python-graphviz. Alternatively, you may install the graphviz binaries yourself, and then pip install graphviz to get the python bindings. See http://graphviz.readthedocs.io/en/stable/manual.html for more information.

Parameters:

  • formatting

    One of "plain" or "plain_with_params". Defaults to "plain".

  • name

    Name of the figure to save. Defaults to None, no figure is saved.

  • figsize

    Maximum width and height of figure in inches. Defaults to None, the figure size is set automatically. If defined and the drawing is larger than the given size, the drawing is uniformly scaled down so that it fits within the given size. Only works if name is not None.

  • dpi

    Dots per inch of the figure to save. Defaults to 300. Only works if name is not None.

  • fmt

    Format of the figure to save. Defaults to "png". Only works if name is not None.

Returns:

  • Graph

    The graph

Note
The code is largely copied from
https://github.com/bambinos/bambi/blob/main/bambi/models.py
Credit for the code goes to Bambi developers.

plot_posterior_predictive

plot_posterior_predictive(**kwargs) -> Axes | FacetGrid

Produce a posterior predictive plot.

Equivalent to calling hssm.plotting.plot_posterior_predictive() with the model. Please see that function for full documentation.

Returns:

  • Axes | FacetGrid

    The matplotlib axis or seaborn FacetGrid object containing the plot.

plot_quantile_probability

plot_quantile_probability(**kwargs) -> Axes | FacetGrid

Produce a quantile probability plot.

Equivalent to calling hssm.plotting.plot_quantile_probability() with the model. Please see that function for full documentation.

Returns:

  • Axes | FacetGrid

    The matplotlib axis or seaborn FacetGrid object containing the plot.

plot_trace

plot_trace(data: InferenceData | None = None, include_deterministic: bool = False, tight_layout: bool = True, **kwargs) -> None

Generate trace plot with ArviZ but with additional convenience features.

This is a simple wrapper for the az.plot_trace() function. By default, it filters out the deterministic values from the plot. Please see the arviz documentation (https://arviz-devs.github.io/arviz/api/generated/arviz.plot_trace.html) for additional parameters that can be specified.

Parameters:

  • data (optional, default: None ) –

    An ArviZ InferenceData object. If None, the traces stored in the model will be used.

  • include_deterministic (optional, default: False ) –

    Whether to include deterministic variables in the plot. Defaults to False. Note that if include_deterministic is set to False and var_names is provided, the var_names provided will be modified to also exclude the deterministic values. If this is not desirable, set include_deterministic to True.

  • tight_layout (optional, default: True ) –

    Whether to call plt.tight_layout() after plotting. Defaults to True.

sample

sample(sampler: Literal['mcmc', 'nuts_numpyro', 'nuts_blackjax', 'laplace', 'vi'] | None = None, init: str | None = None, **kwargs) -> InferenceData | Approximation

Perform sampling using the fit method via bambi.Model.

Parameters:

  • sampler (Literal['mcmc', 'nuts_numpyro', 'nuts_blackjax', 'laplace', 'vi'] | None, default: None ) –

    The sampler to use. Can be one of "mcmc", "nuts_numpyro", "nuts_blackjax", "laplace", or "vi". If using blackbox likelihoods, this cannot be "nuts_numpyro" or "nuts_blackjax". By default it is None, and sampler will automatically be chosen: when the model uses the approx_differentiable likelihood, and jax backend, "nuts_numpyro" will be used. Otherwise, "mcmc" (the default PyMC NUTS sampler) will be used.

  • init (str | None, default: None ) –

    Initialization method to use for the sampler. If any of the NUTS samplers is used, defaults to "adapt_diag". Otherwise, defaults to "auto".

  • kwargs

    Other arguments passed to bmb.Model.fit(). Please see the Bambi API reference (https://bambinos.github.io/bambi/api_reference.html#bambi.models.Model.fit) for details.

Returns:

  • InferenceData | Approximation

    An ArviZ InferenceData instance if sampler is "mcmc" (default), "nuts_numpyro", "nuts_blackjax" or "laplace". An Approximation object if sampler is "vi".
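
The automatic choice of sampler and init described above can be sketched as follows. This is a simplified reading of the documented rules, not the library's implementation. The function names are hypothetical, and treating a missing backend as "jax" is an assumption based on the model_config description.

```python
# Samplers treated as NUTS variants here (an assumption for this sketch).
NUTS_SAMPLERS = {"mcmc", "nuts_numpyro", "nuts_blackjax"}
# Samplers that are documented as unusable with blackbox likelihoods.
JAX_SAMPLERS = {"nuts_numpyro", "nuts_blackjax"}


def choose_sampler(sampler, loglik_kind, backend=None):
    """Pick a sampler name following the documented defaults."""
    if sampler is None:
        uses_jax = backend in (None, "jax")  # assumed default backend: jax
        if loglik_kind == "approx_differentiable" and uses_jax:
            return "nuts_numpyro"
        return "mcmc"
    if loglik_kind == "blackbox" and sampler in JAX_SAMPLERS:
        raise ValueError(f"{sampler!r} cannot be used with blackbox likelihoods")
    return sampler


def default_init(sampler):
    """Pick the default initialization method for a resolved sampler."""
    return "adapt_diag" if sampler in NUTS_SAMPLERS else "auto"


print(choose_sampler(None, "approx_differentiable"))
print(choose_sampler(None, "analytical"))
print(default_init("vi"))
```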

sample_posterior_predictive

sample_posterior_predictive(idata: InferenceData | None = None, data: DataFrame | None = None, inplace: bool = True, include_group_specific: bool = True, kind: Literal['pps', 'mean'] = 'pps', n_samples: int | float | None = None) -> InferenceData | None

Perform posterior predictive sampling from the HSSM model.

Parameters:

  • idata (optional, default: None ) –

    The InferenceData object returned by HSSM.sample(). If not provided, the InferenceData from the last time sample() is called will be used.

  • data (optional, default: None ) –

    An optional data frame with values for the predictors that are used to obtain out-of-sample predictions. If omitted, the original dataset is used.

  • inplace (optional, default: True ) –

    If True, modifies idata in-place and appends a posterior_predictive group to it. Otherwise, returns a copy of idata with the predictions added. Defaults to True.

  • include_group_specific (optional, default: True ) –

    If True, makes predictions including the group-specific effects. Otherwise, predictions are made with common effects only (i.e. group-specific effects are set to zero). Defaults to True.

  • kind (Literal['pps', 'mean'], default: 'pps' ) –

    Indicates the type of prediction required. Can be "mean" or "pps". The first returns draws from the posterior distribution of the mean, while the latter returns the draws from the posterior predictive distribution (i.e. the posterior probability distribution for a new observation). Defaults to "pps".

  • n_samples (int | float | None, default: None ) –

    The number of samples to draw from the posterior predictive distribution from each chain. When it is an integer >= 1, it is the number of samples to be extracted from the draw dimension. If this integer is larger than the number of posterior samples in each chain, all posterior samples will be used in posterior predictive sampling. When it is a float between 0 and 1, it is the proportion of samples from the draw dimension of each chain to be used in posterior predictive sampling. If this proportion is very small, at least one sample will be used. When None, all posterior samples will be used. Defaults to None.

Raises:

  • ValueError

    If the model has not been sampled yet and idata is not provided.

Returns:

  • InferenceData | None

    InferenceData or None
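
To make the n_samples rules concrete, the hypothetical helper below maps an n_samples value and the number of posterior draws per chain to the number of draws that would be used. It is a sketch of the documented behavior only. Truncating the proportion with int() is an assumption, since the rounding rule is not stated.

```python
def resolve_n_samples(n_samples, n_draws):
    """Return how many draws per chain would be used for prediction."""
    if n_samples is None:
        # None means every posterior draw is used.
        return n_draws
    if isinstance(n_samples, int):
        if n_samples < 1:
            raise ValueError("an integer n_samples must be >= 1")
        # Never request more draws than the chain contains.
        return min(n_samples, n_draws)
    if not 0 < n_samples < 1:
        raise ValueError("a float n_samples must lie between 0 and 1")
    # A proportion always yields at least one draw.
    return max(1, int(n_samples * n_draws))


print(resolve_n_samples(None, 1000))
print(resolve_n_samples(5000, 1000))
print(resolve_n_samples(0.25, 1000))
```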

sample_prior_predictive

sample_prior_predictive(draws: int = 500, var_names: str | list[str] | None = None, omit_offsets: bool = True, random_seed: Generator | None = None) -> InferenceData

Generate samples from the prior predictive distribution.

Parameters:

  • draws (int, default: 500 ) –

    Number of draws to sample from the prior predictive distribution. Defaults to 500.

  • var_names (str | list[str] | None, default: None ) –

    A list of names of variables for which to compute the prior predictive distribution. Defaults to None which means both observed and unobserved RVs.

  • omit_offsets (bool, default: True ) –

    Whether to omit offset terms. Defaults to True.

  • random_seed (Generator | None, default: None ) –

    Seed for the random number generator.

Returns:

  • InferenceData

    InferenceData object with the groups prior, prior_predictive and observed_data.

set_alias

set_alias(aliases: dict[str, str | dict])

Set parameter aliases.

Sets the aliases according to the dictionary passed to it and rebuilds the model.

Parameters:

  • aliases (dict[str, str | dict]) –

    A dict specifying the parameter names being aliased and the aliases.

summary

summary(data: InferenceData | None = None, include_deterministic: bool = False, **kwargs) -> DataFrame | Dataset

Produce a summary table with ArviZ but with additional convenience features.

This is a simple wrapper for the az.summary() function. By default, it filters out the deterministic values from the summary. Please see the arviz documentation (https://arviz-devs.github.io/arviz/api/generated/arviz.summary.html) for additional parameters that can be specified.

Parameters:

  • data (InferenceData | None, default: None ) –

    An ArviZ InferenceData object. If None, the traces stored in the model will be used.

  • include_deterministic (optional, default: False ) –

    Whether to include deterministic variables in the summary. Defaults to False. Note that if include_deterministic is set to False and var_names is provided, the var_names provided will be modified to also exclude the deterministic values. If this is not desirable, set include_deterministic to True.

Returns:

  • DataFrame | Dataset

    A pandas DataFrame or xarray Dataset containing the summary statistics.

hssm.Link

Link(name, link=None, linkinv=None, linkinv_backend=None, bounds: tuple[float, float] | None = None)

Bases: Link

Representation of a generalized link function.

This object contains two main functions. One is the link function itself, the function that maps values in the response scale to the linear predictor, and the other is the inverse of the link function, that maps values of the linear predictor to the response scale.

The great majority of users will never interact with this class unless they want to create a custom Family with a custom Link. This is automatically handled for all the built-in families.

Parameters:

  • name

    The name of the link function. If it is a known name, it's not necessary to pass any other arguments because functions are already defined internally. If not known, all of link, linkinv and linkinv_backend must be specified.

  • link (optional, default: None ) –

    A function that maps the response to the linear predictor. Known as the g function in GLM jargon. Does not need to be specified when name is a known name.

  • linkinv (optional, default: None ) –

    A function that maps the linear predictor to the response. Known as the g^{-1} function in GLM jargon. Does not need to be specified when name is a known name.

  • linkinv_backend (optional, default: None ) –

    Same as linkinv, but must be something that works with the PyMC backend (i.e. it must work with PyTensor tensors). Does not need to be specified when name is a known name.

  • bounds (optional, default: None ) –

    Bounds of the response scale. Only needed when name is gen_logit.

hssm.ModelConfig dataclass

ModelConfig(response: list[str] | None = None, list_params: list[str] | None = None, default_priors: dict[str, ParamSpec] = dict(), bounds: dict[str, tuple[float, float]] = dict(), backend: Literal['jax', 'pytensor'] | None = None, rv: RandomVariable | None = None, extra_fields: list[str] | None = None)

Representation for model_config provided by the user.

hssm.Param

Param(name: str | None = None, prior: ParamSpec | dict[str, ParamSpec] | None = None, formula: str | None = None, link: str | Link | None = None, bounds: tuple[float, float] | None = None)

Represents the specifications for the main HSSM class.

Also provides convenience functions that can be used by the HSSM class to parse arguments.

Parameters:

  • name (str | None, default: None ) –

    The name of the parameter.

  • prior (ParamSpec | dict[str, ParamSpec] | None, default: None ) –

    If a formula is not specified (the non-regression case), this parameter expects a float value if the parameter is fixed, a dictionary that can be parsed by Bambi as a prior specification, or a Bambi Prior object. If not specified, a default uninformative uniform prior with bounds as boundaries will be constructed. An error will be thrown if bounds is also not specified. If a formula is specified (the regression case), this parameter expects a dictionary of param:prior, where param is the name of the response variable specified in formula, and prior is specified as above. If left unspecified, default priors created by Bambi will be used.

  • formula (str | None, default: None ) –

    The regression formula if the parameter depends on other variables. The response variable can be omitted.

  • link (str | Link | None, default: None ) –

    The link function for the regression. It is either a string that specifies a built-in link function in Bambi, or a Bambi Link object. If a regression is specified and link is not specified, "identity" will be used by default.

  • bounds (tuple[float, float] | None, default: None ) –

    If provided, the prior will be created with boundary checks. If this parameter is specified as a regression, boundary checks will be skipped at this point.

is_fixed property

is_fixed: bool

Determine if a parameter is a fixed value.

Returns:

  • bool

    A boolean that indicates if the parameter is a fixed value.

is_parent property

is_parent: bool

Determines if a parameter is a parent parameter for Bambi.

Returns:

  • bool

    A boolean that indicates if the parameter is a parent or not.

is_regression property

is_regression: bool

Determines if a regression is specified or not.

Returns:

  • bool

    A boolean that indicates if a regression is specified.

is_truncated property

is_truncated: bool

Determines if a parameter is truncated.

A parameter is truncated when it is not a regression, is not fixed and has bounds.

Returns:

  • bool

    A boolean that indicates if a parameter is truncated.
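
The relationship between is_regression, is_fixed, and is_truncated can be sketched with a small stand-in class. ParamSketch is hypothetical and is not the real hssm.Param. It assumes that a regression is signalled by a formula and that a fixed parameter is one whose prior is a plain number.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class ParamSketch:
    """Minimal stand-in used to illustrate the documented properties."""

    name: str
    prior: Any = None
    formula: Optional[str] = None
    bounds: Optional[Tuple[float, float]] = None

    @property
    def is_regression(self) -> bool:
        # Assumption: a regression is specified whenever a formula is given.
        return self.formula is not None

    @property
    def is_fixed(self) -> bool:
        # Assumption: a fixed parameter has a plain number as its prior.
        return isinstance(self.prior, (int, float)) and not isinstance(self.prior, bool)

    @property
    def is_truncated(self) -> bool:
        # Documented rule: not a regression, not fixed, and has bounds.
        return not self.is_regression and not self.is_fixed and self.bounds is not None


a = ParamSketch("a", prior={"name": "Normal", "mu": 1.0, "sigma": 0.5}, bounds=(0.3, 2.5))
t = ParamSketch("t", prior=0.3, bounds=(0.0, 2.0))
v = ParamSketch("v", formula="v ~ 1 + x", bounds=(-3.0, 3.0))
print(a.is_truncated, t.is_truncated, v.is_truncated)
```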

convert

convert()

Process the information passed to the class.

do_not_truncate

do_not_truncate()

Flag that prior should not be truncated.

This is most likely because both default prior and default bounds are supplied.

override_default_link

override_default_link()

Override the default link function.

This is most likely because both default prior and default bounds are supplied.

override_default_priors

override_default_priors(data: DataFrame, eval_env: dict[str, Any])

Override the default priors - the general case.

By supplying priors for all parameters in the regression, we can override the defaults that Bambi uses.

Parameters:

  • data (DataFrame) –

    The data used to fit the model.

  • eval_env (dict[str, Any]) –

    The environment used to evaluate the formula.

override_default_priors_ddm

override_default_priors_ddm(data: DataFrame, eval_env: dict[str, Any])

Override the default priors - the ddm case.

By supplying priors for all parameters in the regression, we can override the defaults that Bambi uses.

Parameters:

  • data (DataFrame) –

    The data used to fit the model.

  • eval_env (dict[str, Any]) –

    The environment used to evaluate the formula.

parse_bambi

parse_bambi() -> tuple

Return a 3-tuple that helps with constructing the Bambi model.

Returns:

  • tuple

    A 3-tuple of formula, priors, and link functions that can be used to construct the Bambi model.

set_parent

set_parent()

Set the Param as parent.

update

update(**kwargs)

Update the initial information stored in the class.

hssm.Prior

Prior(name: str, auto_scale: bool = True, dist: Callable | None = None, bounds: tuple[float, float] | None = None, **kwargs)

Bases: Prior

Abstract specification of a prior.

Parameters:

  • name (str) –

    Name of prior distribution. Must be the name of a PyMC distribution (e.g., "Normal", "Bernoulli", etc.)

  • auto_scale (optional, default: True ) –

    Whether to adjust the parameters of the prior or use them as passed. Defaults to True.

  • kwargs

    Optional keywords specifying the parameters of the named distribution.

  • dist (optional, default: None ) –

    A callable that returns a valid PyMC distribution. The signature must contain name, dims, and shape, as well as its own keyworded arguments.

  • bounds (optional, default: None ) –

    A tuple of two floats indicating the lower and upper bounds of the prior.

hssm.load_data

load_data(dataset: Optional[str] = None) -> Union[DataFrame, str]

Load a dataset as a pandas DataFrame.

If a valid dataset name is provided, this function will return the corresponding DataFrame. Otherwise, it lists the available datasets.

Parameters:

  • dataset (str, default: None ) –

    Name of the dataset to load. If not provided, a list of available datasets is returned.

Raises:

  • ValueError

    If the provided dataset name does not match any of the available datasets.

Returns:

  • DataFrame or str

    Loaded dataset as a DataFrame if a valid dataset name was provided, otherwise a string listing the available datasets.

hssm.set_floatX

set_floatX(dtype: Literal['float32', 'float64'], jax: bool = True)

Set float types for pytensor and Jax.

Often we wish to work with a specific type of float in both PyTensor and JAX. This function helps set float types in both packages.

Parameters:

  • dtype (Literal['float32', 'float64']) –

    Either float32 or float64. Float type for pytensor (and jax if jax=True).

  • jax (optional, default: True ) –

    Whether this function also sets float type for JAX by changing the jax_enable_x64 setting in JAX config. Defaults to True.

hssm.show_defaults

show_defaults(model: SupportedModels, loglik_kind=Optional[LoglikKind]) -> str

Show the defaults for supported models.

Parameters:

  • model (SupportedModels) –

    One of the supported model strings.

  • loglik_kind (optional, default: Optional[LoglikKind] ) –

    The kind of likelihood function, by default None, in which case the defaults for all likelihoods will be shown.

Returns:

  • str

    A nicely organized printout of the defaults for the provided model.

hssm.simulate_data

simulate_data(model: str, theta: ArrayLike, size: int, random_state: int | None = None, output_df: bool = True, **kwargs) -> ndarray | DataFrame

Sample simulated data from specified distributions.

Parameters:

  • model (str) –

    A model name that must be supported in ssm_simulators. For a detailed list of supported models, please see all fields in the model_config dict here

  • theta (ArrayLike) –

    An ArrayLike of floats that represent the true values of the parameters of the specified model. Please see here for what the parameters are. True values must be supplied in the same order as the parameters. You can also supply a 2D ArrayLike to simulate data for different trials with different true values.

  • size (int) –

    The size of the data to be simulated. If theta is a 2D ArrayLike, this parameter indicates the size of data to be simulated for each trial.

  • random_state (optional, default: None ) –

    A random seed for reproducibility.

  • output_df (optional, default: True ) –

    If True, outputs a DataFrame with the columns "rt" and "response". Otherwise, outputs a 2-column numpy array. Defaults to True.

  • kwargs (optional, default: {} ) –

    Other arguments passed to ssms.basic_simulators.simulator.

Returns:

  • ndarray | DataFrame

    An array or DataFrame with simulated data.