Bayesian t-tests¶
In this quick tutorial we illustrate how to use posterior samples to perform Bayesian hypothesis testing.
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import hssm
Example 1: Separate models¶
Simulate Data¶
We will simulate a simple dataset that contains two conditions.
# Condition 1
condition_1 = hssm.simulate_data(
model="ddm", theta=dict(v=0.5, a=1.5, z=0.5, t=0.1), size=500
)
# Condition 2
condition_2 = hssm.simulate_data(
model="ddm", theta=dict(v=1.0, a=1.5, z=0.5, t=0.1), size=500
)
Specify Models¶
We will fit two separate models to the data.
# Model 1
m1 = hssm.HSSM(model="ddm", data=condition_1)
m1.sample(sampler="mcmc", tune=500, draws=500)
# Model 2
m2 = hssm.HSSM(model="ddm", data=condition_2)
m2.sample(sampler="mcmc", tune=500, draws=500)
Model initialized successfully. Using default initvals.
Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, z, t, v]
Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 7 seconds.
Model initialized successfully. Using default initvals.
Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, z, t, v]
Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 7 seconds.
There were 6 divergences after tuning. Increase `target_accept` or reparameterize.
(The returned InferenceData object displays its groups: posterior (z, t, v, a), posterior_predictive (rt,response), sample_stats, and observed_data.)
Bayesian t-test¶
Now, let's ask the (informed) question:
Is the v parameter higher in condition 1 than in condition 2?
The beauty of the Bayesian approach is that this question boils down to simple counting. Reformulating the question: Are the v samples of my posterior for condition 1 higher than the samples of my posterior for condition 2?
Let's check: we have everything we need in our posterior samples!
# Proportion of posterior samples in which v_1 exceeds v_2
print(
f"p(v_1 > v_2) = {np.mean(m1.traces.posterior['v'].values > m2.traces.posterior['v'].values)}"
)
p(v_1 > v_2) = 0.0
Looks like our inference indicates that there is a $0\%$ chance that the v parameter of condition 1 is higher than the v parameter of condition 2. This is consistent with the ground truth of our simulation ($v = 0.5$ in condition 1 vs. $v = 1.0$ in condition 2).
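Because the two models were fit independently, any pairing of their draws is valid, and the test is just an indicator average. The counting logic can be sketched with numpy; the arrays below are toy stand-ins for the flattened posterior draws, not the fitted traces:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins for the flattened posterior draws of v in each condition
v1 = rng.normal(loc=0.5, scale=0.05, size=2000)  # condition 1
v2 = rng.normal(loc=1.0, scale=0.05, size=2000)  # condition 2

# P(v_1 > v_2): fraction of paired draws where condition 1 exceeds condition 2
p = np.mean(v1 > v2)
print(f"p(v_1 > v_2) = {p}")

# The same draws also give the full posterior of the difference
diff = v1 - v2
print(f"posterior mean of v_1 - v_2: {diff.mean():.3f}")
```

The same pattern extends to any event you can express as a boolean over the draws.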
Plotting¶
We can also plot the posterior samples to get a sense of the uncertainty.
# Specify a figure with a single axis
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
# Plot the posterior samples for condition 1
az.plot_posterior(
m1.traces.posterior, var_names="v", ax=ax, color="blue", hdi_prob=0.95
)
# Plot the posterior samples for condition 2
az.plot_posterior(m2.traces.posterior, var_names="v", ax=ax, color="red", hdi_prob=0.95)
# Set x-axis limit
ax.set_xlim(0.25, 1.25)
# Create proxy artists for the legend
from matplotlib.lines import Line2D
legend_elements = [
Line2D([0], [0], color="blue", label="Condition 1"),
Line2D([0], [0], color="red", label="Condition 2"),
]
# Add a legend
ax.legend(handles=legend_elements)
# Add a title
ax.set_title("Posterior samples for v")
# Add an x-axis label
ax.set_xlabel("v")
Text(0.5, 0, 'v')
A glance at these posteriors visually corroborates our simple counting analysis: the two posteriors for v are clearly separated.
To appreciate the contrast, let us also plot the posteriors of the respective a parameters.
# Specify a figure with a single axis
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
# Plot the posterior samples for condition 1
az.plot_posterior(
m1.traces.posterior, var_names="a", ax=ax, color="blue", hdi_prob=0.95
)
# Plot the posterior samples for condition 2
az.plot_posterior(m2.traces.posterior, var_names="a", ax=ax, color="red", hdi_prob=0.95)
# Set x-axis limit
ax.set_xlim(1.3, 1.7)
# Create proxy artists for the legend
from matplotlib.lines import Line2D
legend_elements = [
Line2D([0], [0], color="blue", label="Condition 1"),
Line2D([0], [0], color="red", label="Condition 2"),
]
# Add a legend
ax.legend(handles=legend_elements)
# Add a title
ax.set_title("Posterior samples for a")
# Add an x-axis label
ax.set_xlabel("a")
Text(0.5, 0, 'a')
and correspondingly,
print(
f"p(a_1 > a_2) = {np.mean(m1.traces.posterior['a'].values > m2.traces.posterior['a'].values)}"
)
p(a_1 > a_2) = 0.456
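A probability near $0.5$ says the two a posteriors overlap almost completely, as expected since both conditions were simulated with $a = 1.5$. Another way to frame "no difference" is a region of practical equivalence (ROPE): count the draws whose difference falls inside a small interval. A toy numpy sketch (illustrative draws, not the fitted traces; the ROPE width is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy draws for a in each condition -- same true value, as in the simulation
a1 = rng.normal(1.5, 0.05, size=2000)
a2 = rng.normal(1.5, 0.05, size=2000)

diff = a1 - a2

# ROPE: treat |a_1 - a_2| < 0.1 as "practically equivalent" (arbitrary width)
rope = 0.1
p_equiv = np.mean(np.abs(diff) < rope)
print(f"p(|a_1 - a_2| < {rope}) = {p_equiv:.3f}")
```

A high p_equiv supports practical equivalence of the two a parameters, rather than merely failing to find a difference.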
Example 2: Combined Model¶
Instead of fitting two separate models, we can label each condition and fit a single model in which v varies by condition.
condition_1["condition"] = "C1"
condition_2["condition"] = "C2"
data = pd.concat([condition_1, condition_2]).reset_index(drop=True)
m_combined = hssm.HSSM(
model="ddm",
data=data,
include=[
{
"name": "v",
"formula": "v ~ 1 + condition",
}
],
)
idata_combined = m_combined.sample(sampler="mcmc", tune=500, draws=500)
Model initialized successfully. Using default initvals.
Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, z, t, v_Intercept, v_condition]
Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 12 seconds.
Note that we no longer have two distinct idata objects; instead we have a single idata object with a posterior for v_condition. Let's take a closer look.
m_combined.traces.posterior
<xarray.Dataset> with dimensions (chain: 4, draw: 500, v_condition_dim: 1). The coordinate v_condition_dim contains the single level 'C2'. Data variables: v_Intercept, v_condition, a, z, t.
Under the hood, Bambi created a model with a dummy variable for the C2 condition (the C1 condition is represented by the Intercept).
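The treatment (dummy) coding Bambi applies can be reproduced with pandas; this is a sketch of the resulting design matrix, not Bambi's internal code:

```python
import pandas as pd

df = pd.DataFrame({"condition": ["C1", "C1", "C2", "C2"]})

# Treatment coding: C1 (the first level) is absorbed into the intercept,
# C2 gets a 0/1 dummy column
X = pd.get_dummies(df["condition"], drop_first=True, dtype=int)
X.insert(0, "Intercept", 1)
print(X)
```

For the C2 rows, v equals the Intercept plus the C2 coefficient, so testing the C2 coefficient against 0 is exactly the condition contrast.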
So what do we need to test here?
We still ask the same question: Is the v posterior for condition C2 higher than the v posterior for condition C1? But now we only need to check whether our v_condition[C2] variable is above 0, since it directly represents the offset from the Intercept.
Let's check:
print(
f"p(v_condition[C2] > v_condition[C1]) = {np.mean(idata_combined.posterior['v_condition'].values > 0)}"
)
p(v_condition[C2] > v_condition[C1]) = 1.0
or visually,
az.plot_posterior(idata_combined.posterior, var_names="v_condition", hdi_prob=0.95)
<Axes: title={'center': 'v_condition\nC2'}>
You can use this approach to test any number of complex statements about your parameters. There will essentially always be a way to turn your question into a simple comparison of posterior samples: a simple counting problem.
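For instance, a compound claim such as "v is at least 0.3 higher in condition 2 while a differs by less than 0.1" is still just an indicator average over draws. A toy numpy sketch (illustrative draws, not fitted traces; the thresholds are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for flattened posterior draws in the two conditions
v1, v2 = rng.normal(0.5, 0.05, 2000), rng.normal(1.0, 0.05, 2000)
a1, a2 = rng.normal(1.5, 0.05, 2000), rng.normal(1.5, 0.05, 2000)

# Joint statement: both clauses must hold within the same draw
claim = (v2 - v1 > 0.3) & (np.abs(a1 - a2) < 0.1)
print(f"p(claim) = {np.mean(claim):.3f}")
```

Evaluating the conjunction draw by draw respects any posterior dependence between parameters, which multiplying two marginal probabilities would not.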