Bayesian t-tests¶
In this quick tutorial, we illustrate how to use posterior samples to perform Bayesian hypothesis testing.
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import hssm
Example 1: Separate models¶
Simulate Data¶
We will simulate a simple dataset that contains two conditions.
# Condition 1
condition_1 = hssm.simulate_data(
model="ddm", theta=dict(v=0.5, a=1.5, z=0.5, t=0.1), size=500
)
# Condition 2
condition_2 = hssm.simulate_data(
model="ddm", theta=dict(v=1.0, a=1.5, z=0.5, t=0.1), size=500
)
Specify Models¶
We will fit two separate models to the data.
# Model 1
m1 = hssm.HSSM(model="ddm", data=condition_1)
m1.sample(sampler="mcmc", tune=500, draws=500)
# Model 2
m2 = hssm.HSSM(model="ddm", data=condition_2)
m2.sample(sampler="mcmc", tune=500, draws=500)
Model initialized successfully. Using default initvals.
Initializing NUTS using adapt_diag... Multiprocess sampling (4 chains in 4 jobs) NUTS: [z, a, t, v]
Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 7 seconds. There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
Model initialized successfully. Using default initvals.
Initializing NUTS using adapt_diag... Multiprocess sampling (4 chains in 4 jobs) NUTS: [z, a, t, v]
Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 7 seconds.
<xarray.Dataset> Size: 68kB Dimensions: (chain: 4, draw: 500) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 4kB 0 1 2 3 4 5 6 7 ... 493 494 495 496 497 498 499 Data variables: z (chain, draw) float64 16kB 0.4778 0.4901 0.4495 ... 0.454 0.4689 a (chain, draw) float64 16kB 1.407 1.516 1.448 ... 1.464 1.436 1.428 t (chain, draw) float64 16kB 0.1488 0.1091 0.09674 ... 0.101 0.1284 v (chain, draw) float64 16kB 1.135 1.114 1.191 ... 1.201 1.213 1.145 Attributes: created_at: 2025-07-13T13:14:41.541598+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 sampling_time: 6.892884016036987 tuning_steps: 500 modeling_interface: bambi modeling_interface_version: 0.15.0
<xarray.Dataset> Size: 8MB Dimensions: (chain: 4, draw: 500, __obs__: 500) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 4kB 0 1 2 3 4 5 6 ... 493 494 495 496 497 498 499 * __obs__ (__obs__) int64 4kB 0 1 2 3 4 5 6 ... 494 495 496 497 498 499 Data variables: rt,response (chain, draw, __obs__) float64 8MB -2.686 -1.255 ... -1.862 Attributes: modeling_interface: bambi modeling_interface_version: 0.15.0
<xarray.Dataset> Size: 248kB Dimensions: (chain: 4, draw: 500) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 4kB 0 1 2 3 4 5 ... 495 496 497 498 499 Data variables: (12/17) max_energy_error (chain, draw) float64 16kB 1.05 0.04199 ... -0.3099 perf_counter_start (chain, draw) float64 16kB 1.43e+06 ... 1.43e+06 acceptance_rate (chain, draw) float64 16kB 0.6084 0.9887 ... 0.9956 perf_counter_diff (chain, draw) float64 16kB 0.002105 ... 0.00386 step_size (chain, draw) float64 16kB 0.3648 0.3648 ... 0.5707 index_in_trajectory (chain, draw) int64 16kB 2 -4 2 7 -4 ... -2 3 2 -5 -4 ... ... reached_max_treedepth (chain, draw) bool 2kB False False ... False False tree_depth (chain, draw) int64 16kB 2 4 3 3 4 3 ... 4 2 3 3 3 3 n_steps (chain, draw) float64 16kB 3.0 15.0 7.0 ... 7.0 7.0 lp (chain, draw) float64 16kB -651.0 -652.1 ... -651.2 energy (chain, draw) float64 16kB 652.9 653.2 ... 654.1 diverging (chain, draw) bool 2kB False False ... False False Attributes: created_at: 2025-07-13T13:14:41.551342+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 sampling_time: 6.892884016036987 tuning_steps: 500 modeling_interface: bambi modeling_interface_version: 0.15.0
<xarray.Dataset> Size: 12kB Dimensions: (__obs__: 500, rt,response_extra_dim_0: 2) Coordinates: * __obs__ (__obs__) int64 4kB 0 1 2 3 4 ... 496 497 498 499 * rt,response_extra_dim_0 (rt,response_extra_dim_0) int64 16B 0 1 Data variables: rt,response (__obs__, rt,response_extra_dim_0) float64 8kB 2... Attributes: created_at: 2025-07-13T13:14:41.553837+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 modeling_interface: bambi modeling_interface_version: 0.15.0
Bayesian t-test¶
Now, let's ask the (informed) question:
Is the v parameter higher in condition 1 than in condition 2?
The beauty of the Bayesian approach is that this question boils down to simple counting.
Reformulating the question: are the v samples of my posterior for condition 1 higher than the v samples of my posterior for condition 2?
Let's check; we have everything we need in our posterior samples!
# Proportion of posterior samples in which v_1 > v_2
print(
f"p(v_1 > v_2) = {np.mean(m1.traces.posterior['v'].values > m2.traces.posterior['v'].values)}"
)
p(v_1 > v_2) = 0.0
Looks like our inference indicates that there is a $0\%$ chance for the v parameter of condition 1 to be higher than the v parameter of condition 2. This matches the simulation, where condition 2 was generated with the larger drift rate (v = 1.0 vs. v = 0.5).
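The counting principle itself is easy to isolate. Below is a minimal NumPy sketch using toy normal draws as stand-ins for the two posterior sample arrays (these are not HSSM outputs; the locations and scales are made-up values mimicking the simulation above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two arrays of posterior draws (hypothetical values):
# 2000 draws each, centered on the drift rates used in the simulation.
v_1 = rng.normal(loc=0.5, scale=0.05, size=2000)
v_2 = rng.normal(loc=1.0, scale=0.05, size=2000)

# The posterior probability of the directional claim is simply the
# fraction of draws for which the claim holds.
p_v1_greater = np.mean(v_1 > v_2)
```

Because the two models were fit independently, comparing their draws elementwise like this treats the posteriors as independent, which is exactly what the `np.mean(... > ...)` line in the cell above does.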
Plotting¶
We can also plot the posterior samples to get a sense of the uncertainty.
# Specify a figure with a single axis
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
# Plot the posterior samples for condition 1
az.plot_posterior(
m1.traces.posterior, var_names="v", ax=ax, color="blue", hdi_prob=0.95
)
# Plot the posterior samples for condition 2
az.plot_posterior(m2.traces.posterior, var_names="v", ax=ax, color="red", hdi_prob=0.95)
# Set x-axis limit
ax.set_xlim(0.25, 1.25)
# Create proxy artists for the legend
from matplotlib.lines import Line2D
legend_elements = [
Line2D([0], [0], color="blue", label="Condition 1"),
Line2D([0], [0], color="red", label="Condition 2"),
]
# Add a legend
ax.legend(handles=legend_elements)
# Add a title
ax.set_title("Posterior samples for v")
# Add an x-axis label
ax.set_xlabel("v")
Text(0.5, 0, 'v')
A glance at these posteriors visually corroborates our simple counting analysis: the two posteriors for v are clearly separated.
To appreciate the difference, let us also plot the posteriors of the respective a parameters.
# Specify a figure with a single axis
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
# Plot the posterior samples for condition 1
az.plot_posterior(
m1.traces.posterior, var_names="a", ax=ax, color="blue", hdi_prob=0.95
)
# Plot the posterior samples for condition 2
az.plot_posterior(m2.traces.posterior, var_names="a", ax=ax, color="red", hdi_prob=0.95)
# Set x-axis limit
ax.set_xlim(1.3, 1.7)
# Create proxy artists for the legend
from matplotlib.lines import Line2D
legend_elements = [
Line2D([0], [0], color="blue", label="Condition 1"),
Line2D([0], [0], color="red", label="Condition 2"),
]
# Add a legend
ax.legend(handles=legend_elements)
# Add a title
ax.set_title("Posterior samples for a")
# Add an x-axis label
ax.set_xlabel("a")
Text(0.5, 0, 'a')
and correspondingly,
print(
f"p(a_1 > a_2) = {np.mean(m1.traces.posterior['a'].values > m2.traces.posterior['a'].values)}"
)
p(a_1 > a_2) = 0.956
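Beyond a single directional probability, the same samples give you the full posterior of the difference, which you can summarize however you like. A sketch with toy draws standing in for the two a posteriors (hypothetical values, roughly matching the fits above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for the posterior draws of a in each condition.
a_1 = rng.normal(loc=1.50, scale=0.04, size=2000)
a_2 = rng.normal(loc=1.43, scale=0.04, size=2000)

# Posterior of the difference, draw by draw.
diff = a_1 - a_2

summary = {
    "mean": diff.mean(),
    "ci_2.5%": np.percentile(diff, 2.5),
    "ci_97.5%": np.percentile(diff, 97.5),
    "p(a_1 > a_2)": np.mean(diff > 0),
}
```

If the 95% credible interval of the difference straddles 0 while p(a_1 > a_2) is still high, that tells you the effect direction is probable but its size is uncertain.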
Example 2: Combined Model¶
condition_1["condition"] = "C1"
condition_2["condition"] = "C2"
data = pd.concat([condition_1, condition_2]).reset_index(drop=True)
m_combined = hssm.HSSM(
model="ddm",
data=data,
include=[
{
"name": "v",
"formula": "v ~ 1 + condition",
}
],
)
idata_combined = m_combined.sample(sampler="mcmc", tune=500, draws=500)
Model initialized successfully. Using default initvals.
Initializing NUTS using adapt_diag... Multiprocess sampling (4 chains in 4 jobs) NUTS: [z, a, t, v_Intercept, v_condition]
Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 13 seconds.
Note, now we don't have two distinct idata objects; instead, we have a single idata object with a posterior for v_condition. Let's take a closer look.
m_combined.traces.posterior
<xarray.Dataset> Size: 84kB Dimensions: (chain: 4, draw: 500, v_condition_dim: 1) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 4kB 0 1 2 3 4 5 6 ... 494 495 496 497 498 499 * v_condition_dim (v_condition_dim) <U2 8B 'C2' Data variables: v_condition (chain, draw, v_condition_dim) float64 16kB 0.59 ... 0.5945 z (chain, draw) float64 16kB 0.4765 0.4883 ... 0.4708 0.4863 v_Intercept (chain, draw) float64 16kB 0.5697 0.5411 ... 0.643 0.5514 t (chain, draw) float64 16kB 0.1194 0.09772 ... 0.09983 a (chain, draw) float64 16kB 1.505 1.481 1.471 ... 1.453 1.51 Attributes: created_at: 2025-07-13T13:15:02.703723+00:00 arviz_version: 0.21.0 inference_library: pymc inference_library_version: 5.21.1 sampling_time: 12.807016849517822 tuning_steps: 500 modeling_interface: bambi modeling_interface_version: 0.15.0
Under the hood, Bambi created a model with a dummy variable for the C2 condition (the C1 condition represents the Intercept).
So what do we need to test here?
We still ask the same question: is the v posterior higher for condition C2 than for condition C1? But now we only need to check whether our v_condition[C2] variable is above 0, since it directly represents the offset from the Intercept.
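The treatment-coding logic can be made concrete with toy numbers (hypothetical draws, not the fitted posterior): under v ~ 1 + condition, each draw gives v for C1 as the intercept and v for C2 as intercept plus offset, so the two-condition comparison and the sign test on the offset are the same test.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy draws for the treatment-coded parameters (hypothetical values).
v_intercept = rng.normal(loc=0.55, scale=0.05, size=2000)  # v for C1
v_condition = rng.normal(loc=0.55, scale=0.07, size=2000)  # offset for C2

# Reconstruct the per-condition drift rates, draw by draw.
v_c1 = v_intercept               # condition C1
v_c2 = v_intercept + v_condition  # condition C2

# Because v_c2 - v_c1 == v_condition, both tests count the same draws.
p_direct = np.mean(v_c2 > v_c1)
p_offset = np.mean(v_condition > 0)
```

This is why the combined model needs only the check against 0: the subtraction cancels the intercept exactly.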
Let's check:
print(
f"p(v_condition[C2] > v_condition[C1]) = {np.mean(idata_combined.posterior['v_condition'].values > 0)}"
)
p(v_condition[C2] > v_condition[C1]) = 1.0
or visually,
az.plot_posterior(idata_combined.posterior, var_names="v_condition", hdi_prob=0.95)
<Axes: title={'center': 'v_condition\nC2'}>
You can use this approach to test any number of complex statements about your parameters. There is essentially always a way to turn your question into a comparison of posterior samples, that is, a simple counting problem.
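For instance, a conjunction of claims is still just counting: evaluate the boolean statement on each draw and take the mean. A toy sketch with made-up draws (not HSSM output):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy posterior draws for two parameters in two conditions
# (hypothetical values, loosely mimicking the fits above).
v_1 = rng.normal(0.50, 0.05, 2000)
v_2 = rng.normal(1.00, 0.05, 2000)
a_1 = rng.normal(1.50, 0.04, 2000)
a_2 = rng.normal(1.43, 0.04, 2000)

# Joint statement: v is lower AND a is higher in condition 1.
# Each draw votes True or False; the mean is the posterior probability.
p_joint = np.mean((v_1 < v_2) & (a_1 > a_2))
```

Any statement you can write as a vectorized boolean expression over the draws, including inequalities between transformed parameters, can be tested the same way.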