Bayesian Estimation Supersedes the t Test (BEST)
The BEST model (Kruschke, 2014) is a Bayesian model for estimating the differences between two or more groups with respect to one or more variables. We’ll apply it to the well-known iris classification dataset.
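For orientation, the model can be sketched as follows. This follows Kruschke's original two-group formulation, extended to groups indexed by k. Whether bayesian_models uses exactly these priors and hyperparameters is an assumption here; the symbols M, S, L and H are placeholder constants that are typically set from the pooled data.

```latex
% Observation i in group k follows a Student-t distribution, which is robust to outliers.
y_{ik} \sim \mathrm{StudentT}(\nu,\ \mu_k,\ \sigma_k)

% Broad priors on each group's mean and scale; the normality parameter \nu is shared.
\mu_k \sim \mathcal{N}(M,\ S), \qquad
\sigma_k \sim \mathrm{Uniform}(L,\ H), \qquad
\nu - 1 \sim \mathrm{Exponential}(1/29)

% The quantity of interest is the difference of means for each pair of groups.
\Delta\mu_{jk} = \mu_j - \mu_k
```

Under this reading, a negative Δμ(setosa, versicolor) means the setosa mean is estimated to be smaller than the versicolor mean.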
from sklearn.datasets import load_iris
import pandas as pd
from bayesian_models import BEST
# Collect the data
X, y = load_iris(return_X_y=True, as_frame=True)
names = load_iris().target_names
y = y.replace({
    i: names[i] for i in range(len(names))
}).to_frame()
df = pd.concat([X,y], axis=1)
df
     sepal length (cm)  sepal width (cm)  ...  petal width (cm)     target
0                  5.1               3.5  ...               0.2     setosa
1                  4.9               3.0  ...               0.2     setosa
2                  4.7               3.2  ...               0.2     setosa
3                  4.6               3.1  ...               0.2     setosa
4                  5.0               3.6  ...               0.2     setosa
..                 ...               ...  ...               ...        ...
145                6.7               3.0  ...               2.3  virginica
146                6.3               2.5  ...               1.9  virginica
147                6.5               3.0  ...               2.0  virginica
148                6.2               3.4  ...               2.3  virginica
149                5.9               3.0  ...               1.8  virginica
# Initialize, supply the data and variable to group by
obj = BEST()(df, 'target')
# Perform inference
obj.fit()
# Deduce the differences
obj.predict()['Δμ']
Δμ(setosa, versicolor)     sepal length (cm)  -0.930  0.196  ...  1.0  Indeterminate
                           sepal width (cm)    0.653  0.199  ...  1.0  Indeterminate
                           petal length (cm)  -2.800  0.192  ...  1.0  Indeterminate
                           petal width (cm)   -1.076  0.182  ...  1.0  Indeterminate
Δμ(setosa, virginica)      sepal length (cm)  -1.584  0.196  ...  1.0  Indeterminate
                           sepal width (cm)    0.454  0.199  ...  1.0  Indeterminate
                           petal length (cm)  -4.088  0.198  ...  1.0  Indeterminate
                           petal width (cm)   -1.778  0.182  ...  1.0  Indeterminate
Δμ(versicolor, virginica)  sepal length (cm)  -0.654  0.202  ...  1.0  Indeterminate
                           sepal width (cm)   -0.199  0.198  ...  1.0  Indeterminate
                           petal length (cm)  -1.288  0.203  ...  1.0  Indeterminate
                           petal width (cm)   -0.702  0.194  ...  1.0  Indeterminate
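As a quick sanity check on the first numeric column above, the raw differences of sample means can be computed directly with pandas. This is a minimal sketch and not part of bayesian_models: the helper name pairwise_mean_differences and the toy frame are hypothetical, and it returns plain sample statistics, with none of the uncertainty estimates that the fitted model provides.

```python
from itertools import combinations

import pandas as pd


def pairwise_mean_differences(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Difference of sample means (first group minus second) for every pair of groups.

    Hypothetical helper for illustration only; not part of bayesian_models.
    """
    # One row of per-variable means for each group (groups are sorted by label).
    means = df.groupby(group_col).mean(numeric_only=True)
    # One row per unordered pair of groups, one column per numeric variable.
    rows = {
        f"Δμ({a}, {b})": means.loc[a] - means.loc[b]
        for a, b in combinations(means.index, 2)
    }
    return pd.DataFrame(rows).T


# Toy data: group "a" has mean 2.0 and group "b" has mean 6.0 on variable "x".
toy = pd.DataFrame({
    "x": [1.0, 3.0, 5.0, 7.0],
    "target": ["a", "a", "b", "b"],
})
diffs = pairwise_mean_differences(toy, "target")
```

Calling pairwise_mean_differences(df, 'target') on the iris frame built earlier should give values close to the estimates above. For example, the sample mean sepal length is 5.006 for setosa and 5.936 for versicolor, a difference of -0.930.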