I was following this nice demo on updating priors from posteriors. It provides the following function to reconstruct a posterior as a new prior:
```python
import numpy as np
from scipy import stats
from pymc import Interpolated


def from_posterior(param, samples):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    kde = stats.gaussian_kde(samples)
    y = kde(x)
    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)
```
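To illustrate what the tail extension does, here is a self-contained sketch with synthetic 1-D draws (the `samples` array is made up for the example): the domain is widened by `3 * width` on each side and the density is padded with zeros, so the interpolated prior falls off linearly to 0 beyond the observed range instead of being exactly 0 there.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
samples = rng.normal(loc=2.0, scale=0.5, size=1000)  # hypothetical 1-D posterior draws

smin, smax = samples.min(), samples.max()
width = smax - smin
x = np.linspace(smin, smax, 100)
y = stats.gaussian_kde(samples)(x)

# Extend the support by 3*width on each side and pad the density with zeros;
# linear interpolation then ramps from the edge density values down to 0.
x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
y = np.concatenate([[0], y, [0]])
print(x.shape, y.shape)  # (102,) (102,)
```

Note that `x` stays strictly increasing, which is what `Interpolated` requires.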
However, I'm trying to reconstruct a multidimensional distribution, i.e. not `alpha = Normal("alpha", mu=0, sigma=1)` but `alpha = Normal("alpha", mu=0, sigma=1, size=100)`. The function above does not work in that case, so I tried to extend it in two ways.

First I tried the neat, vectorized way, but I was not able to get the data into the right format for `gaussian_kde`:
```python
samples = trace["alpha"]
print("samples", samples.shape)  # (n_draws, 100)
smin, smax = samples.min(axis=0), samples.max(axis=0)
width = smax - smin
print("width", width.shape)  # (100,)
x = np.linspace(smin, smax, 98, axis=1)
print("x", x.shape)  # (100, 98)
kde = stats.gaussian_kde(samples.T)
y = kde.evaluate(x)
print("y", y.shape)
```
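I think the formatting problem is that `gaussian_kde` treats each *row* of its input as one dimension, so passing `samples.T` builds a single joint 100-dimensional KDE rather than 100 univariate marginals. A minimal sketch with synthetic data (the shapes are assumptions matching my trace):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 100))  # hypothetical trace: 500 draws of a length-100 alpha

# gaussian_kde interprets each row of its input as one dimension,
# so samples.T yields one joint 100-dimensional density, not 100 marginals.
joint_kde = stats.gaussian_kde(samples.T)
print(joint_kde.d)  # 100

# For 100 marginal densities, fit one univariate KDE per column instead.
marginal_y = np.stack(
    [
        stats.gaussian_kde(samples[:, i])(
            np.linspace(samples[:, i].min(), samples[:, i].max(), 98)
        )
        for i in range(samples.shape[1])
    ]
)
print(marginal_y.shape)  # (100, 98): one density curve per dimension
```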
Then I tried the loopy way, but this fails as well because the concatenated `x` is not increasing along the whole axis.
```python
def from_posterior(samples):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 98)
    kde = stats.gaussian_kde(samples)
    y = kde(x)
    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return x, y
```
```python
def from_posterior_in_loop(param, samples):
    all_x, all_y = [], []
    for s in range(samples.shape[1]):
        x, y = from_posterior(samples[:, s])
        all_x.append(x)
        all_y.append(y)
    return Interpolated(param, np.concatenate(all_x), np.concatenate(all_y))


with pm.Model() as model:
    from_posterior_in_loop("alpha", trace["alpha"])
```
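To show concretely why the concatenated grid fails: each per-dimension `x` restarts at its own minimum, so the combined array is increasing piecewise but not overall (toy numbers, purely for illustration):

```python
import numpy as np

# Two hypothetical per-dimension grids: each is increasing on its own,
# but the second one restarts below the end of the first.
x_dim0 = np.linspace(0.0, 1.0, 5)
x_dim1 = np.linspace(-2.0, -1.0, 5)
x_all = np.concatenate([x_dim0, x_dim1])

print(np.all(np.diff(x_all) > 0))  # False: the concatenated grid is not increasing
```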
Ideas are welcome. I don't necessarily need to reconstruct the posterior from a trace; is there another way to iteratively fit a model to a big dataset?