Using K-Means with predefined centers?

Question

I'm running a KNN classifier whose feature vectors come from a K-Means model (more specifically, sklearn.cluster.MiniBatchKMeans). Since K-means starts from random points, I get different results every time I run my algorithm. I saved the cluster centers to a separate .npy file from a run where the results were good, but now I need to use those centers in my K-means and I don't know how.

Following this advice, I tried to use the cluster centers as starting points like so:

MiniBatchKMeans.__init__(self, n_clusters=self.clusters, n_init=1, init=np.load('cluster_centers.npy'))

Still, results change every time the algorithm is run.

Then I tried to manually alter the cluster centers after fitting the data:

kMeansInstance.cluster_centers_ = np.load('cluster_centers.npy')

Still, different results each time.

The only other solution I can think of is manually implementing the predict method using the centers I saved, but I don't know how, and I don't know if there is a better way to solve my problem than reinventing the wheel.

score 1 · Answer 1 · answered May 13 '18 at 19:04


I would guess fixing the random_state will do the job.

See the API documentation.
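A minimal sketch of what fixing random_state could look like, assuming the saved centers are passed as init with n_init=1 as in the question. The data here is synthetic and `saved_centers` is a stand-in for np.load('cluster_centers.npy'); the helper name `fit_centers` and the seed value are only illustrative.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in data; the real feature matrix is not shown in the question.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))

# Stand-in for np.load('cluster_centers.npy'): 3 starting centers taken from X.
saved_centers = X[:3].copy()


def fit_centers(seed):
    # init + n_init=1 fixes the starting points;
    # random_state fixes the random mini-batch sampling.
    model = MiniBatchKMeans(
        n_clusters=3,
        init=saved_centers,
        n_init=1,
        random_state=seed,
    )
    return model.fit(X).cluster_centers_


centers_a = fit_centers(seed=42)
centers_b = fit_centers(seed=42)

# True when both runs end at the same centers.
same = np.allclose(centers_a, centers_b)
```

With the same seed, both fits should draw the same mini-batches and so end at the same centers. Note that fitting still moves the centers away from the saved values; the seed only makes that movement repeatable.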


Jan K


Has QUIT--Anony-Mousse · Answer 2 · 2018-07-12T16:56:09.330

0

Mini-batch k-means only considers a random sample of the data at each update step, and it uses a random number generator to draw that sample.

If you want deterministic behaviour, fix the random seed, and prefer algorithms that do not use a random sample (i.e., use the regular k-means instead of mini-batch k-means).
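As a rough illustration of that suggestion, the sketch below uses the regular sklearn.cluster.KMeans, which processes the full data set at every iteration, started from fixed centers and with a fixed seed. The data is synthetic, and `saved_centers` and `fit_labels` are hypothetical names standing in for the asker's saved .npy centers and pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data; replace with the real feature matrix.
rng = np.random.RandomState(0)
X = rng.normal(size=(300, 2))

# Stand-in for the centers stored in the .npy file.
saved_centers = X[:4].copy()


def fit_labels():
    # Full-batch k-means: no random sampling of the data per step.
    # The explicit init array plus n_init=1 gives a single, fixed start.
    model = KMeans(
        n_clusters=4,
        init=saved_centers,
        n_init=1,
        random_state=0,
    )
    return model.fit_predict(X)


labels_a = fit_labels()
labels_b = fit_labels()

# True when both runs assign every point to the same cluster.
identical = np.array_equal(labels_a, labels_b)
```

Because each iteration sees all of the data and the start is fixed, repeated runs are expected to produce the same assignments.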

edited Jul 12 '18 at 16:56

answered May 13 '18 at 23:57

Has QUIT--Anony-Mousse

