I'm running a KNN classifier whose feature vectors come from K-Means clustering (more specifically, sklearn.cluster.MiniBatchKMeans). Since K-Means starts from random points, I get different results every time I run my algorithm. I stored the cluster centers in a separate .npy file from a run where the results were good, but now I need to use those centers in my K-Means and I don't know how.

Following this advice, I tried to use the cluster centers as starting points like so:

MiniBatchKMeans.__init__(self, n_clusters=self.clusters, n_init=1, init=np.load('cluster_centers.npy'))

Still, results change every time the algorithm is run.

Then I tried to manually alter the cluster centers after fitting the data:

kMeansInstance.cluster_centers_ = np.load('cluster_centers.npy')

Still, different results each time.

The only other solution I can think of is manually implementing the predict method using the centers I saved, but I don't know how, and I don't know whether there is a better way to solve my problem than reinventing the wheel.

2 Answers

I would guess fixing the random_state will do the job.

See the API documentation.
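A minimal sketch of what that could look like, assuming synthetic data in place of the real feature vectors. The array X, the helper fit_centers, and the seed value 42 are made up for illustration; only the random_state, n_clusters and n_init parameters come from scikit-learn.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in for the real data (assumption, not from the question).
rng = np.random.RandomState(0)
X = rng.rand(500, 4)

def fit_centers(seed):
    """Fit MiniBatchKMeans with a fixed seed and return its cluster centers."""
    km = MiniBatchKMeans(n_clusters=5, n_init=1, random_state=seed)
    km.fit(X)
    return km.cluster_centers_

# Two independent fits with the same seed should land on the same centers.
centers_a = fit_centers(42)
centers_b = fit_centers(42)
same = np.allclose(centers_a, centers_b)
print("same centers on both runs:", same)
```

With the seed fixed, both the initialization and the mini-batch sampling draw from the same random sequence, so repeated runs on the same data should agree.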

Jan K

Mini-batch k-means only considers a random sample of the data at each step, and it uses a random number generator to draw that sample.

If you want deterministic behaviour, fix the random seed, and prefer algorithms that do not use a random sample (i.e., use the regular k-means instead of mini-batch k-means).
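As a rough sketch of that suggestion, the regular KMeans can be started from previously saved centers with a fixed seed. Here saved_centers is a made-up stand-in for the array the question loads with np.load('cluster_centers.npy'), and X and fit_labels are likewise hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the real data (assumption).
rng = np.random.RandomState(0)
X = rng.rand(300, 3)

# Stand-in for np.load('cluster_centers.npy'); its shape must be
# (n_clusters, n_features).
saved_centers = X[:4].copy()

def fit_labels():
    """Run full-batch k-means starting from the saved centers."""
    km = KMeans(n_clusters=4, init=saved_centers, n_init=1, random_state=0)
    return km.fit(X).labels_

# Every sample is used in each iteration, so repeated fits should agree.
labels_first = fit_labels()
labels_second = fit_labels()
identical = np.array_equal(labels_first, labels_second)
print("identical labels on both runs:", identical)
```

Note that fitting still moves the centers away from the saved values; the saved array only sets the starting point.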

Has QUIT--Anony-Mousse