How to use silhouette score in k-means clustering from sklearn library?
I am assuming you are going to use the silhouette score to find the optimal number of clusters.
First create a separate KMeans object for each candidate number of clusters, then call its fit_predict method on your data df, like this:
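from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

range_n_clusters = range(2, 10)  # candidate cluster counts, as in the question
for n_clusters in range_n_clusters:
    clusterer = KMeans(n_clusters=n_clusters)
    preds = clusterer.fit_predict(df)  # fit the model and get one label per row
    centers = clusterer.cluster_centers_
    score = silhouette_score(df, preds)  # same data that was clustered, plus its labels
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))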
See this official example for more clarity.
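Since the goal is to pick the number of clusters automatically, here is a minimal sketch of how the loop above could be extended to keep every score and select the best one. The synthetic data from make_blobs, the fixed random_state, and the names scores and best_k are my own assumptions for illustration; in your script you would pass the numeric columns of df instead of X.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for df: four well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 10], [-10, 10], [10, -10]],
    cluster_std=0.5,
    random_state=0,
)

scores = {}
for n_clusters in range(2, 10):
    # n_init and random_state are set explicitly so the run is repeatable.
    clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    preds = clusterer.fit_predict(X)
    scores[n_clusters] = silhouette_score(X, preds, metric="euclidean")

# The silhouette score lies in [-1, 1]; higher means better-separated clusters.
best_k = max(scores, key=scores.get)
print("Best number of clusters by silhouette score:", best_k)
```

Taking the maximum is the simplest selection rule. On real data the scores for neighbouring values of n_clusters can be close, so it is worth printing the whole scores dictionary before trusting the winner.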
Author by
Jessica Martini
Updated on January 30, 2020
Comments
Jessica Martini over 4 years
I'd like to use the silhouette score in my script to automatically compute the number of clusters in k-means clustering from sklearn.
import numpy as np
import pandas as pd
import csv
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

filename = "CSV_BIG.csv"

# Read the CSV file with the Pandas lib.
path_dir = ".\\"
dataframe = pd.read_csv(path_dir + filename, encoding="utf-8", sep=';')  # "ISO-8859-1")
df = dataframe.copy(deep=True)

# Use silhouette score
range_n_clusters = list(range(2, 10))
print("Number of clusters from 2 to 9: \n", range_n_clusters)

for n_clusters in range_n_clusters:
    clusterer = KMeans(n_clusters=n_clusters).fit(?)
    preds = clusterer.predict(?)
    centers = clusterer.cluster_centers_

    score = silhouette_score(?, preds, metric='euclidean')
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))
Can someone help me with the question marks? I don't understand what to put in their place. I took the code from an example. The commented part is the previous version, where I do k-means clustering with a fixed number of clusters set to 4. The code is correct that way, but in my project I need to choose the number of clusters automatically.