Cluster one-dimensional data optimally?


Univariate k-means clustering can be solved in O(kn) time on already sorted input, using theoretical results on Monge matrices. That approach never became popular, most likely because of numerical instability and perhaps also because it is hard to code.

A better option is an O(kn log n) method that is now implemented in Ckmeans.1d.dp version 3.4.6. This implementation is as fast as heuristic k-means but guarantees an optimal solution, which can be orders of magnitude better than the heuristic's, especially for large k.
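To give a feel for where an O(kn log n) bound can come from, here is a minimal Python sketch of one common route: divide-and-conquer over each row of the dynamic programming table, relying on the fact that the optimal start of the last cluster never moves left as the prefix grows. This is an illustration under that assumption, not the Ckmeans.1d.dp source; the function name `kmeans_1d_cost` is made up for this example, and it returns only the optimal cost.

```python
from typing import List


def kmeans_1d_cost(values: List[float], k: int) -> float:
    """Optimal 1-D k-means cost, O(k n log n) after sorting (illustrative sketch)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums of x and x^2 give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, x in enumerate(xs):
        s1[idx + 1] = s1[idx] + x
        s2[idx + 1] = s2[idx] + x * x

    def cost(i: int, j: int) -> float:
        """Sum of squared deviations from the mean of xs[i:j], with j > i."""
        total = s1[j] - s1[i]
        # Clamp at 0 to absorb floating-point rounding error.
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    inf = float("inf")
    # prev[j]: best cost of splitting xs[:j] into m - 1 clusters.
    prev = [0.0] + [inf] * n
    for m in range(1, k + 1):
        cur = [inf] * (n + 1)

        def solve(lo: int, hi: int, opt_lo: int, opt_hi: int) -> None:
            """Fill cur[lo..hi]; the best split point lies in [opt_lo, opt_hi]."""
            if lo > hi:
                return
            mid = (lo + hi) // 2
            best_cost, best_i = inf, opt_lo
            for i in range(opt_lo, min(mid - 1, opt_hi) + 1):
                c = prev[i] + cost(i, mid)
                if c < best_cost:
                    best_cost, best_i = c, i
            cur[mid] = best_cost
            # Monotone split points let each half search a narrower range.
            solve(lo, mid - 1, opt_lo, best_i)
            solve(mid + 1, hi, best_i, opt_hi)

        solve(m, n, m - 1, n - 1)
        prev = cur
    return prev[n]
```

For example, `kmeans_1d_cost([1, 2, 3, 10, 11, 12, 20], 3)` evaluates the split {1, 2, 3}, {10, 11, 12}, {20}, whose within-cluster sum of squares is 2 + 2 + 0.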

The generic dynamic programming solution by Richard Bellman (1973) does not exploit the specifics of the k-means problem, and its implied runtime is O(kn^3).
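For reference, the plain dynamic program is short once you notice that in one dimension every optimal cluster is a contiguous run of the sorted data. The Python sketch below is an illustration, not Bellman's formulation and not the Ckmeans.1d.dp code: it uses prefix sums so that each segment cost takes O(1), which gives O(kn^2) overall. The function name `kmeans_1d_dp` is hypothetical.

```python
from typing import List, Tuple


def kmeans_1d_dp(values: List[float], k: int) -> Tuple[float, List[List[float]]]:
    """Optimal 1-D k-means by dynamic programming, O(k n^2) time (sketch)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums of x and x^2 give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, x in enumerate(xs):
        s1[idx + 1] = s1[idx] + x
        s2[idx + 1] = s2[idx] + x * x

    def cost(i: int, j: int) -> float:
        """Sum of squared deviations from the mean of xs[i:j], with j > i."""
        total = s1[j] - s1[i]
        # Clamp at 0 to absorb floating-point rounding error.
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    inf = float("inf")
    # best[m][j]: minimum cost of splitting xs[:j] into m clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[m][j]: start index of the last cluster in that optimal split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    cut[m][j] = i

    # Walk the cut table backwards to recover the clusters.
    clusters: List[List[float]] = []
    j = n
    for m in range(k, 0, -1):
        i = cut[m][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters
```

Calling `kmeans_1d_dp([1, 2, 3, 10, 11, 12, 20], 3)` returns the total within-cluster sum of squares together with the clusters in sorted order, so the result is easy to compare against a heuristic k-means run on the same data.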

Author: Laciel

Updated on July 04, 2020

Comments

  • Laciel, almost 4 years ago

    Does anyone have a paper that explains how the Ckmeans.1d.dp algorithm works?

    Or: what is the optimal way to do k-means clustering in one dimension?