A cute and thought-provoking theorem for sure, but arguably none of the three gi...

yshklarov · 2024-12-27T00:38:29 1735259909

Actually, scale-invariance only refers to scaling all dimensions by the same scalar (this is more clearly specificed in the paper linked by the article, page 3). For arbitrary scaling on each coordinate, of course you're correct, it's impossible to have a clustering algorithm that is invariant for such transformations (e.g., the 6-point group ::: may look like either 2 or 3 clusters, depending on whether it's stretched horizontally or vertically).

As for your last two points, I believe I agree! It seems that in the counterexample you give for consistency, some notion of scale-invariance is implicitly assumed -- perhaps this connection plays some role in the theorem's proof (which I haven't read).

This reminds me a bit of Arrow's impossibility theorem for voting, which similarly has questionable premises.

hnaccount_rng · 2024-12-27T08:11:53 1735287113

But in almost all cases that doesn't make any sense? Typically the data in different dimensions will have different "units". So there isn't any meaning in the scale in the first place. How could scaling by a single scalar be "more natural"?

yshklarov · 2024-12-27T18:16:39 1735323399

If different components of the dataset have different units, I would argue that it is a prerequisite of clustering to first specify the relative importance of each particular unit (thereby putting all units on the same scale). Otherwise, there's no way the clustering algorithm could possibly know what to in certain cases (such as the ::: example).

It's true that there is no intrinsic meaning to the scale, but you must specify at least a relative scale -- how you want to compare (or weigh) different units -- before you can meaningfully cluster the data. Clustering can only work on dimensionless data.

bubblyworld · 2024-12-27T09:28:09 1735291689

This is the point I think - there's no inherent meaning to the scaling factor(s) as far as overall structure is concerned (they're dimensionless, so the units thing isn't a problem), so the outcome of a clustering algorithm should not depend on it.

amelius · 2024-12-27T09:36:05 1735292165

What if you first did PCA on your data?

bubblyworld · 2024-12-27T09:41:03 1735292463

You tell me? I'm not sure I understand the question.

amelius · 2024-12-27T10:53:38 1735296818

Basically scale and rotate your data such that the variation in all dimensions becomes equal, so to speak.

bubblyworld · 2024-12-27T12:18:03 1735301883

Ah I see. As I understand it a general linear map like that isn't what the linked paper means by "scale-invariance", so it wouldn't be considered a violation for a dataset and it's PCA to be given different clusters by your clustering algorithm. It's only the dataset and its scaled up or down counterparts (i.e. the metric is multiplied by a fixed non-zero constant) that are required to get the same clusters for scale-invariance to hold.

In fact the paper doesn't assume that your dataset is contained in a vector space at all. All you have to give a clustering algorithm (as they define it) is a set and a metric function on it.

(the paper if you don't have a link: https://www.cs.cornell.edu/home/kleinber/nips15.pdf)

nextaccountic · 2024-12-27T07:55:23 1735286123

Scale invariance is well motivated in the sense that perhaps you already normalized your data (usually to fit [-1, 1] or something) and you would be bummed to discover it fucked up your clusters

Or likewise: if you have physical data recorded in some units (say, meters), it would suck if the clusters changed if you had measured stuff in another unit instead