Just a practice implementation of the spatial K-Means algorithm in Clojure
The source code is here and here.
K-Means is a simple but not particularly precise algorithm for classifying and clustering data.
Spatial K-Means is a way of clustering point data in a space of any number of dimensions.
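To make that concrete, here is a minimal sketch of a single K-Means round on a toy data set: assign every point to its nearest seed, then move each seed to the mean of its points. The data, the hand-picked seeds, and the helper names (`sq-dist`, `nearest`, `centroid`) are all made up for illustration and are not part of the implementation in this post.

```clojure
;; Hypothetical toy data: two loose groups of 2D points.
(def points [[1 1] [1 2] [2 1] [8 8] [9 8] [8 9]])

;; Hypothetical starting seeds, picked by hand for this example.
(def seeds [[0 0] [10 10]])

;; Squared Euclidean distance between two points of equal dimension.
(defn sq-dist [p q]
  (reduce + (map (fn [a b] (let [d (- a b)] (* d d))) p q)))

;; The seed closest to point p.
(defn nearest [p seeds]
  (apply min-key #(sq-dist p %) seeds))

;; Coordinate-wise mean of a non-empty cluster of points.
(defn centroid [cluster]
  (let [n (count cluster)]
    (mapv #(/ % n) (apply map + cluster))))

;; Assignment step: a map from each seed to the points nearest to it.
(def clusters (group-by #(nearest % seeds) points))

;; Update step: each cluster's centroid becomes a seed for the next round.
(def new-seeds (mapv centroid (vals clusters)))
```

Here the three points near the origin end up under the seed `[0 0]` and the other three under `[10 10]`. Because the inputs are integers, Clojure keeps the centroids as exact ratios, so the new seeds are `[4/3 4/3]` and `[25/3 25/3]`. Repeating the two steps until the seeds stop moving gives the full algorithm.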
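A detailed introduction to the algorithm is here.
The code is pretty rough: it overflowed when processing the 7,500 two-dimensional points formed by arbitrarily pairing up all the integers from 1 to 15000.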
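One plausible cause of an overflow like this, assuming it was a `StackOverflowError`, is summing many points with a lazy `map` inside `reduce`. Each step wraps the previous result in another unrealized lazy sequence, and realizing the final result then recurses once per point. The sketch below shows the pattern next to an eager variant; the names `lazy-sum`, `eager-sum`, `many-points`, and `try-lazy` are hypothetical and chosen only for this illustration.

```clojure
;; Hypothetical input: many copies of one 2D point, enough to show the effect.
(def many-points (repeat 100000 [1 2]))

;; Lazy coordinate-wise sum. Nothing is computed when this is called, so
;; reducing with it builds a chain of nested lazy seqs as long as the input.
(defn lazy-sum [x y] (map + x y))

;; Eager coordinate-wise sum. mapv returns a fully realized vector,
;; so nothing is left pending between reduce steps.
(defn eager-sum [x y] (mapv + x y))

;; Reducing with the eager version uses constant stack depth.
(def total (reduce eager-sum many-points))

;; Forcing the lazy version walks the whole nested chain at once. This
;; wrapper catches a StackOverflowError, if one is thrown, and returns a
;; keyword instead of crashing.
(defn try-lazy []
  (try
    (doall (reduce lazy-sum many-points))
    (catch StackOverflowError _ :stack-overflow)))
```

With the eager version, `total` is `[100000 200000]`. Whether `(try-lazy)` actually overflows depends on the JVM stack size, but a chain this deep is the classic trigger for the error.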
;; Pick n random elements of coll to use as the initial seeds.
(defn rand-sub [coll n]
  (take n (shuffle coll)))

;; Squared Euclidean distance between two points of equal dimension.
(defn distance [coll-1 coll-2]
  (reduce + (map (fn [a b] (let [d (- a b)] (* d d))) coll-1 coll-2)))

;; Center of a non-empty collection of points: the coordinate-wise mean.
;; Summing with (apply map + coll) and mapv avoids building the deeply nested
;; lazy seqs that a reduce over a lazy map produces on large inputs.
(defn get-center [coll]
  (let [n (count coll)]
    (mapv #(/ % n) (apply map + coll))))

;; The seed closest to point.
(defn find-nearest-seed [point seeds]
  (apply min-key #(distance % point) seeds))

;; Group the points by their nearest seed; returns a seq of clusters.
(defn seed-means [seeds coll]
  (vals (group-by #(find-nearest-seed % seeds) coll)))

;; Start from k random points and iterate until the seeds stop changing;
;; returns the final clusters.
(defn k-means [k coll]
  (loop [seed (rand-sub coll k) old-seed []]
    (let [it-res (seed-means seed coll)]
      (if (= seed old-seed)
        it-res
        (recur (map get-center it-res) seed)))))
Well, that's all.