
K-Means

RO_KO 2023. 3. 16. 00:00

K-means is a nonparametric density-modeling technique: it is applied to datasets in which everything is observed, with no latent data.

If the data involve latent variables, i.e., parts that are never observed, the models are instead called latent variable models.

 

Clustering

Clustering collects similar data points into sets, such that points belonging to different sets are dissimilar from each other.

Grouping data by similarity, without any labels, is called clustering. Depending on which metric is used to measure similarity, different clusters are produced.

The example above has multiple modes, so it is called a multimodal distribution.

 

K-means intuition

  • If we know the cluster assignments, then we know the means
  • If we know the means, then we know the cluster assignments
  • a chicken-and-egg problem
  • exact optimization is NP-hard -> heuristic: start randomly and alternate between the two steps

Algorithm

  1. Initialization: randomly initialize the cluster centers
    Set K cluster means \(m_1,\cdots,m_K\) to random values.
  2. Repeat the following two steps:
    • Assignment step: assign each data point to the nearest cluster
      Each data point \(x^{(n)}\) is assigned to the nearest mean
      \(\hat{k}^{(n)}=\underset{k}{\operatorname{argmin}}\ d(m_k,x^{(n)})\) with \(d\) the L2 distance
      Responsibilities (one-hot encoding)
      \(\gamma^{(n)}_k=1 \leftrightarrow \hat{k}^{(n)}=k\)
    • Refitting step: move each cluster center toward the points assigned to it
      \(m_k=\frac{\sum_n\gamma^{(n)}_kx^{(n)}}{\sum_n \gamma^{(n)}_k}\)
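
The two alternating steps map directly to code. Below is a minimal NumPy sketch of the algorithm above (the function name and the convergence check are my own choices, not from the original notes):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Minimal K-means. X: (N, D) data array. Returns means (K, D), assignments (N,)."""
    rng = np.random.default_rng(seed)
    # Initialization: set the K means to randomly chosen data points.
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assign = None
    for _ in range(max_iters):
        # Assignment step: each point goes to the nearest mean (L2 distance).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # (N, K)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # assignments unchanged -> converged to a local minimum
        assign = new_assign
        # Refitting step: move each mean to the average of its assigned points.
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:  # guard against empty clusters
                means[k] = members.mean(axis=0)
    return means, assign
```

For example, `means, assign = kmeans(np.random.randn(500, 2), K=3)` partitions 500 random 2-D points into 3 clusters.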

K-means objective:

Find cluster centers \(m\) and assignments \(r\) that minimize the sum of squared L2 distances from data points to their assigned cluster centers.

\(\underset{\{m\},\{r\}}{\min}J(\{m\},\{r\})=\underset{\{m\},\{r\}}{\min}\sum_{n=1}^N\sum_{k=1}^K r_k^{(n)}\|m_k-x^{(n)}\|^2\)

\(\text{s.t. }\sum_k r^{(n)}_k=1\ \forall n,\ \text{where } r^{(n)}_k\in\{0,1\}\ \forall k,n\)

where \(r^{(n)}_k=1\) means that \(x^{(n)}\) is assigned to cluster \(k\) with center \(m_k\).

 

The optimization method is a form of coordinate descent ("block coordinate descent"):

  • Fix centers, optimize assignments
  • Fix assignments, optimize means (see the derivation below)
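
Both block updates can be verified directly. Holding the assignments fixed, \(J\) is a convex quadratic in each \(m_k\), so setting the gradient to zero recovers the refitting rule:

\(\frac{\partial J}{\partial m_k}=2\sum_{n=1}^N r^{(n)}_k(m_k-x^{(n)})=0 \Rightarrow m_k=\frac{\sum_n r^{(n)}_k x^{(n)}}{\sum_n r^{(n)}_k}\)

Conversely, holding the means fixed, \(J\) decomposes over data points, and each \(r^{(n)}\) is optimized by placing its single 1 on the nearest center, which is exactly the assignment step.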

K-means for Vector Quantization

K-means for Image segmentation
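
Both applications follow the same recipe: treat each pixel's color as a data point, cluster, and either replace every pixel with its cluster center (vector quantization, where the K means form the codebook) or read off the cluster labels (a crude color-based segmentation). A minimal sketch, reusing the `kmeans` function above; `img` is assumed to be an (H, W, 3) RGB array:

```python
def quantize_image(img, K=8):
    """Color quantization / crude segmentation via K-means on pixel colors."""
    H, W, C = img.shape
    pixels = img.reshape(-1, C).astype(float)   # (H*W, 3) data points
    means, assign = kmeans(pixels, K)           # cluster the colors
    quantized = means[assign].reshape(H, W, C)  # each pixel -> its codebook color
    segments = assign.reshape(H, W)             # label map = segmentation
    return quantized, segments
```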

How would you modify k-means to get superpixels? (One standard answer: append each pixel's spatial coordinates to its color features and restrict assignments to a local window, as in SLIC.)

- [link]

 

Why K-means Converges

  • Whenever an assignment is changed, the objective J, the sum of squared L2 distances of data points from their assigned cluster centers, is reduced.
  • Whenever a cluster center is moved, J is reduced.
  • If the assignments do not change in the assignment step, we have converged to at least a local minimum.
  • The objective J is non-convex, so coordinate descent on J is not guaranteed to converge to the global minimum.
  • There is nothing to prevent k-means from getting stuck at local minima.
  • We could try many random starting points (see the sketch after this list).
  • Try non-local split-and-merge moves:
    • merge two nearby clusters
    • split a big cluster into two
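
Random restarts are straightforward to implement: run K-means several times and keep the run with the smallest objective J. A small sketch reusing the `kmeans` function above (the number of restarts is an arbitrary choice):

```python
import numpy as np

def kmeans_restarts(X, K, n_restarts=10):
    """Run K-means from several random initializations; keep the lowest-J run."""
    best_J, best = np.inf, None
    for seed in range(n_restarts):
        means, assign = kmeans(X, K, seed=seed)
        # Objective J: sum of squared L2 distances to the assigned centers.
        J = np.sum((X - means[assign]) ** 2)
        if J < best_J:
            best_J, best = J, (means, assign)
    return best
```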

Soft K-means

Instead of making hard assignments of data points to clusters, we can make soft assignments.

 

How?

Soft K-means Algorithm

  • Initialization: set the K means \(\{m_k\}\) to random values.
  • Repeat until convergence (until the assignments do not change):
    • Assignment: consider the probability of every cluster for each point
      \(\gamma^{(n)}_k=\frac{\exp[-\beta d(m_k,x^{(n)})]}{\sum_j \exp[-\beta d(m_j,x^{(n)})]}\)
    • Refitting: coordinate descent using the soft responsibilities
      \(m_k=\frac{\sum_n \gamma^{(n)}_kx^{(n)}}{\sum_n \gamma^{(n)}_k}\)
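
Only the assignment step changes relative to hard K-means: the one-hot responsibilities become a softmax over negative scaled distances. A minimal NumPy sketch (the stability shift and fixed loop count are my own choices); as \(\beta\to\infty\) it recovers hard K-means:

```python
import numpy as np

def soft_kmeans(X, K, beta=1.0, n_iters=100, seed=0):
    """Soft K-means: responsibilities are a softmax of -beta * distance."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Soft assignment: gamma[n, k] is proportional to exp(-beta * d(m_k, x^(n))).
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # (N, K)
        logits = -beta * d
        logits -= logits.max(axis=1, keepdims=True)  # subtract max for stability
        gamma = np.exp(logits)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # Refitting: responsibility-weighted average of all data points.
        means = (gamma.T @ X) / gamma.sum(axis=0)[:, None]
    return means, gamma
```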

Some remaining issues

  • how to set \(\beta\)? (hyperparameter tuning)
  • what about problems with elongated clusters? (clusters stretched out long in one direction)
  • clusters with unequal weight and width (the clusters differ in importance and size)

-> Let's use GMM!

 

KNN vs K-means

To prevent any confusion: KNN works with labels, whereas K-means does not use labels at all.

KNN considers the k nearest data points and decides a label by voting; the goal of K-means is to use the features of the data to find K cluster centers.
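
The difference is easy to see in code. A short sketch using scikit-learn (assuming it is installed; the toy data are made up for illustration): KNN needs labels `y` at fit time, while K-means fits on the features alone.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.random.randn(100, 2)
y = (X[:, 0] > 0).astype(int)  # labels exist only in the supervised (KNN) setting

# KNN: supervised -- predicts a label by voting among the k nearest labeled points.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[1.0, 0.0]]))

# K-means: unsupervised -- finds K cluster centers from the features alone.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.cluster_centers_)
```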
