
K-Means

RO_KO 2023. 3. 16. 00:00

K-means is a nonparametric density-modeling technique: it is applied to datasets in which everything is observed, with no latent data.

If the data involve latent variables, i.e., parts that are never observed, the models are instead called latent variable models.

 

Clustering

Clustering collects similar data points into sets, such that points belonging to different sets are dissimilar from each other.

Grouping data by similarity, without any labels, is called clustering. Depending on which metric is used to measure similarity, different clusters are produced.

The example above has multiple modes, so it is called a multimodal distribution.

 

K-means intuition

  • If we know the cluster assignments, then we know the means
  • If we know the means, then we know the cluster assignments
  • a chicken-and-egg problem
  • exact optimization is NP-hard -> heuristic: start randomly and alternate between the two steps

Algorithm

  1. Initialization: randomly initialize the cluster centers
    Set K cluster means \(m_1,\cdots,m_K\) to random values.
  2. Repeat the following two steps:
    • Assignment step: assign each data point to the nearest cluster
      Each data point \(x^{(n)}\) is assigned to the nearest mean
      \(\hat{k}^{(n)}=\underset{k}{\operatorname{argmin}}\ d(m_k,x^{(n)})\) with \(d\) the L2 distance
      Responsibilities (one-hot encoding)
      \(\gamma^{(n)}_k=1 \leftrightarrow \hat{k}^{(n)}=k\)
    • Refitting step: move each cluster center toward the points assigned to it
      \(m_k=\frac{\sum_n\gamma^{(n)}_kx^{(n)}}{\sum_n \gamma^{(n)}_k}\)
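
The two alternating steps map directly to code. Below is a minimal NumPy sketch of the algorithm above (the function name and the convergence check are my own choices, not from the original notes):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Minimal K-means. X: (N, D) data array. Returns means (K, D), assignments (N,)."""
    rng = np.random.default_rng(seed)
    # Initialization: set the K means to randomly chosen data points.
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assign = None
    for _ in range(max_iters):
        # Assignment step: each point goes to the nearest mean (L2 distance).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # (N, K)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # assignments unchanged -> converged to a local minimum
        assign = new_assign
        # Refitting step: move each mean to the average of its assigned points.
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:  # guard against empty clusters
                means[k] = members.mean(axis=0)
    return means, assign
```

For example, `means, assign = kmeans(np.random.randn(500, 2), K=3)` partitions 500 random 2-D points into 3 clusters.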

K-means objective:

Find cluster centers \(m\) and assignments \(r\) that minimize the sum of squared L2 distances from data points to their assigned cluster centers.

\(\underset{\{m\},\{r\}}{\min}J(\{m\},\{r\})=\underset{\{m\},\{r\}}{\min}\sum_{n=1}^N\sum_{k=1}^K r_k^{(n)}\|m_k-x^{(n)}\|^2\)

\(\text{s.t. }\sum_k r^{(n)}_k=1\ \forall n,\ \text{where } r^{(n)}_k\in\{0,1\}\ \forall k,n\)

where \(r^{(n)}_k=1\) means that \(x^{(n)}\) is assigned to cluster \(k\) with center \(m_k\).

 

The optimization method is a form of coordinate descent ("block coordinate descent"):

  • Fix centers, optimize assignments
  • Fix assignments, optimize means (see the derivation below)
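
Both block updates can be verified directly. Holding the assignments fixed, \(J\) is a convex quadratic in each \(m_k\), so setting the gradient to zero recovers the refitting rule:

\(\frac{\partial J}{\partial m_k}=2\sum_{n=1}^N r^{(n)}_k(m_k-x^{(n)})=0 \Rightarrow m_k=\frac{\sum_n r^{(n)}_k x^{(n)}}{\sum_n r^{(n)}_k}\)

Conversely, holding the means fixed, \(J\) decomposes over data points, and each \(r^{(n)}\) is optimized by placing its single 1 on the nearest center, which is exactly the assignment step.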

K-means for Vector Quantization

K-means for Image segmentation
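
Both applications follow the same recipe: treat each pixel's color as a data point, cluster, and either replace every pixel with its cluster center (vector quantization, where the K means form the codebook) or read off the cluster labels (a crude color-based segmentation). A minimal sketch, reusing the `kmeans` function above; `img` is assumed to be an (H, W, 3) RGB array:

```python
def quantize_image(img, K=8):
    """Color quantization / crude segmentation via K-means on pixel colors."""
    H, W, C = img.shape
    pixels = img.reshape(-1, C).astype(float)   # (H*W, 3) data points
    means, assign = kmeans(pixels, K)           # cluster the colors
    quantized = means[assign].reshape(H, W, C)  # each pixel -> its codebook color
    segments = assign.reshape(H, W)             # label map = segmentation
    return quantized, segments
```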

How would you modify k-means to get superpixels? (One standard answer: append each pixel's spatial coordinates to its color features and restrict assignments to a local window, as in SLIC.)

- [link]

 

Why K-means Converges

  • Whenever an assignment is changed, the objective J, the sum of squared L2 distances of data points from their assigned cluster centers, is reduced.
  • Whenever a cluster center is moved, J is reduced.
  • If the assignments do not change in the assignment step, we have converged to at least a local minimum.
  • The objective J is non-convex, so coordinate descent on J is not guaranteed to converge to the global minimum.
  • There is nothing to prevent k-means from getting stuck at local minima.
  • We could try many random starting points (see the sketch after this list).
  • Try non-local split-and-merge moves:
    • merge two nearby clusters
    • split a big cluster into two
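
Random restarts are straightforward to implement: run K-means several times and keep the run with the smallest objective J. A small sketch reusing the `kmeans` function above (the number of restarts is an arbitrary choice):

```python
import numpy as np

def kmeans_restarts(X, K, n_restarts=10):
    """Run K-means from several random initializations; keep the lowest-J run."""
    best_J, best = np.inf, None
    for seed in range(n_restarts):
        means, assign = kmeans(X, K, seed=seed)
        # Objective J: sum of squared L2 distances to the assigned centers.
        J = np.sum((X - means[assign]) ** 2)
        if J < best_J:
            best_J, best = J, (means, assign)
    return best
```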

Soft K-means

Instead of making hard assignments of data points to clusters, we can make soft assignments.

 

How?

Soft K-means Algorithm

  • Initialization: set the K means \(\{m_k\}\) to random values.
  • Repeat until convergence (until the assignments do not change):
    • Assignment: consider the probability of every cluster for each point
      \(\gamma^{(n)}_k=\frac{\exp[-\beta d(m_k,x^{(n)})]}{\sum_j \exp[-\beta d(m_j,x^{(n)})]}\)
    • Refitting: coordinate descent using the soft responsibilities
      \(m_k=\frac{\sum_n \gamma^{(n)}_kx^{(n)}}{\sum_n \gamma^{(n)}_k}\)
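
Only the assignment step changes relative to hard K-means: the one-hot responsibilities become a softmax over negative scaled distances. A minimal NumPy sketch (the stability shift and fixed loop count are my own choices); as \(\beta\to\infty\) it recovers hard K-means:

```python
import numpy as np

def soft_kmeans(X, K, beta=1.0, n_iters=100, seed=0):
    """Soft K-means: responsibilities are a softmax of -beta * distance."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Soft assignment: gamma[n, k] is proportional to exp(-beta * d(m_k, x^(n))).
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # (N, K)
        logits = -beta * d
        logits -= logits.max(axis=1, keepdims=True)  # subtract max for stability
        gamma = np.exp(logits)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # Refitting: responsibility-weighted average of all data points.
        means = (gamma.T @ X) / gamma.sum(axis=0)[:, None]
    return means, gamma
```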

Some remaining issues

  • how to set \(\beta\)? (hyperparameter tuning)
  • what about problems with elongated clusters? (clusters stretched out long in one direction)
  • clusters with unequal weight and width (the clusters differ in importance and size)

-> Let's use GMM!

 

KNN vs K-means

To prevent any confusion: KNN works with labels, whereas K-means does not use labels at all.

KNN considers the k nearest data points and decides a label by voting; the goal of K-means is to use the features of the data to find K cluster centers.
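
The difference is easy to see in code. A short sketch using scikit-learn (assuming it is installed; the toy data are made up for illustration): KNN needs labels `y` at fit time, while K-means fits on the features alone.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.random.randn(100, 2)
y = (X[:, 0] > 0).astype(int)  # labels exist only in the supervised (KNN) setting

# KNN: supervised -- predicts a label by voting among the k nearest labeled points.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[1.0, 0.0]]))

# K-means: unsupervised -- finds K cluster centers from the features alone.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.cluster_centers_)
```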
