% Tony Hyun Kim
% CS 229, PS#3, Problem 6
% Implementation of k-means clustering
%------------------------------------------------------------
% Format:
%   examples (n by m):
%     examples(:,i) represents the i-th example
%   us (n by k):
%     us(:,i) represents the i-th mean
%------------------------------------------------------------
function [us, J, iter] = my_kmeans(examples, k)

[~, m] = size(examples);

% Initialize us to k randomly chosen examples
p = randperm(m);
p = p(1:k);
us = examples(:,p);

% Vector that keeps track of which centroid has
% been assigned to each example
cs = zeros(1,m);

% Our criterion for convergence is that the distortion function
% J improves by less than tol. As per the instructions, we also
% enforce a minimum of 30 iterations, and we cap the loop at
% maxIter iterations as a safety brake.
iter = 0;
maxIter = 1000;
tol = 1e-6;
Jold = 1; J = 0;  % Dummy values so that Jold-J > tol on entry

while ( ((Jold-J) > tol) || (iter < 30) ) && (iter < maxIter)

    % 1. Assign examples to the nearest centroid
    %------------------------------------------------------------
    for i = 1:m  % Fix an example...
        temp = zeros(1,k);
        for j = 1:k  % Compute distance to j-th centroid
            temp(j) = norm(examples(:,i)-us(:,j));
        end
        [~, minj] = min(temp);
        cs(i) = minj;
    end

    % 2. Update centroids
    %------------------------------------------------------------
    for j = 1:k  % Fix a centroid
        sieve = (cs==j);  % Subset of examples belonging
                          % to j-th centroid
        if any(sieve)
            us(:,j) = mean(examples(:,sieve),2);
        else
            % Guard against an empty cluster: mean over an empty
            % set would yield NaN, so reseed at a random example
            us(:,j) = examples(:,randi(m));
        end
    end

    % 3. Update loop conditions
    %------------------------------------------------------------
    iter = iter + 1;
    Jold = J;
    J = 0;
    for i = 1:m
        J = J + norm(examples(:,i)-us(:,cs(i)))^2;
    end
end
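
%------------------------------------------------------------
% Usage sketch (not part of the assignment; the data below is
% hypothetical). With this file saved as my_kmeans.m on the
% MATLAB path, clustering two well-separated 2-D blobs might
% look like:
%
%   X = [randn(2,50), randn(2,50) + 4];   % two Gaussian blobs
%   [us, J, iter] = my_kmeans(X, 2);      % us: 2x2 centroids
%   plot(X(1,:), X(2,:), '.', us(1,:), us(2,:), 'rx');
%
% Note that k-means only finds a local optimum of J, so in
% practice one typically reruns with several random
% initializations and keeps the result with the smallest J.
%------------------------------------------------------------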