Friday, December 21, 2018

2. Spectral Clustering - Basic Concepts

This is the second part of the multiple blogs on Spectral Clustering. Following are the links to all the parts of this series.

1.  Spectral Clustering - Intro
2.  Spectral Clustering - Basic Ideas
3.

2. Spectral Clustering  - Basic Concepts


The data can be represented in different ways. Tabular data and graph data are common ways to represent the data.

Tabular data: The data is organized into tables. Each table having a specific format arranged in fixed number of columns.  
Graph Data: Graph representation of the data involves organization of data in the form of nodes and edges. Where nodes store the information or the property while the edges represent the relationship between the nodes.

Graph

A graph consists of a set of nodes(V) and a set of edges(E) to represent the relation between the nodes.



 Fig 1. Example graph with 5 vertices and 5 edges



G = {V, E}
V = {v1, v2, v3, v4, v5, v6, …}
E = {e1, e2, e3, e4, e5, …}

Where V is a set of vertices and E is a set of edges.


Common Terminologies

The common terminologies and concepts which will be applicable in spectral clustering are,

Adjacency Matrix(W):  A graph can be represented as a two-dimensional array of size V x V, where V is a number of vertices. Both rows and columns of the adjacency are labeled by the vertices hence making it a square matrix and symmetric. The element (wij) of the adjacency matrix indicate whether the two vertices are adjacent (connected by a common edge.) or not. The example graph shown in fig 1. can be represented as an Adjacency Matrix as below.

Fig 2. Adjacency Matrix for the example graph in fig 1

Degree of a vertex(di): Degree of a vertex is the number of edges that touch the vertex. For example, in the example graph above. The degree of vertex v1 is 2. The degree of a vertex is the sum of the elements of a row or a column of an adjacency matrix corresponding to the vertex.

di = w11 + w12 + w13+ w14 …….



Fig 3. Degree of vertices of the example graph shown above.

Degree Matrix(D): A degree matrix is a diagonal matrix with the information about the degree of the vertices. The size of the matrix is also VxV.

Fig. 4 Degree Matrix


Laplacian Matrix (L): A Laplacian of a graph is given by the D – W, where D is a diagonal matrix formed from the vertex degrees and W is an adjacency matrix.

These are the basic concepts of Graph theory applicable to Spectral Clustering.
I will next write about the Spectral Clustering in the next Part of this blog series.




Thursday, December 20, 2018

1. Spectral Clustering - Intro

This is the first part of the multiple blogs on Spectral Clustering. I will write a few blogs related to basic concepts in Spectral clustering in this series. Below are the links to all the parts of this series.

1.  Spectral Clustering - Intro
2.  Spectral Clustering - Basic Ideas
3.


1. Spectral Clustering - Intro


To start with spectral clustering, let’s first see what clustering is.

Clustering is a technique used for automatically grouping the data-points into a number of clusters based on their similarity so that less similar points fall under different clusters.

Clustering is basically done by grouping the points into clusters based on some distance measure, so that the points in same cluster have a small distance from one another. However, proximity is not always the relationship between the data points. Sometimes geometrical relation is also important. Spectral clustering is based on such relationships.

How does spectral clustering fit into the world of machine learning?

Firstly, machine learning techniques are broadly classified into supervised and unsupervised (of course there is semi-supervised too, but let’s not go into that now) machine learning. To put it simply, supervised learning involves the use of labeled training data and the unsupervised learning deals with the unlabeled data.

Clustering is an unsupervised technique. Among the most popular clustering techniques, we have K-means, Gaussian Mixture Model, Expectation Maximization(EM), Spectral Clustering etc.

Yes, that where Spectral Clustering lies in the big picture. Spectral clustering is performed in Graph Data. 

In next blog, I will try to write something about the Graph Theory and Basic ideas needed before jumping into theory of Spectral clustering.