Overview
Hierarchical clustering is an unsupervised learning technique used in data analysis and machine learning. It organizes data points into a hierarchy of clusters, grouping similar points together and forming a tree-like structure known as a dendrogram.
There are two types of hierarchical clustering: Agglomerative and Divisive.
- Agglomerative clustering starts with each data point as its own cluster. Those clusters are then merged step by step as the hierarchy level goes up. This is known as the 'bottom-up' approach.
- Divisive clustering starts with all of the data in a single cluster. The model then splits clusters as it moves down the hierarchy. This is known as the 'top-down' approach.
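The bottom-up approach can be seen directly in SciPy's linkage matrix, where each row records one merge step. Below is a minimal sketch on five hypothetical 2-D points (the data and method choice are illustrative, not from the article's dataset):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five toy 2-D points (hypothetical data, just for illustration)
X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]])

# Agglomerative clustering: each point starts as its own cluster,
# and clusters are merged one pair at a time as we move up the hierarchy
Z = linkage(X, method='ward')

# Each row: [cluster_i, cluster_j, merge_distance, size_of_new_cluster]
print(Z)
print(Z.shape)  # (n - 1, 4): n points need n - 1 merges to reach one cluster
```

Reading the rows top to bottom shows the hierarchy being built from the leaves upward: the closest pair merges first, and the merge distances grow as the clusters get larger.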
Linkage
In hierarchical clustering, we do not only measure the distance between individual data points.
We also need to measure the distance between two clusters. This measurement is known as linkage.
Several linkage methods exist, such as single linkage, complete linkage, average linkage, and Ward linkage.
- Single linkage defines the distance as the minimum distance between any two points in the two clusters.
- Complete linkage defines the distance as the maximum distance between any two points in the two clusters.
- Average linkage calculates the average distance between all pairs of points in the two clusters.
- Ward linkage merges the pair of clusters that leads to the minimum increase in within-cluster variance.
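To make the first three definitions concrete, here is a small sketch that scores the distance between the same two hand-picked 1-D clusters under each method (Ward is omitted because it works on variance increase rather than a raw pairwise distance):

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0], [1.0]])   # cluster A (toy data)
B = np.array([[4.0], [6.0]])   # cluster B (toy data)

d = cdist(A, B)                # all pairwise distances between the two clusters

single = d.min()               # closest pair:   |1 - 4| = 3
complete = d.max()             # farthest pair:  |0 - 6| = 6
average = d.mean()             # mean of all pairs: (4 + 6 + 3 + 5) / 4 = 4.5

print(single, complete, average)
```

The same pair of clusters can therefore look close under single linkage and far apart under complete linkage, which is why the choice of linkage changes the resulting hierarchy.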

Dendrogram
To visualize the relationships between clusters, we can use a diagram called a dendrogram. What is a dendrogram?
A dendrogram is a tree-like chart that represents the hierarchical structure of the data. It consists of leaves and branches.
In hierarchical clustering, leaves are the data points, and branches represent the clusters.
From the branches, we can see the relationship between data points and how similar each of them is based on their features.
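The leaves and branches described above can also be inspected without drawing the chart. As a sketch, SciPy's dendrogram function has a no-plot mode that returns the leaf order and branch heights for a small hypothetical dataset:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Four toy 1-D points: two tight pairs (hypothetical data)
X = np.array([[0.0], [0.2], [5.0], [5.1]])
Z = linkage(X, method='average')

# no_plot=True returns the dendrogram's structure instead of drawing it
info = dendrogram(Z, no_plot=True)

print(info['ivl'])     # leaf labels (data points) in left-to-right display order
print(info['dcoord'])  # heights at which branches join (cluster distances)
```

The leaves are the individual data points, and the heights in `dcoord` show how dissimilar two branches were when they merged: similar points join low on the tree, dissimilar ones join high.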

Pros and Cons
Pros
- There is no need to pre-specify the number of clusters. Instead, the dendrogram can be cut at the appropriate level to obtain the desired number of clusters.
- Data is easily summarized/organized into a hierarchy using dendrograms. Dendrograms make it easy to examine and interpret clusters.
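Cutting the dendrogram at a chosen level can be sketched with SciPy's fcluster on toy data (the points below are illustrative; the full workflow on a real dataset appears in the Python section):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D points forming two tight pairs plus one outlier (hypothetical data)
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 5.1], [9.0, 0.0]])
Z = linkage(X, method='ward')

# "Cut" the dendrogram so that exactly 3 clusters remain;
# no number of clusters had to be specified before building the hierarchy
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)
```

The same linkage matrix can be cut again at a different level (e.g. `t=2`) without re-running the clustering, which is exactly the flexibility the first point above describes.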
Cons
- Hierarchical clustering does not scale well to very large datasets.
- It does not handle missing data well.
- The algorithm can never undo a merge or split once it has been made.
- A time complexity of at least O(n² log n) is required, where n is the number of data points.
- Depending on the linkage method chosen for merging, the algorithm can suffer from one or more of the following:
  i) Sensitivity to noise and outliers
  ii) Breaking large clusters
  iii) Difficulty handling different sized clusters and convex shapes
Python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering

# Load the dataset and keep the Annual Income and Spending Score columns
dataset = pd.read_csv('Mall_Customers.csv')
X = dataset.iloc[:, [3, 4]].values

# Using the dendrogram to find the optimal number of clusters
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()

# Training the hierarchical clustering model on the dataset
# (in scikit-learn >= 1.2 the 'affinity' parameter was renamed to 'metric';
#  Ward linkage always uses Euclidean distances, so it can simply be omitted)
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_hc = hc.fit_predict(X)

# Visualising the clusters
colors = ['red', 'blue', 'green', 'cyan', 'magenta']
for i, color in enumerate(colors):
    plt.scatter(X[y_hc == i, 0], X[y_hc == i, 1],
                s=100, c=color, label=f'Cluster {i + 1}')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()


