DS Clustering


Clustering is a technique for grouping similar objects or data points together based on their inherent characteristics.

It is an unsupervised learning technique: it needs no predefined labels, so it can be used to assign group labels to unlabeled data.

For example, consider a dataset of different car models, with each model's CO2 emission and price plotted on a scatter plot.

If we look carefully at the scatter plot, we can see that the models fall into two distinct groups. Applying clustering to this data separates the two clusters, as shown in the image below.

The points in these two clusters turn out to correspond to petrol and electric vehicles, as labelled.
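The text does not name a particular clustering algorithm, so the following is only a minimal sketch of the car example using k-means, one common choice. The CO2 and price values are invented for illustration, and the helper names (`squared_distance`, `mean_point`, `k_means`) are hypothetical.

```python
import random

# Hypothetical (CO2 emission in g/km, price in thousands) pairs.
# These numbers are made up for illustration; they are not real car data.
cars = [
    (120.0, 22.0), (135.0, 25.0), (150.0, 28.0), (110.0, 20.0), (142.0, 30.0),
    (0.0, 45.0), (0.0, 52.0), (0.0, 48.0), (0.0, 60.0), (0.0, 55.0),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def k_means(points, k, iterations=20, seed=0):
    """Plain k-means: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_point(members)
    return labels, centroids


labels, centroids = k_means(cars, k=2)
for car, label in zip(cars, labels):
    print(car, "-> cluster", label)
```

Note that the algorithm only returns cluster numbers. Interpreting one cluster as "petrol" and the other as "electric" is a step the analyst performs afterwards. With real data the two features would usually be scaled first, since price and CO2 are measured in different units.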

Clustering is used in data analysis to examine the properties of different segments of the data. A few applications of clustering are:

Customer segmentation in marketing: Grouping customers based on their characteristics and behaviours to personalise marketing strategies. 

Image compression in computer vision: Reducing the size of images while preserving important visual information. 

Anomaly detection in cybersecurity: Identifying unusual or malicious activities in network traffic or system logs. 

Document clustering in natural language processing: Organising documents into groups based on similarity to aid in information retrieval and analysis. 

Stock market analysis in finance: Analysing and clustering stocks based on their historical performance to inform investment decisions.
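To make one of the applications above concrete, here is a rough sketch of image compression by colour quantisation: similar pixel colours are clustered, and each pixel is then stored as an index into a small palette. The pixel values are invented, the function names (`nearest`, `quantize`) are hypothetical, and k-means is assumed as the clustering method.

```python
# Hypothetical tiny "image" as a flat list of RGB pixels (invented values).
pixels = [
    (250, 10, 10), (245, 20, 15), (240, 5, 20), (255, 0, 0),  # shades of red
    (10, 10, 250), (20, 15, 240), (5, 25, 245), (0, 0, 255),  # shades of blue
]


def nearest(pixel, palette):
    """Index of the palette colour closest to the pixel (squared distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, palette[i])),
    )


def quantize(pixels, k, iterations=10):
    """Cluster pixel colours into k palette entries with a simple k-means loop."""
    # Start from k evenly spaced pixels: a simple, deterministic choice.
    step = len(pixels) // k
    palette = [pixels[i * step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest(p, palette)].append(p)
        # Each palette colour becomes the rounded mean of its group;
        # an empty group keeps its previous colour.
        palette = [
            tuple(round(sum(channel) / len(g)) for channel in zip(*g)) if g else palette[i]
            for i, g in enumerate(groups)
        ]
    indices = [nearest(p, palette) for p in pixels]
    return palette, indices


palette, indices = quantize(pixels, k=2)
compressed = [palette[i] for i in indices]  # image rebuilt from the palette
print("palette:", palette)
print("indices:", indices)
```

The saving comes from storing one small palette plus a short index per pixel instead of three full colour channels per pixel. The rebuilt image is an approximation, so the choice of `k` trades file size against visual fidelity.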