Unsupervised Learning: Unveiling Hidden Patterns in the Real World
As children, we learn from our experiences by unconsciously identifying patterns in our surroundings and applying them to new situations. In the world of artificial intelligence, this is how unsupervised learning works. Unsupervised machine learning is the process of inferring underlying hidden patterns from historical data. Within such an approach, a machine learning model tries to find any similarities, differences, patterns, and structure in data by itself. No prior human intervention is needed.
What is Unsupervised Learning?
Unsupervised learning is a type of machine learning where the algorithm is given input data without explicit instructions on what to do with it. The model learns patterns from data that has no labels (i.e., no predefined answers or categories). This is unlike supervised learning, where a model learns to predict outputs from a labeled dataset that already contains examples of correct answers carefully mapped out by human supervisors.
Unsupervised learning is helpful for data science teams that don't know what they're looking for in data. It can be used to search for unknown similarities and differences in data and create corresponding groups.
Let's consider the example of a toddler who knows what the family cat looks like but has no idea that there are a lot of other cats in the world that are all different. If the kid sees another cat, he or she will still be able to recognize it as a cat through a set of features such as two ears, four legs, a tail, fur, whiskers, etc. In machine learning, this kind of prediction is called unsupervised learning.
Types of Unsupervised Learning Techniques
Unsupervised learning can be approached through different techniques such as clustering, association rules, and dimensionality reduction.
Clustering
Of all unsupervised learning techniques, clustering is surely the most commonly used. Clustering involves grouping similar data points into clusters that are not defined beforehand, based on their inherent characteristics and without any predefined labels. An ML model finds patterns, similarities, and/or differences within the uncategorized data by itself; if any natural groups or classes exist in the data, the model will be able to discover them.
To explain the clustering approach, here's a simple analogy. In a kindergarten, a teacher asks children to arrange blocks of different shapes and colors. The teacher hasn't given any criteria for the arrangement, so different children come up with different groupings. Some kids put all blocks into three clusters based on color ‒ yellow, blue, and pink. Others categorize the same blocks based on shape ‒ rectangular, triangular, and round. There is no right or wrong way to perform the grouping, as no task was set in advance. Thanks to this flexibility, as well as the variety of available types and algorithms, clustering has many real-life applications.
Types of Clustering
There is an array of clustering types that can be utilized.
- Exclusive clustering: Also known as “hard” clustering, this is grouping in which each piece of data can belong to only one cluster.
- Overlapping clustering: Also known as “soft” clustering, allows data items to be members of more than one cluster with different degrees of belonging. Additionally, probabilistic clustering may be used to solve “soft” clustering or density estimation issues and calculate the probability or likelihood of data points belonging to specific clusters.
- Hierarchical clustering: As the name suggests, aims at creating a hierarchy of clustered data items.
Clustering Algorithms
- K-means: An algorithm for exclusive clustering, also known as partitioning or segmentation. It assigns data points to a predefined number of clusters, known as K. K is the input: you tell the algorithm how many clusters you want to identify in your data. Each data item then gets assigned to the nearest cluster center, called a centroid; closeness is measured by the distance from a data point to the cluster's centroid. The algorithm iteratively updates the centroids and reassigns points until the assignments stabilize.
- Fuzzy K-means: is an extension of the K-means algorithm used to perform overlapping clustering.
- Gaussian Mixture Models (GMMs): An algorithm used in probabilistic clustering. The model assumes the data is generated from a certain number of Gaussian distributions with unknown means and variances, each representing a separate cluster.
- Hierarchical clustering: May start with each data point assigned to a separate cluster. The two clusters closest to one another are then merged into a single cluster, and the merging continues iteratively until only one cluster is left at the top. This is agglomerative clustering, a “bottom-up” approach: data points are isolated as separate groupings initially and are then merged on the basis of similarity until a single cluster is reached. Divisive clustering is the opposite, a “top-down” approach: a single cluster containing all the data is divided based on the differences between data points. Divisive clustering is not commonly used, but it is still worth noting in the context of hierarchical clustering.
- DBSCAN clustering (Density-Based Spatial Clustering of Applications with Noise): A density-based approach that groups points lying in dense regions into clusters and marks points in sparse regions as noise, so the number of clusters doesn't need to be specified in advance.
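To make the K-means description above concrete, here is a minimal sketch using scikit-learn. The toy 2-D points and the choice of K=2 are illustrative assumptions, not data from the article:

```python
# Minimal K-means sketch: two obvious groups of 2-D points, K = 2.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],   # group near (1, 1)
              [8.0, 8.2], [7.9, 8.1], [8.1, 7.8]])  # group near (8, 8)

# K is the input: we ask the algorithm for 2 clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # one centroid per cluster
```

Each point is assigned to the centroid it is closest to, exactly as described above; with such well-separated groups, the first three points land in one cluster and the last three in the other.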
Association Rule Learning
An association rule is a rule-based unsupervised learning method aimed at discovering relationships and associations between different variables in large-scale datasets. The rules show how often a certain data item occurs in a dataset and how strong or weak the connections between different objects are. This technique is widely used to analyze customer purchasing habits, allowing companies to understand relationships between different products and build more effective business strategies.
For example, a coffee shop sees 100 customers on a Saturday evening, 50 of whom buy a cappuccino. Out of the 50 customers who buy a cappuccino, 25 also purchase a muffin. The association rule here is: if customers buy a cappuccino, they will buy a muffin too, with a support value of 25/100 = 25% and a confidence value of 25/50 = 50%. The support value indicates how popular the itemset is in the whole dataset, while the confidence value indicates how often the rule holds among customers who bought a cappuccino.
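The support and confidence arithmetic from the coffee-shop example can be written out directly (the variable names are just for illustration):

```python
# Support and confidence for the rule "cappuccino -> muffin",
# using the counts from the coffee-shop example in the text.
total_customers = 100
cappuccino_buyers = 50
both_buyers = 25  # bought a cappuccino AND a muffin

# support = how often the full itemset appears among all customers
support = both_buyers / total_customers
# confidence = how often the rule holds among cappuccino buyers
confidence = both_buyers / cappuccino_buyers

print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

This prints `support = 25%, confidence = 50%`, matching the values derived above.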
Algorithms for Association Rule Learning
- The apriori algorithm builds association rules from frequent itemsets ‒ itemsets whose support exceeds a chosen threshold. The algorithm generates the itemsets and finds associations by performing multiple scans of the full dataset.
- Just like apriori, the frequent pattern growth (FP-growth) algorithm generates frequent itemsets and mines association rules, but it doesn't need to scan the complete dataset several times.
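The frequent-itemset idea that both algorithms build on can be sketched in a few lines of pure Python. This is a brute-force illustration, not the pruned candidate generation of real apriori; the toy transactions and the `min_support` threshold are assumptions for the example:

```python
# Count the support of every 1- and 2-item subset across toy transactions,
# then keep the itemsets that clear a minimum-support threshold.
from itertools import combinations
from collections import Counter

transactions = [
    {"cappuccino", "muffin"},
    {"cappuccino", "muffin", "juice"},
    {"cappuccino"},
    {"tea", "muffin"},
]
min_support = 0.5  # itemset must appear in at least half the transactions

counts = Counter()
for t in transactions:
    for size in (1, 2):
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1

frequent = {itemset: n / len(transactions)
            for itemset, n in counts.items()
            if n / len(transactions) >= min_support}
print(frequent)
```

Here {cappuccino} and {muffin} each appear in 3 of 4 transactions (support 0.75) and {cappuccino, muffin} in 2 of 4 (support 0.5), while {tea} and {juice} fall below the threshold; association rules are then derived from the surviving itemsets.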
Dimensionality Reduction
Dimensionality reduction is another type of unsupervised learning, comprising a set of methods that reduce the number of features ‒ or dimensions ‒ in a dataset.
When preparing your dataset for machine learning, it may be quite tempting to include as much data as possible. Don't get us wrong, this approach often works well, as in most cases more data means more accurate results. That said, imagine that the data resides in an N-dimensional space, with each feature representing a separate dimension. A lot of data means there may be hundreds of dimensions. Think of Excel spreadsheets with columns serving as features and rows as data points. Sometimes, the number of dimensions gets too high, reducing the performance of ML algorithms and hindering data visualization. So, it makes sense to reduce the number of features ‒ or dimensions ‒ and include only relevant data. That's what dimensionality reduction does.
The dimensionality reduction technique can be applied during the stage of data preparation for supervised machine learning. With it, it is possible to get rid of redundant and junk data, leaving those items that are the most relevant for a project.
Say, you work in a hotel and you need to predict customer demand for different types of hotel rooms. There’s a large dataset with customer demographics and information on how many times each customer booked a particular hotel room last year. The thing is, some of this information may be useless for your prediction, while some data has quite a lot of overlap and there's no need to consider it individually. Take a closer look and you'll see that all customers come from the US, meaning that this feature has zero variance and can be removed. Since room service breakfast is offered with all room types, the feature also won't make much impact on your prediction. Features like "age" and "date of birth" can be merged as they are basically duplicates.
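The pruning described in the hotel example can be sketched with a simple zero-variance filter. The column names and values below are made up for illustration:

```python
# Drop features that take only one distinct value (zero variance),
# like the all-US "country" column in the hotel example.
data = {
    "country":  ["US", "US", "US", "US"],  # zero variance -> useless
    "age":      [25, 34, 41, 29],
    "bookings": [2, 5, 1, 3],
}

# Keep only columns with more than one distinct value.
useful = {name: values for name, values in data.items()
          if len(set(values)) > 1}
print(list(useful))
```

Only `age` and `bookings` survive; in practice a library routine such as scikit-learn's `VarianceThreshold` does the same job on numeric feature matrices.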
Algorithms for Dimensionality Reduction
- Principal component analysis (PCA) is an algorithm applied for dimensionality reduction purposes. It's used to reduce the number of features within large datasets, simplifying the data while preserving as much of its information as possible. Dataset compression happens through a process called feature extraction: features within the original set are combined into a new, smaller set, whose new features are known as principal components. At its core, PCA is a linear feature extraction tool, and as a visualization tool it is useful for a bird's-eye view of the data. The first principal component is the direction that maximizes the variance of the dataset. The second principal component also captures the maximum remaining variance, but it is completely uncorrelated with the first, yielding a direction perpendicular, or orthogonal, to the first component.
- Singular value decomposition (SVD) is another dimensionality reduction approach, which factorizes a matrix A into three matrices: A = USVᵀ, where U and V are orthogonal matrices and S is a diagonal matrix whose entries are the singular values of A.
- t-SNE (t-distributed stochastic neighbor embedding) translates high-dimensional data into a low-dimensional space ‒ typically two or three dimensions ‒ while preserving the local neighborhood structure of the data. That makes it well suited for visualizing complex datasets: points that are similar in the original space end up close together on the plot, so the significant structure in the data becomes visible at a glance.
- Autoencoders leverage neural networks to compress data and then reconstruct a representation of the original input. The hidden layer acts as a bottleneck, compressing the input before it is reconstructed in the output layer.
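The link between PCA and SVD in the list above can be shown with NumPy: center the data, factorize it, and project onto the top principal components (the rows of Vᵀ). The toy matrix, in which the third feature is deliberately redundant, is an assumption for the example:

```python
# PCA via SVD: center the data, factorize A = U S V^T, and keep the
# top-2 principal components as the reduced representation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))       # 50 samples, 3 features
X[:, 2] = X[:, 0] + X[:, 1]        # third feature is a redundant combination

Xc = X - X.mean(axis=0)            # PCA requires centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

X_reduced = Xc @ Vt[:2].T          # project onto the first 2 components
print(X_reduced.shape)             # (50, 2)
print(S)                           # third singular value is ~0
```

Because the third column is an exact combination of the first two, the centered data has rank 2, so the third singular value is essentially zero ‒ the dataset compresses from 3 features to 2 principal components with no real loss.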
Real-World Applications of Unsupervised Learning
Unsupervised learning is attractive in lots of ways, from the opportunity to discover useful insights in data to the elimination of expensive data labeling processes. Machine learning techniques have become a common method to improve product user experience and to test systems for quality assurance. Unsupervised learning provides an exploratory path to view data, allowing businesses to identify patterns in large volumes of data more quickly than manual observation would.
Anomaly Detection
With clustering, it is possible to detect any sort of outliers in data. For example, companies engaged in transportation and logistics may use anomaly detection to identify logistical obstacles or expose defective mechanical parts (predictive maintenance). Financial organizations may utilize the technique to spot fraudulent transactions and react promptly, which ultimately can save lots of money.
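As a toy illustration of flagging outliers, here is a simple z-score check on transaction amounts. Real anomaly detection systems typically use the density- or cluster-based methods described above; the data and the 2-standard-deviation threshold here are assumptions for the example:

```python
# Flag transactions whose amount is more than 2 standard deviations
# from the mean -- a crude stand-in for cluster-based outlier detection.
from statistics import mean, stdev

amounts = [102, 98, 105, 97, 101, 950]  # one suspicious transaction
mu, sigma = mean(amounts), stdev(amounts)

outliers = [x for x in amounts if abs(x - mu) / sigma > 2]
print(outliers)
```

The 950 transaction is the only one flagged; in a clustering setup the same point would fall far from every cluster centroid, which is what makes it an anomaly.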
Customer and Market Segmentation
Clustering algorithms can help group people with similar traits and create customer personas for more efficient marketing and targeting campaigns. Using unsupervised machine learning techniques, often in Python, businesses segment their customers, identifying distinct levels and patterns within the market. By analyzing user behavior, preferences, and other relevant data, businesses gain valuable insight into the diverse segments within their customer pool. The same approach extends beyond market segmentation to product management: by categorizing and understanding different user segments, product managers can tailor their approaches, ensuring that products are developed and marketed to meet the specific needs and preferences of each identified segment.
Recommender Systems
The association rules method is widely used to analyze buyer baskets and detect cross-category purchase correlations. A great example is Amazon’s “Frequently bought together” recommendations. Say, if you decide to buy Dove body wash products on Amazon, you'll probably be offered to add some toothpaste and a set of toothbrushes to your cart because the algorithm calculated that these products are often purchased together by other customers.
Various recommendation systems leverage machine learning to enhance user experience across diverse platforms. For instance, a Ted Talks recommendation system can analyze user preferences and suggest relevant talks that align with individual interests. Python is frequently used to implement recommendation systems in the entertainment domain, such as movie recommenders that learn user preferences and provide tailored movie suggestions; one innovative variant gauges emotional responses to films and offers recommendations that align with the viewer's mood. Music recommendation systems work the same way, analyzing listening patterns, user preferences, and genre affinities to curate playlists or suggest tracks that resonate with individual tastes. Examples of this can be seen in Amazon's “Customers Who Bought This Item Also Bought” or Spotify's “Discover Weekly” playlist.
Target Marketing
Whatever the industry, the method of association rules can be used to extract rules to help build more effective target marketing strategies. For instance, a travel agency may use customer demographic information as well as historical data about previous campaigns to decide on the groups of clients they should target for their new marketing campaign.
Thanks to the use of association rules, travel and tourism researchers managed to single out sets of travel activity combinations that particular groups of tourists are likely to be involved in based on their nationality.
Other Applications
- Clinical cancer studies: Clustering gene expression data helps researchers discover cancer subtypes and group patients with similar molecular profiles.
- Google News: Uses unsupervised learning to categorize articles about the same story from various online news outlets. For example, all articles covering the results of a presidential election could be grouped together under the “US” news label.
- Computer vision: Unsupervised learning algorithms are used for visual perception tasks, such as object recognition.
- Medical imaging: Unsupervised machine learning provides essential features to medical imaging devices, such as image detection, classification and segmentation, used in radiology and pathology to diagnose patients quickly and accurately.
- Customer personas: Defining customer personas makes it easier to understand common traits and business clients' purchasing habits. Unsupervised learning allows businesses to build better buyer persona profiles, enabling organizations to align their product messaging more appropriately.
Supervised Learning vs. Unsupervised Learning
Supervised learning and unsupervised learning are frequently discussed together. Unlike unsupervised learning algorithms, supervised learning algorithms use labeled data. While supervised learning algorithms tend to be more accurate than unsupervised learning models, they require upfront human intervention to label the data appropriately. These labeled datasets, however, allow supervised learning algorithms to reach the intended outcomes with less computational complexity and a smaller training set. Semi-supervised learning occurs when only part of the given input data has been labeled.
Examples of Supervised Learning:
- Credit card fraud detection: is a crucial application of machine learning in the financial sector. The goal is to build models that can automatically identify and flag transactions that are likely to be fraudulent, helping financial institutions and credit card companies prevent or minimize losses due to fraudulent activities.
- Image classification: Images are labeled with the objects they contain (e.g., "cat", "dog", "car"), forming the basis of a supervised learning problem in computer vision. The model is trained on a labeled dataset where each input image is associated with a corresponding output label (the object in the image).
- Weather forecasting: Historical weather data is labeled with the corresponding weather conditions (e.g., "sunny", "rainy", "snowy"). The model learns to predict future weather conditions based on current and historical data.
- Heart disease prediction: involves building a model that can assess the likelihood of an individual having heart disease based on various health-related features.
- Cryptocurrency Prediction: predicting the future prices or trends of cryptocurrencies based on historical market data and other relevant factors. In this predictive task, historical data, including past cryptocurrency prices, trading volumes, and market indicators, serves as the training ground for the model. The model learns patterns and relationships within the data to make informed predictions about future price movements.
- Stock Price Prediction: Forecasting the future prices of stocks by analyzing historical stock market data, company performance metrics, and economic indicators. Historical stock market data, encompassing factors such as past stock prices, trading volumes, and price movements, serves as the foundation for training machine learning models.
- Analyzing car selling prices: Predicting the selling price of cars based on features like brand, model, age, mileage, and additional attributes.
tags: #unsupervised #learning #examples #real #world

