What Is Cluster Analysis?

Cluster analysis is the process of grouping similar data points, such as stores, products or customers, into groups. It is widely used in business analytics and data mining. You could plot a dataset on a set of axes and then map it into smaller groups based on the similarities between the data points.

In retail, clustering groups data and turns it into information you can understand and act on. As a retailer or supplier, you can use the resulting insights to improve and optimise your business processes. The aim is to extract information efficiently so that you can discover new knowledge and previously unidentified patterns.

Anomaly detection, data classification and cluster analysis are typical tasks conducted in the data mining process.

Clustering can also be referred to as data segmentation because this process partitions the data points into homogeneous groups.

To achieve meaningful results, you should use a clustering algorithm suited to your market environment. This means the algorithm must be able to process the volume and type of data you have in a time-efficient manner and still produce accurate results.

As a retailer or supplier, you can use clustering to understand who your customer is and what drives their purchase decisions. This can help you to tailor your product offering and marketing strategies to your target market.

Cluster Analysis Applications and Benefits

Within a retail environment, cluster analysis has many applications and benefits.

Firstly, you can use it for customer segmentation. This allows you to group consumers according to similarities, demographics and purchase behaviour.

You can use different techniques to create clusters of consumers who can be classified differently according to demographic variables, needs, wants and purchasing patterns.

You can also use store- and category-related attributes, such as geographic location and available shelf space, to cluster stores for a product category. Once you create your clusters, you can better describe, understand and target your main shopper segments per cluster as well as develop category-level strategies to ensure a profitable outcome for your business.

When implementing cluster analysis in business, it’s critical to understand the difference between standardisation and localisation.

A standardised strategy focuses on a single-assortment, mass-market approach, while a localised strategy focuses on a store-specific approach. A clustered approach strikes a balance between the two: it lets you understand your customers and financial drivers while managing resources effectively to achieve a profitable result.


Clustering Algorithms


The clustering model or algorithm you select should be capable of processing large datasets efficiently, effectively and timeously.

The most commonly used clustering algorithms for retail applications are:

-   Partition-based clustering; and
-   Hierarchical clustering.

After you select your clustering model, you should analyse category performance according to your Fact (sales), Market, Product and Period data. You can obtain this information from point-of-sale (POS), loyalty and market data.

The clustering algorithm you choose must ensure that data points within a cluster are similar, while data points in different clusters are dissimilar.

Partition-based Clustering

  • This clustering technique divides data points into non-overlapping subsets so that each data point falls within exactly one cluster. Each cluster is represented by a centroid, the central point of the cluster.

  • K-means clustering

  • K-means clustering is an example of a partitioning algorithm that is most commonly used because it is simple, efficient and flexible. When using this algorithm, you will need to select the number of clusters ‘K’ that you would like to use.
  • In a retail environment, you can use either the Elbow method or industry knowledge to determine the optimal number of clusters for your product category.
-   The Elbow Method

  • The Elbow method helps you choose the number of clusters by looking at the within-cluster sum of squares (WCSS): the sum of the squared distances between each data point and the centroid of its cluster. You run the algorithm for a range of K values and plot the number of clusters on the X-axis against the total WCSS on the Y-axis.
  • WCSS generally decreases as the number of clusters increases. The decrease is steep at first and then slows down, which shows up as an ‘elbow’ (a bend) in the plot. The number of clusters at the elbow is taken as the optimal number of clusters for the dataset, because adding clusters beyond that point reduces WCSS only marginally.
  • For retail applications, weigh this number against industry knowledge of the market and your business requirements before making a final decision.

    When the K-means analysis runs, each data point is allocated to its nearest centroid. Each centroid is then recalculated as the average (mean) of the data points assigned to it, and these assign-and-update steps repeat until the assignments stop changing.
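To make the K-means and Elbow steps concrete, here is a minimal pure-Python sketch. It assumes numeric features that are already on comparable scales. The deterministic farthest-point seeding is an assumption chosen so the example is repeatable (libraries typically use random or k-means++ seeding), and picking the elbow as the K with the largest second difference in WCSS is only one of several heuristics. The store coordinates are hypothetical toy data.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at the first point, then keep adding the
    point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Lloyd's algorithm. Returns (labels, centroids, wcss)."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


def elbow(wcss_by_k: Dict[int, float]) -> int:
    """Return the K where the WCSS curve bends most sharply (drop before K
    minus drop after K). Needs at least three K values."""
    ks = sorted(wcss_by_k)
    best_k, best_bend = ks[1], float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (wcss_by_k[prev] - wcss_by_k[k]) - (wcss_by_k[k] - wcss_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical toy data: twelve stores in three well-separated groups,
# described by two features that are already on the same scale.
centres = [(0, 0), (10, 0), (0, 10)]
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
stores = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

wcss_by_k = {k: kmeans(stores, k)[2] for k in range(1, 6)}
print(elbow(wcss_by_k))  # → 3
```

In practice you would still check the suggested K against your industry knowledge, as described above.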

Hierarchical Clustering
  • This clustering model organises data points into a hierarchy of nested clusters, which can be drawn as a tree diagram called a dendrogram.
  • When using this algorithm, you do not need to fix the number of clusters in advance: you choose it by cutting the dendrogram at a suitable level, guided by the Elbow method or industry knowledge. Hierarchical clustering is useful when you know little about the structure of a dataset in advance, because the dendrogram it produces is easier to interpret than a set of k-means clusters.
  • As the number of clusters increases, the accuracy of the hierarchical clustering algorithm improves compared to the K-means algorithm, which becomes less accurate as the number of clusters increases.

  • Agglomerative

  • Agglomerative clustering is a bottom-up method that starts with each data point in its own cluster and repeatedly merges the closest clusters as it moves up the hierarchy.

  • Divisive
  • Divisive clustering is a top-down approach that starts with all data points in a single cluster and repeatedly splits it into smaller groups as it moves down the hierarchy.
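The agglomerative (bottom-up) procedure can be sketched in a few lines of pure Python. This is an illustrative sketch rather than a production implementation: it assumes average linkage (the mean distance between members of two clusters) as the merge criterion, which is one of several common choices, and it runs in roughly cubic time, so it only suits small datasets. The merge history it returns is the information a dendrogram draws. Divisive clustering is not shown, and the store data is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(a: List[int], b: List[int], points: List[Point]) -> float:
    """Mean distance over every pair of points taken from clusters a and b."""
    return sum(distance(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerative(points: List[Point], n_clusters: int = 1):
    """Start with one cluster per point and repeatedly merge the closest pair
    until n_clusters remain. Returns (clusters, merge_history)."""
    clusters: Dict[int, List[int]] = {i: [i] for i in range(len(points))}
    history: List[Tuple[int, int, float]] = []
    next_id = len(points)  # merged clusters get new ids, as in a dendrogram
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        history.append((a, b, average_linkage(clusters[a], clusters[b], points)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values()), history


# Hypothetical toy data: twelve stores in three separated groups.
centres = [(0, 0), (10, 0), (0, 10)]
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
stores = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

groups, merges = agglomerative(stores, n_clusters=3)
print(groups)  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

Stopping at `n_clusters=3` is equivalent to cutting the dendrogram at the level where three branches remain; running with the default `n_clusters=1` records the full tree.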

Clustering Methods

  • You can create clusters using different methods and variables, drawing on data, reports and spreadsheets as well as specialised statistical analysis software.
  • When considering which method to use for your business, it is important to consider your access to resources such as clean retail data, information technology (IT), marketing managers and buyers.
  • It is also important to consider integration and implementation of the cluster analysis. Once you receive the results, this information must be accessible to all business functions so that it can be used and implemented.
  • Before you select your clustering technique, you must understand the principles of clustering:

    • Cohesion: Stores/categories within the same cluster must be as similar as possible in terms of consumer behaviour.

    • Separation: Clusters must be as far apart as possible in terms of consumer behaviour.

    • Population: Each cluster should group as many stores as possible for a particular product category, rather than splitting them into many very small clusters.
    • The clustering technique selected must reflect the variables that are most important to you as well as the strategic objectives of your organisation. For example, if you are a retailer and you would like to focus on creating customer-centric product ranges, a product assortment-focused clustering technique would be most beneficial.
  • Below are a few clustering methods worth considering:
    • 1. Single Assortment: Each store branch/category receives exactly the same product assortment.

    • 2. Channel-Based Clusters: Stores of the same sales channel type (e.g. brick-and-mortar store, online store etc.) fall within the same cluster and receive the same product assortment.

    • 3. Sales Volume-Based Clusters: Stores/categories are clustered based on historical and forecasted sales volume for a specified time period.

    • 4. Store Capacity-Based Clusters: Stores/categories are grouped together based on their available shelf and floor space. These figures may be measured as available floor space (m²), shelf space (length x height x depth), SKU count or other detailed space planning metrics.

    • 5. Sales Volume & Store Capacity-Based Clusters: Stores/categories are clustered based on a combination of historical and forecasted sales data as well as capacity (available floor and shelf space).

    • 6. Climate-Based Clusters: Stores/categories are grouped together based on seasonal weather patterns.

    • 7. Store Type-Based Clusters: Store branches are grouped together based on the store format (hypermarket, grocery store, convenience store, speciality store etc.).
          Sales outlets are grouped based on a salient characteristic of their local market

    • 8. Competition-Based Clusters: Store branches/categories are grouped based on the     presence and intensity of competition within the same market

    • 9. Demographics-Based Clusters: Store branches/categories are grouped based on       statistical demographic data about the target market.

    • 10. Product Attribute-Based Clusters: Stores/categories are clustered based on               historical sales data of the product assortment 
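Whichever method you choose, you can check the resulting clusters against the cohesion and separation principles listed earlier. One widely used measure that combines both is the silhouette score: for each store, it compares the average distance to the other stores in its own cluster (cohesion) with the average distance to the stores in the nearest other cluster (separation). Scores close to 1 suggest tight, well-separated clusters. The sketch below is a minimal pure-Python version; the store features are hypothetical and assumed to be already standardised.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette score over all points, between -1 and 1."""
    if len(set(labels)) < 2:
        raise ValueError("the silhouette score needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [distance(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # a single-point cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # cohesion: mean distance within the cluster
        b = min(  # separation: mean distance to the nearest other cluster
            sum(distance(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) - {labels[i]}
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical standardised features for twelve stores in three groups.
centres = [(0, 0), (10, 0), (0, 10)]
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
stores = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

good = [0] * 4 + [1] * 4 + [2] * 4  # one cluster per group of similar stores
mixed = [0, 1, 2] * 4               # stores assigned without regard to similarity
print(round(silhouette(stores, good), 2), round(silhouette(stores, mixed), 2))
```

The grouping that respects the stores' similarity scores far higher than the mixed one, which is what the cohesion and separation principles would lead you to expect.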


Cluster Analysis Data

Before conducting a cluster analysis, it’s best to select an algorithm and method that is in line with the organisational goals of your business as well as the availability of clean data and clustering software.

Some clustering algorithms are simple and require only one data type (e.g. POS data) or variable (e.g. sales). However, other algorithms are more advanced and require multiple data types (e.g. POS and loyalty data) or variables (e.g. sales and demographics).
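When an algorithm uses several variables, they are usually measured on very different scales (for example, weekly sales values versus floor space in m²). Distance-based algorithms such as K-means would then be dominated by whichever variable has the largest numbers. A common remedy, shown here as a minimal sketch, is to convert each variable to z-scores (subtract the mean, divide by the standard deviation) before clustering. The column names and values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def standardise(rows: List[Dict[str, float]], columns: List[str]) -> List[Dict[str, float]]:
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    stats = {}
    for col in columns:
        values = [row[col] for row in rows]
        stats[col] = (mean(values), pstdev(values))
    return [
        # A constant column carries no information, so it is mapped to 0.
        {col: (row[col] - mu) / sd if sd else 0.0 for col, (mu, sd) in stats.items()}
        for row in rows
    ]


# Hypothetical store data: raw sales values dwarf floor space.
stores = [
    {"weekly_sales": 1000.0, "floor_space_m2": 100.0},
    {"weekly_sales": 2000.0, "floor_space_m2": 300.0},
    {"weekly_sales": 3000.0, "floor_space_m2": 200.0},
]
scaled = standardise(stores, ["weekly_sales", "floor_space_m2"])
print(scaled[0])
```

After scaling, both variables contribute on an equal footing to the distances the clustering algorithm computes.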


POS Data

    • Point-of-sale (POS) data is collected at your till point, where transactions occur. It is created by, and can be stored directly from, the retail POS system, which comprises both software and hardware.
    • POS data can provide you with information about sales and unit movement as well as the average retail selling price for specific products.

      Store-related factors such as store code and store name may also be important to use for cluster analysis. If you only have access to POS data, you can use it to cluster products according to sales and unit movement as well as the average retail selling price.
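As an illustration of how raw till transactions become clustering inputs, the sketch below aggregates POS lines into total sales, total units and average selling price per store and product. The record layout (store code, product code, units, sales value) and the sample lines are assumptions for illustration; real POS extracts will differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed POS line layout: (store_code, product_code, units, sales_value)
PosLine = Tuple[str, str, int, float]


def pos_features(lines: Iterable[PosLine]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Total sales, total units and average selling price per (store, product)."""
    totals = defaultdict(lambda: {"sales": 0.0, "units": 0})
    for store, product, units, value in lines:
        totals[(store, product)]["sales"] += value
        totals[(store, product)]["units"] += units
    return {
        key: {**t, "avg_price": t["sales"] / t["units"] if t["units"] else 0.0}
        for key, t in totals.items()
    }


# Hypothetical till lines for one product in two stores.
lines = [
    ("S001", "SHAMPOO-400ML", 2, 10.00),
    ("S001", "SHAMPOO-400ML", 1, 5.50),
    ("S002", "SHAMPOO-400ML", 3, 15.00),
]
features = pos_features(lines)
print(features[("S002", "SHAMPOO-400ML")])
```

Each (store, product) entry can then serve as a row of clustering variables.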


Loyalty Data
  • Loyalty data is collected from your customers when they use their loyalty card at a point of sale. It also gives you demographic information about your consumers, which they provide when they sign up for the loyalty scheme.
    • You can use this information along with your POS data and shopper basket data to profile and segment consumers based on their demographics and purchasing patterns. Shopper basket data allows you to understand which consumers buy which products, and what products are frequently purchased together.
    • With such data, your buyers can develop a product offering that satisfies the wants and needs of shoppers. It can also help you with product bundling. You can use this type of data for almost all types of clustering, which means you can select the attributes that are most important to you and your business.
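To see which products are frequently purchased together, a simple starting point is to count how many baskets contain each pair of products. The sketch assumes each basket is a list of product names; the baskets shown are hypothetical. Market-basket techniques such as association-rule mining build on counts like these.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def pair_counts(baskets: Iterable[List[str]]) -> Counter:
    """Count how many baskets contain each unordered pair of products."""
    counts: Counter = Counter()
    for basket in baskets:
        # set() ignores repeat units of a product; sorting gives one key per pair
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical shopper baskets.
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk", "milk"],
    ["milk", "coffee"],
]
print(pair_counts(baskets).most_common(1))  # → [(('bread', 'milk'), 2)]
```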


Market Data
  • You would typically use third-party market data providers to understand market conditions. You can use this type of data to cluster stores according to market conditions and competitor action.


Store-Based Vs Category-Based Clustering


Store-based and category-based clustering are the predominant methods used by the retail sector to create customer-centric merchandising and product assortment tactics.

Originally, retailers adopted a store-based approach, using top-down attributes such as store size, sales figures and geographic location to boost their operational efficiency. Store-based clustering is also referred to as ‘store grouping’. This method is simple to understand and implement across a retail business. Entire stores are grouped based on similarities among them, such as LSM (Living Standards Measure), size, store format and performance data. This method may work well for retailers with a few store branches that have distinct characteristics. For example, a retailer that has convenience stores, grocery stores and hypermarkets may choose to group its stores according to store format, as the customer base will be relatively consistent within each format.

However, this method does not consider the different categories within a store, which are approached differently due to the different customer purchasing patterns, wants and needs of each.

As you can see in the image below, eight store branches have been grouped into three clusters. Within each cluster, the categories and product ranges will be exactly the same. This is because stores within the same cluster are said to serve the same consumer market.


Example of store-based clustering

Today, retailers have moved towards a category-based approach to clustering which uses data across all store branches to cluster stores based on similarities in chosen variables. This means that each store may fall within a different cluster for each product category.

By using this method, you can create customer-focused assortment plans aimed at satisfying the needs of the target market. Category-based clustering is a more complicated method which takes more time and effort to implement. However, this will allow you to cater to the different customer markets that shop at various store branches, resulting in increased customer satisfaction and loyalty.

In the image below, you can see that eight stores have been clustered by category. This means that stores 1, 3, 4, 5 and 7 will receive the same product range for hair care, because these stores are similar in terms of performance data, target market, LSM etc. for the hair care category.


Example of category-based clustering
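The difference between the two approaches shows up in the shape of the output: store-based clustering gives each store one label, while category-based clustering gives each store a separate label per category. The sketch below clusters hypothetical stores separately for each category using a single variable (weekly category sales). To keep it short, it uses single-linkage clustering in one dimension, which amounts to sorting the stores and cutting at the widest gaps; in practice you would use more variables and one of the algorithms described earlier.

```python
from typing import Dict


def one_d_clusters(values: Dict[str, float], k: int) -> Dict[str, int]:
    """Single-linkage clustering of stores on one variable: sort by value and
    cut at the k - 1 widest gaps between neighbours. Label 0 = lowest values."""
    ordered = sorted(values, key=values.get)
    gaps = sorted(
        range(1, len(ordered)),
        key=lambda i: values[ordered[i]] - values[ordered[i - 1]],
        reverse=True,
    )
    cuts = set(gaps[: k - 1])
    labels, cluster = {}, 0
    for i, store in enumerate(ordered):
        if i in cuts:
            cluster += 1  # a wide gap starts a new cluster
        labels[store] = cluster
    return labels


def category_clusters(sales: Dict[str, Dict[str, float]], k: int = 2) -> Dict[str, Dict[str, int]]:
    """Cluster the same stores independently for every category."""
    return {category: one_d_clusters(by_store, k) for category, by_store in sales.items()}


# Hypothetical weekly sales per category and store.
sales = {
    "hair care": {"S1": 120.0, "S2": 30.0, "S3": 115.0, "S4": 35.0},
    "baby care": {"S1": 88.0, "S2": 90.0, "S3": 12.0, "S4": 10.0},
}
print(category_clusters(sales))
```

Here store S1 is grouped with S3 for hair care but with S2 for baby care, something a single store-level label could not express.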


Cluster Analysis Implementation

If you want to implement cluster analysis in your retail business, there are a few actions you can take. They are presented below in phases, and each phase contains steps that can help your cluster analysis efforts.

How to Implement Cluster Analysis

Phase 1: Prepare the data and develop your plan

    1.1 Determine data sources you have access to and would like to use
    1.2 What benefits do you want to achieve by incorporating this process into the business’s category management plan?
    1.3 What capabilities does the business have or lack to execute the clustering process (access to data, access to technology etc.)?

Phase 2: Analyse the data and determine the best method of clustering

    2.1 Select the clustering algorithm to be used
    2.2 Determine the clustering method by deciding which variables would be most effective to use for clustering (e.g. to maximise revenue, cluster based on store/category sales).

Phase 3: Execute

    3.1 Run the cluster analysis and evaluate the results
    3.2 Group stores into clusters for each product category
    3.3 Update store layouts, create new planograms and update strategies
    3.4 Clustering should be a dynamic process where clusters are re-evaluated periodically (e.g. every 3 to 6 months).


Retail Clustering Mistakes To Avoid

Using store-based instead of category-based clustering

Store-based clustering fails to consider the differing category performance and customer purchasing behaviour across the store’s categories.

Category-based clustering can help you to make strategic decisions based on consumer behaviour. You can use different strategies within the same store to ensure the performance of each category is optimised.

Ignoring category performance 

Category performance is an important indicator that you can use to cluster stores for the same product category. You can analyse category performance to identify patterns and trends across various store branches.

Once you have considered all the store-level factors, you will need to further analyse how each category performs to understand the shopping behaviour of your target market.

Failing to analyse data to conduct a clustering exercise

Many retailers have adopted a subjective approach to clustering where stores they think have similar customers are grouped together for a particular product category. However, cluster analysis conducted using factual sales and POS data will be the most accurate and effective for boosting operational efficiency and customer loyalty. 

Ignoring strategic alliances and sales channel partners 

Industry role players such as manufacturers and suppliers play an important role in the clustering process. They are able to provide market-related category expertise and cross-retailer insights which will help you to develop more targeted assortment and merchandising strategies.

Involving these partners means your clusters can be based on the most current internal and external information from across the market.

Failing to prioritise categories to cluster

Cluster analysis and implementation can be a time-consuming process. You can use the Pareto principle to help you decide where to begin. Applied to retail, this rule of thumb suggests that roughly 20% of your categories generate about 80% of your income. Focusing on these categories first should therefore have a noticeable impact on sales.
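A minimal sketch of this prioritisation: rank categories by revenue and keep the top categories until their combined revenue reaches a chosen share of the total (80% here). The category names and revenue figures are hypothetical, and the 80% threshold is a parameter you can adjust.

```python
from typing import Dict, List


def pareto_categories(revenue: Dict[str, float], share: float = 0.8) -> List[str]:
    """Top categories, by revenue, that together reach `share` of total revenue."""
    target = share * sum(revenue.values())
    selected: List[str] = []
    running = 0.0
    for category, value in sorted(revenue.items(), key=lambda kv: kv[1], reverse=True):
        if running >= target:
            break
        selected.append(category)
        running += value
    return selected


# Hypothetical revenue per category.
revenue = {"beverages": 500.0, "snacks": 320.0, "hair care": 100.0, "baby care": 50.0, "stationery": 30.0}
print(pareto_categories(revenue))  # → ['beverages', 'snacks']
```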

 Assuming clustering is a once-off exercise

Clustering is a dynamic exercise. Category performance, industry conditions, market trends and consumers are constantly changing. Therefore, it’s best practice to redo a clustering exercise every 3 to 6 months to ensure that the strategies used are still applicable and will produce optimised category performance. 



Cluster Analysis Interpretation


Clustering for CRM

You should focus on delivering a highly personalised shopping experience for customers.

Customer relationship management (CRM) uses information technology (IT) to acquire, maintain and grow customer segments of the target market. CRM also assists you in building relationships and boosting customer loyalty by implementing customer-centric strategies built on data analytics. You can collect information on each of your customers to analyse their purchase history and buying behaviour.

The practice of CRM focuses on customer identification, attraction, development and retention. You can identify customers using customer segmentation and data mining techniques such as clustering.

Customers are attracted and developed using predictive analysis that determines future consumer behaviour. You can retain customers because CRM is a customer-centric approach to category management where the business develops a deep understanding of their customers to target them effectively.

Clustering for Consumer Segmentation

You can use consumer segmentation to divide a market into smaller groups according to consumers’ needs, defining characteristics and buying behaviour. You can also use this practice to collect information about your target market so that you can offer the right products at the right time, place and price.

You can segment your consumers according to various characteristics such as demographics, geographic location, psychographics or behavioural variables.

Here, clustering algorithms are worth using for efficient and effective consumer segmentation to produce groups that exhibit similar consumer behaviour. It can, therefore, be assumed that consumers who fall within the same cluster will respond in a similar manner to your strategies and tactics.
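Once segments exist, a customer can be placed in the segment whose centroid (average profile) is closest, so that they receive that segment's strategies and tactics. The sketch assumes two hypothetical segments described by standardised behavioural variables (average basket value and visits per month); the segment names and numbers are illustrative only.

```python
from typing import Dict, Sequence


def assign_segment(customer: Sequence[float], centroids: Dict[str, Sequence[float]]) -> str:
    """Return the segment whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda name: sum((x - c) ** 2 for x, c in zip(customer, centroids[name])),
    )


# Hypothetical segment centroids: (avg basket value, visits per month) as z-scores.
segments = {
    "frequent small-basket shoppers": (-1.0, 1.2),
    "occasional large-basket shoppers": (1.5, -0.8),
}
print(assign_segment((1.2, -0.5), segments))  # → occasional large-basket shoppers
```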

Clustering for Assortment Planning

A retail business’ survival and profitability depend on tools such as assortment planning in a time where convenience and personalisation are becoming more important.

Consumers expect more targeted offerings, which calls for data-driven assortment plans. Product ranges are also constantly changing. This highlights the need to move away from single-assortment plans and from store-specific assortment plans, which are capital-, time- and human-resource-intensive.

A clustered approach to assortment planning is a more scalable, predictive and resource-efficient method to manage this function.

Clustering for Inventory Management

Once you have optimised your assortment planning function, you can use cluster analysis to improve your inventory management.

After sending the product range to your stores, you can use cluster analysis to predict and manage the stock turn of each product. This requires shelf space dimensions, product dimensions, weekly movement of each product and overall days of supply.

Stores within the same cluster are likely to require the same inventory management strategies due to the similar target market and their consumer behaviour for the particular category. Therefore, you can also predict and monitor stock movement to avoid stock-outs or overstocks, resulting in profit optimisation.
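As a rough illustration of how shelf space dimensions, product dimensions and weekly movement combine, the sketch below estimates how many units of a product fit in its shelf allocation and how many days that stock lasts. The formulas (facings × units deep × units high, and days of supply = capacity ÷ average daily units) are a common simplification assumed here, and all dimensions are hypothetical; dedicated space-planning tools model much more detail.

```python
from dataclasses import dataclass


@dataclass
class ShelfSlot:
    facings: int              # number of product faces across the shelf
    shelf_depth_cm: float
    shelf_height_cm: float    # usable height between shelves
    product_depth_cm: float
    product_height_cm: float


def shelf_capacity(slot: ShelfSlot) -> int:
    """Units that fit: facings x units deep x units stacked high."""
    deep = int(slot.shelf_depth_cm // slot.product_depth_cm)
    high = int(slot.shelf_height_cm // slot.product_height_cm)
    return slot.facings * deep * high


def days_of_supply(capacity: int, weekly_units: float) -> float:
    """How many days a full shelf lasts at the current rate of sale."""
    daily_units = weekly_units / 7
    return capacity / daily_units if daily_units else float("inf")


slot = ShelfSlot(facings=3, shelf_depth_cm=45, shelf_height_cm=30,
                 product_depth_cm=7.5, product_height_cm=14)
capacity = shelf_capacity(slot)                # 3 x 6 x 2 = 36 units
print(capacity, days_of_supply(capacity, 84))  # → 36 3.0
```

Because stores in the same cluster tend to show similar movement for a category, a calculation like this can be applied per cluster rather than store by store.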

Clustering for Predictive Analysis

Cluster analysis uses the principle of association to identify existing relationships and sequences between data points.

This means that by conducting a cluster analysis, you can draw inferences and make predictions about your business using historical sales data. With this information, you can describe patterns in consumer behaviour over a period of time. Therefore, you can better understand and anticipate the consumer behaviour of your target market and obtain a competitive advantage.



The Case For Using Cluster Analysis


Benefits customer relationship management

Within the context of CRM, retailers who implement clustering in their business will enjoy decreased customer acquisition costs and improvements in customer understanding and service, resulting in increased customer satisfaction, retention and loyalty.

The profitability of lucrative market segments is increased as your business is able to identify and target them.

Benefits consumer segmentation

Retailers and suppliers who implement cluster-based consumer segmentation will experience optimised profit and customer satisfaction due to the improved understanding of their consumer behaviour.

Financial benefits such as higher ROI on marketing schemes, increased customer retention, increased shopper in-store spend and overall basket share may also be experienced.

Benefits the assortment planning function

With the increased understanding of the target market’s needs and wants through the use of cluster analysis, your buyers can tailor the product assortment to the consumer and achieve a competitive advantage. The right products are, therefore, offered at the right place and price, using the right promotion techniques, to the benefit of both the retailer and the shopper.

Benefits the micro and macro space planning function

Using internal and external market data that has been analysed and grouped using cluster analysis, your buyers can assess their product range in more detail.

You can remove poorly performing products from the range to provide more shelf space for profitable products. Overall, you can re-organise shelf and floor space for optimised space allocation and category profitability.

Benefits the marketing function

Using cluster analysis within the marketing function allows you to reduce your advertising costs as you can target your adverts and implement specific strategies.

You can also use clustering to generate customer profiles which you can analyse and target effectively for a profitable response.

Improves the accuracy of demand forecast data

Data for an entire cluster is generally more reliable than data for a single store, because it is based on more observations and random store-level fluctuations tend to even out. Cluster analysis also helps you identify existing relationships and correspondences among variables over a period of time, which improves the accuracy of demand forecast data.

Through cluster mapping, cluster analysis is also useful for presenting complex data groups and patterns in an easily understandable manner.