Product Cluster Analysis: Unveiling Hidden Customer Preferences

jadavvineet73 274 views 33 slides Jun 26, 2024

About This Presentation

This presentation explores product cluster analysis, a data science technique used to group similar products based on customer behavior. It delves into a project undertaken at the Boston Institute, where we analyzed real-world data to identify customer segments with distinct product preferences. for...


Slide Content

Product Cluster Analysis: Leveraging Data for Competitive Edge. Presented by: AKSHITHA RAI

PROJECT DOMAIN Retail and Wholesale Distribution Overview of the Industry: The retail and wholesale distribution sector plays a pivotal role in the economy, connecting manufacturers with consumers and encompassing businesses that sell and distribute goods directly to end-users. Challenges and Opportunities: Businesses face diverse challenges, including inventory management, supply chain complexities, changing consumer preferences, and market competition. This project is specifically geared towards inventory management challenges such as stockouts and overstocking.

What is Inventory Management? There are three types of inventory: Raw Materials are stock used to make an end product. Work in Process consists of the raw materials that are being made into finished goods. Finished Goods are the final products that get produced for sale to consumers. Inventory management tracks how much physical inventory you have in your organization. It monitors stock at other locations, such as distributors or subcontractors. When you have clear visibility into your inventory, you know when to order, where to store it, and when you need to stop selling.

PROJECT OVERVIEW Primary Objective: Group similar products based on sales patterns to optimize inventory management and tailor offerings to better align with customer preferences. Benefits: Streamlined Inventory Management: Clustering similar products makes it possible to set optimal stock levels based on the specific needs of each product cluster. Tailored Product Offerings: Develop targeted promotional strategies and dynamic pricing models for each product category. Reduced Costs: Minimizing stock-outs and overstocking can lead to significant cost savings in warehousing, transportation, and handling.

DATA DEFINITION Warehouse and Retail Sales dataset. Total Entries: 307,645. Columns: 9.
Data Columns/Features given:
YEAR: Calendar Year
MONTH: Month
SUPPLIER: Supplier Name
ITEM CODE: Item code
ITEM DESCRIPTION: Item Description
ITEM TYPE: Item Type
RETAIL SALES: Cases of product sold from DLC dispensaries
RETAIL TRANSFERS: Cases of product transferred to DLC dispensaries
WAREHOUSE SALES: Cases of product sold to MC licensees
"Cases" represent a standard packaging unit, such as a case of 12 bottles or a case of 24 units, depending on the specific products and packaging used by the DLC and its licensees.
Alcohol Beverage Services, previously known as the Department of Liquor Control (DLC), is a government agency within Montgomery County, Maryland, and is the wholesaler of beer, wine, and other alcoholic beverages.

Methodology Overview 01 - Data Cleaning 02 - Data Exploration 03 - Data Pre-processing 04 - Feature Scaling 05 - Apply ML Algorithms 06 - Evaluate the cluster 07 - Clustering Analysis

Data Cleaning Summary Missing Values: The "SUPPLIER" column has 167 missing values, only 0.0543% of its 307,645 rows, so those rows can be dropped. The single row with a missing "ITEM TYPE" value is dropped as well. Duplicates: The duplicate count is not shown here, but 307,645 - 167 - 1 = 307,477 matches the final row count, which is consistent with no duplicate rows having been removed. After Data Cleaning: After handling missing values and removing duplicates, the "df_clean" DataFrame contains 307,477 entries with 9 columns.
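The cleaning steps above could be sketched roughly as follows. This is a minimal illustration on a tiny made-up DataFrame, not the project's actual code; only the column names and the name df_clean come from the slides.

```python
import pandas as pd

# Tiny synthetic stand-in for the Warehouse and Retail Sales data (values are made up).
df = pd.DataFrame({
    "SUPPLIER": ["A", None, "B", "B"],
    "ITEM TYPE": ["WINE", "BEER", None, "LIQUOR"],
    "RETAIL SALES": [1.0, 2.0, 3.0, 4.0],
})

# Share of missing values per column, in percent.
missing_pct = df.isna().mean() * 100

# Drop rows missing SUPPLIER or ITEM TYPE, then drop exact duplicate rows.
df_clean = df.dropna(subset=["SUPPLIER", "ITEM TYPE"]).drop_duplicates()
```

On the real data, the same percentage calculation gives 167 / 307,645, or about 0.0543%, for the SUPPLIER column.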

Data Exploration 9 ITEM TYPES, 396 SUPPLIERS, 34,039 TOTAL UNIQUE PRODUCTS. Unique Item Types and Frequencies: WINE: 187,640; LIQUOR: 64,910; BEER: 42,413; KEGS: 10,146; NON-ALCOHOL: 1,899; STR_SUPPLIES: 318; REF: 79; DUNNAGE: 72. The dataset contains a variety of item types, with wine being the most prevalent, followed by liquor and beer. Each unique item code corresponds to a distinct product, showcasing the wide variety of products available in the dataset. The dataset comprises a diverse range of suppliers, reflecting a broad network of sources for the products included. Among all the item types, the 'REF' category remains somewhat ambiguous in its definition. Let's delve deeper into its contents to shed light on the nature of the items it encompasses.

Data Exploration The "Store Special Wine," "Store Special Beer Quart," and "Store Special Liquor" items within the REF category likely represent specific promotional or discounted offerings for these beverage types. Moving these items from the REF category to their respective broader categories (WINE, BEER, and LIQUOR) could provide a clearer representation of sales within each beverage type. "Corkscrew" and "Wine Aerator-in Bottle" are accessories or tools related to wine consumption rather than actual alcoholic beverages, so let's rename what remains of 'REF' to "Wine Tools".
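One way to express this recategorization in pandas is sketched below. The exact description strings and the "WINE TOOLS" label spelling are assumptions for illustration; the real ITEM DESCRIPTION values may be formatted differently.

```python
import pandas as pd

# Made-up rows that mimic the REF items described above.
df = pd.DataFrame({
    "ITEM DESCRIPTION": [
        "STORE SPECIAL WINE",
        "STORE SPECIAL LIQUOR",
        "STORE SPECIAL BEER QUART",
        "CORKSCREW",
    ],
    "ITEM TYPE": ["REF"] * 4,
})

is_ref = df["ITEM TYPE"] == "REF"
desc = df["ITEM DESCRIPTION"].str.upper()

# Move each "Store Special" item into its beverage category.
for keyword in ["WINE", "LIQUOR", "BEER"]:
    mask = is_ref & desc.str.contains("STORE SPECIAL") & desc.str.contains(keyword)
    df.loc[mask, "ITEM TYPE"] = keyword

# Whatever is still labelled REF is treated as wine accessories.
df["ITEM TYPE"] = df["ITEM TYPE"].replace({"REF": "WINE TOOLS"})
```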

Data Exploration After recategorization, wine is the most prevalent item type in the dataset, with 187,679 occurrences, indicating its significant presence in the sales data. Liquor is second with 64,911 occurrences, roughly a third of the wine count, suggesting a substantial contribution to overall sales volume. Beer ranks third with 42,422 occurrences, highlighting a notable consumer preference for beer products.

Data Exploration The top suppliers predominantly belong to the beer industry, including renowned brands such as Miller Brewing Company, Anheuser Busch Inc, and Heineken USA. This suggests that beer products play a significant role in driving sales for the business. There is also representation from other beverage categories, such as wine and spirits. E & J Gallo Winery and Diageo North America Inc are notable suppliers of wine and spirits, contributing to the diversity of product offerings.

Data Exploration The correlation coefficient between retail sales and warehouse sales is approximately 0.501. This indicates a moderate positive correlation between the two variables but the relationship is not extremely strong.
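A Pearson correlation like the one quoted above can be computed with pandas as in the sketch below. The numbers here are synthetic, so the result will not equal the 0.501 reported for the real dataset.

```python
import pandas as pd

# Synthetic example values; only the column names come from the dataset description.
df = pd.DataFrame({
    "RETAIL SALES": [0.0, 1.0, 2.0, 3.0, 4.0],
    "WAREHOUSE SALES": [1.0, 0.0, 4.0, 2.0, 6.0],
})

# Series.corr uses the Pearson correlation coefficient by default.
r = df["RETAIL SALES"].corr(df["WAREHOUSE SALES"])
print(round(r, 3))
```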

Data Exploration-RETAIL SALES Liquor products (whiskey, brandy, vodka, rum, gin and tequila) demonstrate the highest retail sales, suggesting a considerable demand, potentially driven by factors like taste preferences or social trends. Wine and beer closely trail liquor in retail sales, indicating their enduring popularity and consumer appeal within the market. Non-alcoholic beverages make a modest contribution to retail sales, reflecting a segment of the market that caters to consumers seeking alternatives to alcoholic beverages, perhaps driven by health-conscious choices or personal preferences. "RETAIL SALES" typically refer to the cases of products sold directly to customers from DLC (Department of Liquor Control) dispensaries or retail outlets.

Data Exploration-RETAIL TRANSFERS The analysis highlights the varying degrees of demand across different beverage categories, with alcoholic beverages such as liquor, wine, and beer leading in terms of both retail transfers and sales. Non-alcoholic beverages, although contributing less to total retail transfers and sales than alcoholic beverages, still show notable demand. Let's examine the items under the STR_SUPPLIES category to understand why retail transfers for these supplies far exceed their retail sales.

Data Exploration These observations suggest a diverse range of products being transferred, with a notable emphasis on packaging materials like paper bags and thermal register paper. Paper bags in various sizes (12LB, 20LB, quarts, pints, 1/6 barrel) are the top items by retail transfers, indicating a significant demand for packaging materials. Using paper bags with branded logos or designs can contribute to a store's branding and image. Single-bottle wine gift totes are also popular, suggesting a preference for gifting wine. Thermal register paper is also among the top items; it plays a crucial role in retail operations by enabling efficient and reliable receipt printing at the point of sale. Other items like shot glasses, plastic bags, wine tumblers, and plastic wine glass packs are also being transferred.

Data Exploration-Warehouse SALES Beer emerges as the top-performing item type in terms of warehouse sales. This dominance may be attributed to several factors, including its popularity among consumers, diverse product offerings, and its relatively longer shelf life compared to some other alcoholic beverages. Despite being second to beer, wine still makes a substantial contribution to warehouse sales.  KEGS demonstrate considerable demand and play a crucial role in the distribution of draft beer to bars, restaurants, and other establishments. "WAREHOUSE SALES" refer to cases of products sold to MC (Montgomery County) licensees, which are establishments authorized to sell alcohol for consumption on their premises. These sales are typically made in bulk to businesses such as bars, restaurants, clubs, and hotels for resale to their customers.

Data Exploration These item descriptions for 'DUNNAGE' suggest that they represent various sizes of empty kegs used for storing and dispensing beverages. In inventory management, these items would be categorized as dunnage, which refers to materials used for packaging, storing, or transporting goods to prevent damage. The negative values in the "WAREHOUSE SALES" column for dunnage items indicate a decrease in inventory, likely due to returns, exchanges, or damage to the empty kegs. Dunnage items have zero retail sales and retail transfers and primarily represent inventory management for kegs. It may therefore be appropriate to remove them from the ITEM TYPES before applying clustering algorithms, as they do not contribute to the primary objectives of analyzing sales patterns and optimizing inventory management.

Data Preprocessing Let's drop the ITEM DESCRIPTION, SUPPLIER, and ITEM CODE columns as they are categorical and may not directly contribute to the clustering process. Let's also drop the 'YEAR' and 'MONTH' columns as the given data is not evenly distributed across 2017, 2018, 2019, and 2020. Uneven distribution of data across years and months can introduce noise and bias into the clustering process, potentially leading to misleading insights. To prepare the data for clustering analysis, we performed dummy encoding on the categorical 'ITEM TYPE' column using the pd.get_dummies() function. The dummy encoded columns allow the inclusion of categorical information (item types) in the numerical data, ensuring that this characteristic is considered during clustering.
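The preprocessing described above might look like the following sketch, which also applies the DUNNAGE removal suggested earlier. The rows are made up; the column names and pd.get_dummies() come from the slides.

```python
import pandas as pd

# Three made-up rows with the dataset's nine columns.
df = pd.DataFrame({
    "YEAR": [2019, 2019, 2020],
    "MONTH": [1, 2, 1],
    "SUPPLIER": ["A", "B", "C"],
    "ITEM CODE": ["100", "200", "300"],
    "ITEM DESCRIPTION": ["x", "y", "z"],
    "ITEM TYPE": ["WINE", "BEER", "DUNNAGE"],
    "RETAIL SALES": [1.0, 2.0, 0.0],
    "RETAIL TRANSFERS": [1.0, 3.0, 0.0],
    "WAREHOUSE SALES": [5.0, 20.0, -4.0],
})

# Remove dunnage rows, then drop the columns not used for clustering.
df = df[df["ITEM TYPE"] != "DUNNAGE"]
features = df.drop(columns=["YEAR", "MONTH", "SUPPLIER", "ITEM CODE", "ITEM DESCRIPTION"])

# Dummy-encode ITEM TYPE into one 0/1 column per item type.
features = pd.get_dummies(features, columns=["ITEM TYPE"], dtype=int)
```

The result keeps the three sales columns and adds one indicator column per remaining item type (here ITEM TYPE_BEER and ITEM TYPE_WINE).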

Feature Scaling Since clustering algorithms are sensitive to the scale of the features, it's essential to scale the numerical features before applying clustering algorithms. We applied MinMax scaling to normalize the numerical features, including 'RETAIL SALES', 'RETAIL TRANSFERS', and 'WAREHOUSE SALES' , ensuring that these variables are on a consistent scale between 0 and 1. This scaling technique preserves the relative relationships between the features and prepares the data for clustering analysis, enhancing the accuracy of our results and facilitating meaningful comparisons across different sales metrics.
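MinMax scaling maps each value x in a column to (x - min) / (max - min). A minimal sketch with scikit-learn's MinMaxScaler on made-up values is shown below; the project's exact code is not shown in the slides.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up sales figures on very different scales.
features = pd.DataFrame({
    "RETAIL SALES": [0.0, 5.0, 10.0],
    "RETAIL TRANSFERS": [2.0, 4.0, 6.0],
    "WAREHOUSE SALES": [-10.0, 0.0, 990.0],
})

num_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# Rescale each column independently to the [0, 1] range.
scaler = MinMaxScaler()
features[num_cols] = scaler.fit_transform(features[num_cols])
```

Dummy-encoded item type columns are already 0 or 1, so they sit on the same scale without further treatment.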

CLUSTERING: K-Means Algorithm K-means is a centroid-based (distance-based) algorithm: each point is assigned to the cluster whose centroid is nearest. The objective of K-means is to minimize the sum of squared distances between the points and their respective cluster centroids. K-means requires the number of clusters K to be specified up front; we chose it using the Elbow Method and the Silhouette Method. The elbow plot suggested K = 4. Let's verify using the Silhouette score.
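The elbow method fits K-means for a range of K values and looks for the point where the within-cluster sum of squares (scikit-learn's inertia_) stops dropping sharply. The sketch below uses synthetic blobs with four known centres as an assumed stand-in for the scaled sales features.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four well-separated groups (not the project's data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Inertia (within-cluster sum of squared distances) for K = 1..8.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting inertias against K gives the elbow plot; the bend is judged by eye, which is why a second criterion such as the silhouette score is useful.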

CLUSTERING: K-Means Algorithm The Silhouette score is calculated for each data point and then averaged across all data points. It measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation), and ranges from -1 to 1. A score close to 1 indicates dense, well-separated clusters, while a score close to -1 indicates that the data point may have been assigned to the wrong cluster. Peak at 6 clusters: The highest silhouette score is at 6 clusters (0.9926370991740481), indicating that this configuration is the best in terms of clustering quality according to the silhouette score. The score decreases slightly when moving to 7 clusters (0.9882040308746644), suggesting that a seventh cluster does not improve the clustering. Since the silhouette score peaks at 6 rather than at the elbow estimate of 4, let's choose K = 6 for clustering.
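Choosing K by silhouette score can be sketched as below, again on synthetic blobs rather than the project's data. Because the synthetic data has four groups, this example is expected to select a different K from the 6 reported above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four well-separated groups (an assumption for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Mean silhouette score for each candidate K (silhouette needs at least 2 clusters).
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the K with the highest mean silhouette score.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```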

CLUSTERING: K-Means Algorithm Clustering Evaluation Metrics: After building the K means model and assigning the cluster labels, along with the silhouette score we also use Davies-Bouldin Index and Calinski-Harabasz Index to assess the clustering results. Lower DBI values indicate better clustering quality as they suggest that clusters are compact and well-separated. Higher values of the Calinski-Harabasz index indicate better-defined, more compact clusters. Both metrics strongly suggest that the K-means clustering on the dataset is highly effective, resulting in distinct and cohesive clusters . This indicates that the choice of the number of clusters and the algorithm’s performance on this dataset are very good.
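The three metrics are all available in sklearn.metrics. The sketch below computes them for a K-means fit on synthetic blobs, so the values are illustrative only and unrelated to the project's results.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic data with four well-separated groups (not the project's data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

sil = silhouette_score(X, labels)          # higher is better, range [-1, 1]
dbi = davies_bouldin_score(X, labels)      # lower is better, minimum 0
ch = calinski_harabasz_score(X, labels)    # higher is better, unbounded

print(round(sil, 3), round(dbi, 3), round(ch, 1))
```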

CLUSTERING: Mini Batch K-Means Mini Batch K-Means clustering is a variation of the traditional K-Means algorithm that processes small batches of data at a time, making it computationally efficient for large datasets. The results suggest that for the given dataset, the standard K-Means algorithm was able to identify more well-defined, distinct, and compact clusters compared to Mini Batch K-Means , as evidenced by the higher Silhouette Score, lower Davies-Bouldin Index, and higher Calinski-Harabasz Index.
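A side-by-side comparison of the two algorithms could be set up as in this sketch. The batch size and the synthetic data are assumptions; on data this clean the two usually agree closely, unlike the gap reported above for the real dataset.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four well-separated groups (not the project's data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

models = {
    "KMeans": KMeans(n_clusters=4, n_init=10, random_state=0),
    # batch_size is a hypothetical choice; it controls how many points each update uses.
    "MiniBatchKMeans": MiniBatchKMeans(n_clusters=4, batch_size=100, n_init=10, random_state=0),
}

# Fit each model and record its mean silhouette score.
results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    results[name] = silhouette_score(X, labels)

print(results)
```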

CLUSTERING: Mean Shift Algorithm Mean-shift clustering is a density-based clustering method that focuses on finding the regions of high density and iteratively shifting data points towards the highest density of points. The algorithm does not require any prior information about the number of clusters present in the data. Mean Shift splits the LIQUOR and BEER clusters into smaller sub-clusters, which might be less desirable considering our objective to group similar items. Silhouette Score and Davies-Bouldin Index favor K-Means, suggesting it creates more well-defined and better-separated clusters. Calinski-Harabasz Index slightly favors Mean Shift. Overall, K-Means appears to be the better clustering method based on these metrics and the cluster compositions .
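Mean Shift in scikit-learn needs a bandwidth instead of a cluster count. The sketch below estimates one with estimate_bandwidth; the quantile value is an assumed setting, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic data with four well-separated groups (not the project's data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# The bandwidth sets the size of the neighbourhood used to estimate density.
# quantile=0.2 is an illustrative choice; smaller values give more, smaller clusters.
bandwidth = estimate_bandwidth(X, quantile=0.2, random_state=0)

ms = MeanShift(bandwidth=bandwidth).fit(X)

# The number of clusters is discovered by the algorithm, not passed in.
n_clusters = len(np.unique(ms.labels_))
print(n_clusters)
```

A bandwidth that is too small is one plausible reason for an item type being split into sub-clusters, as observed above for LIQUOR and BEER.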

Cluster Analysis Based on the evaluation metrics, K-means demonstrated the best performance among the clustering algorithms tested. Therefore, we will conduct our cluster analysis using the results from the K-means algorithm. Cluster 0 (WINE):  Cluster 0 primarily consists of wine products. This cluster likely represents customers with a preference for wine products. Cluster 1 (LIQUOR):  Cluster 1 is characterized by liquor products. This cluster may indicate a distinct group of customers who prefer liquor items. Cluster 2 (BEER):  Cluster 2 primarily includes beer products. This cluster likely represents customers who have a preference for beer. Cluster 3 (KEGS):  Cluster 3 is associated with kegs. Kegs are designed for larger quantities of beverages and are typically purchased by commercial establishments like bars, restaurants, and event organizers, rather than individual customers directly.

Cluster Analysis Cluster 4 (NON-ALCOHOL): Cluster 4 comprises non-alcoholic products. This cluster may indicate customers who prefer non-alcoholic beverages to avoid alcohol for health, religious, or personal reasons. This niche market may have a smaller customer base compared to the overall population. Cluster 5 (STR_SUPPLIES, WINE TOOLS): The STR_SUPPLIES category includes a variety of storage supplies, such as paper bags, thermal register paper, and plastic bags, used for packaging and transporting various products. Given the nature of these products, they are more likely to be used by retailers or establishments than by individual customers. The presence of wine tools alongside storage supplies in Cluster 5 suggests that these purchases are also made by commercial entities, since businesses often require wine tools for serving and preparing wine for their customers.

Cluster Analysis Beer has the highest total sales across retail, transfers, and warehouse at 7,101,470.19 units. This suggests beer is the top-selling product category. Wine has the second-highest total sales at 1,903,693.02 units, with a significant portion (1,156,984.91 units) coming from warehouse sales. This implies wine is a popular product category for wholesale distribution. Liquor has the third-highest total sales at 897,599.52 units, but the majority of these sales (802,693.25 units) are from retail stores rather than warehouses. This indicates liquor is more popular in retail locations. Kegs have 118,430 units in warehouse sales but 0 retail sales, indicating they are only sold wholesale and not directly to consumers. Non-alcoholic beverages have relatively low total sales at 53,299.9 units, suggesting they are a minor product category compared to beer, liquor, and wine.
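Per-cluster totals like these can be produced with a groupby over the cluster labels. In the sketch below the CLUSTER column name and all figures are hypothetical.

```python
import pandas as pd

# Made-up rows; CLUSTER is a hypothetical column holding the K-means label.
df = pd.DataFrame({
    "CLUSTER": [0, 0, 1, 2, 2, 3],
    "RETAIL SALES": [10.0, 5.0, 8.0, 3.0, 2.0, 0.0],
    "RETAIL TRANSFERS": [9.0, 6.0, 7.0, 3.0, 1.0, 0.0],
    "WAREHOUSE SALES": [20.0, 10.0, 1.0, 50.0, 40.0, 12.0],
})

sales_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# Sum each sales channel per cluster, then add an overall total.
totals = df.groupby("CLUSTER")[sales_cols].sum()
totals["TOTAL"] = totals[sales_cols].sum(axis=1)
totals = totals.sort_values("TOTAL", ascending=False)
print(totals)
```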

Cluster Analysis The clusters whose total sales fall in the top 25% are cluster 0 [WINE] and cluster 2 [BEER], indicating high demand. This insight is valuable for inventory management, marketing strategies, and product planning, as businesses can focus on stocking and promoting beer and wine products to capitalize on their popularity among consumers. Let's analyze seasonal decomposition plots for RETAIL SALES, RETAIL TRANSFERS, and WAREHOUSE SALES to understand seasonal effects and overall trends in each cluster. These insights will help us grasp the nuances driving our sales patterns, aiding better decision-making and strategy.
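Selecting the clusters in the top 25% by total sales amounts to comparing each total against the 75th percentile. The totals below are rounded, made-up figures used only to show the mechanics.

```python
import pandas as pd

# Made-up total sales per cluster (illustrative figures, not the project's results).
totals = pd.Series({
    "WINE": 1900.0,
    "LIQUOR": 900.0,
    "BEER": 7100.0,
    "KEGS": 120.0,
    "NON-ALCOHOL": 50.0,
    "STR_SUPPLIES / WINE TOOLS": 10.0,
})

# 75th percentile of the cluster totals (pandas interpolates linearly by default).
threshold = totals.quantile(0.75)

# Clusters at or above the threshold are in the top quartile.
top_clusters = totals[totals >= threshold].sort_values(ascending=False)
print(threshold, list(top_clusters.index))
```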

Cluster Analysis Cluster 0 (WINE): Overall Trend: Stable retail sales and transfers; slight upward trend in warehouse sales. Seasonality: Peak sales in November and December, lower sales in January and February. Holiday shopping drives increased demand. Cluster 1 (LIQUOR): Overall Trend: Upward trends in retail, transfers, and warehouse sales. Seasonality: Fluctuations throughout the year; significant peaks in November and December during holiday seasons, indicating higher demand. Cluster 2 (BEER): Overall Trend: Mixed trends in retail, transfers, and warehouse sales. Seasonality: Peaks during specific months, possibly tied to seasonal events or consumer behaviors. Cluster 3 (KEGS): Overall Trend: The overall trend suggests a focus on warehouse distribution for KEGS within this cluster, highlighting a specific market approach for these products. Seasonality: Warehouse sales for KEGS show fluctuations but generally remain at moderate levels.

Power BI Dashboard

Recommendations For Strategy Optimization Focus on Beer and Wine Products: Beer and wine products consistently show high demand across retail, transfers, and warehouse sales. Allocate more resources towards stocking and promoting these items to capitalize on their popularity among consumers. Seasonal Planning: Adjust inventory levels based on seasonal trends observed in each cluster. For example, increase stock of wine and liquor products leading up to the holiday season when demand peaks, while optimizing inventory for non-alcoholic beverages during warmer months or alongside health-conscious trends. Retail Strategy Alignment: Tailor retail strategies to align with consumer behavior and preferences observed in each cluster. For instance, offer promotions or themed events around wine and liquor during peak demand periods, while emphasizing convenience and variety for non-alcoholic beverages. Wholesale Distribution Optimization: Optimize warehouse distribution channels based on product categories. Since kegs predominantly sell through wholesale channels, streamline distribution processes to ensure efficient supply to commercial establishments while monitoring trends to anticipate demand fluctuations. Diversification: Consider diversifying product offerings within clusters or specializing in certain categories based on market demand and profitability. For instance, within the wine cluster, explore niche or premium wine selections to cater to specific consumer preferences and enhance profitability.

Recommendations For Strategy Optimization Data-Driven Decision Making: Continuously analyze sales data and consumer trends to make informed inventory management decisions. Utilize advanced analytics tools to forecast demand, identify emerging trends, and optimize inventory levels to minimize stockouts and overstock situations. Enhanced Marketing and Promotions: Develop targeted marketing campaigns and promotions to drive sales for specific product categories within each cluster. Leverage consumer insights to tailor messaging and offers that resonate with target audiences, increasing engagement and conversion rates. Supplier collaboration : Work closely with suppliers to establish flexible and responsive supply chains. This can include sharing demand forecasts and collaborating on inventory planning to ensure timely delivery without overstocking. Regular inventory audits : Conduct regular audits to identify slow-moving or obsolete inventory and take proactive measures to liquidate or reduce it. This prevents tying up capital in products that are unlikely to be sold. Efficient storage and warehouse management : Optimize warehouse layout and storage practices to maximize space utilization and minimize holding costs.

CONCLUSION Benefits Recap of project objective: Throughout this project, our primary objective was to leverage data analysis to optimize inventory management processes and tailor our product offerings to better align with customer preferences. Achievements : Identified high-demand product categories (beer and wine). Determined seasonal trends and their impact on sales. Developed actionable insights for inventory management and retail strategies.