paytm_mall_epurchase_data data analysis.

ankita222345 11 views 26 slides Jun 05, 2024
Slide 1
Slide 1 of 26
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26

About This Presentation

Completed an internship given by psyliq. where I need to find out the insights for paytm epurchase. in which i used Mysql workbench as well as Excel to showcase my project insights.


Slide Content

Paytm mall epurchase data analysis Paytm Mall E-purchase Data Analysis

The "Category_Grouped" column likely represents a grouped or categorized version of the original category column, possibly for the purpose of simplifying analysis or reporting. There are 5 Unique categories that are present in the column, listed as: Others , Apparels , Empty values, S hoes , H ome.

Top 5 Shipping Cities in terms of orders are: New Delhi, Chennai, Bangalore, Mumbai, Hyderabad

In our dataset, the " Sale_Flag " column contains two distinct values: "ON SALE" and "NOT ON SALE". To clarify, in the context of our analysis, "ON SALE" represents products that are currently discounted or on sale. Therefore, when referring to a " Sale_Flag " of 'Yes', it indicates products marked as "ON SALE".

The Most Expensive item is element soul M Black Running Shoes of Brand ADDIDAS, purchased by RAJEEV SHAHI from Kolkata when it was NOT ON SALE . Shown on next page.

"Due to the nature of my dataset, which primarily consists of cities in India such as Jabalpur, Ahmedabad, Chennai, and others, the currency used is Indian Rupees (INR). As there were no prices denoted in $ or other foreign currencies in the ' Special_Price_effective ' column, I applied conditional formatting to highlight products with a value below ₹3750 (equivalent to $50) in red, ensuring consistency and relevance to the dataset's context .“ There are no such columns which are below $50 when we apply conditional formatting to highlight all products with a “ Special_Price_Effective ” value below $50.

Conditional formatting using SQL query: This SQL query selects all columns from the table PAYTM_EPURCHASE_DATA and adds an additional column named "Formatting". In this new column, a conditional expression is applied using the CASE statement.

Conditional formatting using Excel:

In the absence of a dedicated "Sales" column in the dataset, the analysis for total sales value for each category was conducted using available metrics such as " Item_Price ." While " Item_Price " typically reflects the price at which items are sold to customers, it was utilized as a proxy for sales revenue. Understanding that " Cost_Price " and " Paid_pr " represent costs incurred by the seller, the focus was on deriving revenue estimates using " Item_Price " to gain insights into category-wise sales performance. Leveraging Excel's pivot table functionality, the dataset was effectively summarized, allowing for informed decision-making despite the absence of a specific sales column.

In absence of a direct "Clothing" category in the dataset, the analysis was performed using the "Category_Grouped" column, where "Clothing" items were represented as "Apparels." Utilizing this proxy, the average quantity sold for products in the "Clothing" category was calculated, grouped by " Product_Gender ." This approach allowed for meaningful insights into sales trends despite the categorical variation, ensuring accurate analysis and decision-making. Average Quantity sold for products in “Clothing” category, grouped by “ Product_Gender ” are 1 for each MEN and WOMEN Product_Gender .

The DISTINCT keyword ensures that only unique items are returned, as the same item may have multiple entries in the dataset. By ordering the results based on the "CM_RATIO" in descending order and limiting the output to the top 5 entries, the query identifies the items with the highest CM ratio. This approach allows for the identification of products with the most favorable cost-to-margin ratio, aiding in strategic decision-making and product analysis.

Top 3 Brands are NIKE with TOTAL SALES OF 43406505 , SANGRIA TOTAL SALES OF 22614700 , PUMA TOTAL SALES OF 20832027 .

Total Revenue Generated from “ELECTRONICS” category products with a “ Sale_Flag ” of “YES” is WATCHES TOTAL REVENUE OF 42502840 .

Same Q uery in two different ways:

While working on this query, I faced some challenges because it was quite complicated. However , I managed to find helpful information online and learned about common table expressions, especially the WITH clause. After spending some time studying and practicing, I was able to figure out how to calculate the percentage contribution of each category to the total sales. This experience helped me improve my problem-solving skills and taught me more about optimizing SQL queries.

“ Category” with the highest average “ Item_Price ” is Furniture and its Average_Price is 7560.6011

We use the MOD() function to calculate the remainder of the division of the order number ( S_no ) by 12. This effectively assigns each order to a month based on its order number . The FLOOR() function rounds down the result of the division to the nearest integer . We add 1 to ensure that the months are numbered from 1 to 12.The results are grouped by the calculated month . We order the results by the total sales revenue in descending order using ORDER BY . Finally , we limit the result to only the first row using LIMIT 1, which gives us the month with the highest total sales revenue August Month is the Month with the highest total sales Revenue of 24882263

When faced with a dataset lacking explicit date columns, determining the month with the highest total sales revenue can be challenging . However, one approach is to use a numerical field such as transaction numbers (S_NO) as a proxy for months. By dividing the transaction numbers by 12 (since there are typically 12 months in a year) and taking the remainder, each transaction is assigned to a hypothetical month. Then , summing up the item prices for each month allows us to identify the month with the highest total sales revenue. This method provides an estimation of temporal trends, although it assumes a consistent transaction frequency throughout the dataset. While it may not provide exact results, it offers insights into sales patterns over time . Please note that this approach assumes a uniform distribution of orders across months, which may not be accurate. If possible, it's always best to have a date column in your dataset to perform time-based analyses accurately.
Tags