DAR L66666666666666666666666666666yyyEC9.pptx

ravinalohat100 7 views 15 slides Oct 03, 2024

Descriptive Statistics
Compiled and presented by: Dr. Chetna Arora

# Sample numeric data
data <- c(10, 20, 20, 25, 30, 30, 35, 40, 45, 50, 50, 60)

# Sample categorical data
categories <- c("A", "B", "A", "C", "B", "C", "A", "B", "C", "A", "B", "C")

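Calculating Descriptive Statistics in R

1) Mean: The average of the numbers.
mean(data)

2) Median: The middle value.
median(data)

3) Mode: The most frequent value. Base R has no function for the statistical mode (the built-in mode() reports an object's storage type), so we derive it from a frequency table. If several values tie for the highest count, which.max() returns only the first of them; in this sample 20, 30, and 50 each occur twice, so the result is 20.
mode_value <- as.numeric(names(which.max(table(data))))
mode_value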
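The one-line expression for the mode reports only the first of several tied values. A minimal sketch of a reusable helper follows, assuming we want every tied value returned. The name get_modes is hypothetical and is not part of base R.

```r
# Sample numeric data from the slides
data <- c(10, 20, 20, 25, 30, 30, 35, 40, 45, 50, 50, 60)

# Hypothetical helper (not part of base R): return every value
# that occurs with the highest frequency.
get_modes <- function(x) {
  freq <- table(x)                       # count each unique value
  top <- names(freq)[freq == max(freq)]  # keep all values tied for the maximum
  as.numeric(top)                        # convert the table names back to numbers
}

modes <- get_modes(data)
print(modes)
```

For this sample the helper returns all three tied values (20, 30, 50), which makes it clear that the data are multimodal rather than having a single mode.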

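4) Range: The difference between the maximum and minimum values. range() returns the minimum and maximum themselves; diff() turns them into the difference.
range(data)        # minimum and maximum
diff(range(data))  # maximum minus minimum

5) Standard Deviation: Tells us how spread out the data is. sd() computes the sample standard deviation (divisor n - 1).
sd(data)

6) Frequency: The count of each unique value.
table(data)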
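To make the range, standard deviation, and frequency calculations concrete, here is a short sketch that recomputes them by hand. The variable names (limits, spread, manual_sd, rel_freq) are illustrative only, and the relative-frequency step is an optional extension of table().

```r
# Sample numeric data from the slides
data <- c(10, 20, 20, 25, 30, 30, 35, 40, 45, 50, 50, 60)

# range() returns the minimum and maximum, not their difference
limits <- range(data)
spread <- diff(limits)  # maximum minus minimum

# sd() uses the sample formula: sqrt(sum((x - mean)^2) / (n - 1))
n <- length(data)
manual_sd <- sqrt(sum((data - mean(data))^2) / (n - 1))

# Relative frequencies: share of observations taking each value
rel_freq <- prop.table(table(data))

print(limits)
print(spread)
print(c(manual = manual_sd, builtin = sd(data)))
print(rel_freq)
```

The manual and built-in standard deviations should agree, which confirms that sd() divides by n - 1 rather than n.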

Creating Visualizations in R

1. Histogram
A histogram shows the distribution of a numeric variable.
# Create a histogram
hist(data, main = "Histogram of Data", xlab = "Values", col = "lightblue", breaks = 5)
Explanation: The histogram groups data into ranges (bins) and shows how many data points fall into each range. A single number passed to breaks is only a suggestion; R chooses "pretty" breakpoints close to that count.

2. Bar Chart
A bar chart is used for categorical data.
# Create a bar chart for categories
barplot(table(categories), main = "Bar Chart of Categories", col = "lightgreen", ylab = "Frequency")
Explanation: This bar chart shows the frequency of each category.
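The binning behind the histogram can be inspected without drawing anything. The sketch below assumes the same sample data and uses hist() with plot = FALSE; the bins data frame is an illustrative name, not something from the slides.

```r
# Sample numeric data from the slides
data <- c(10, 20, 20, 25, 30, 30, 35, 40, 45, 50, 50, 60)

# plot = FALSE computes the bins without opening a graphics device
h <- hist(data, breaks = 5, plot = FALSE)

# One row per bin: lower edge, upper edge, number of observations
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  count = h$counts
)
print(bins)
```

Printing the bins is a quick way to check which breakpoints R actually chose before tuning the breaks argument.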

Case Study: Analyzing Sales Data of a Supermarket Chain

Background: A supermarket chain wants to understand its sales performance to make better business decisions. The dataset contains information on total sales, product categories, regions, and the number of stores. The company wants to analyze key performance indicators (KPIs) such as average sales, variability in sales across different regions, and distribution of product categories.

Objectives:
1) Calculate the average sales across all stores.
2) Determine the variability (standard deviation) in sales.
3) Analyze the distribution of sales by product category.
4) Understand regional sales performance using summary statistics.

Dataset Structure:
Region: Region of the store (North, South, East, West)
Store: Store ID
Product_Category: Category of products sold (e.g., Beverages, Snacks, Produce, Dairy)
Sales: Total sales in dollars for the period

Region  Store  Product_Category  Sales
North   101    Beverages         5000
South   102    Dairy             4200
East    103    Snacks            3200
West    104    Produce           2800
North   105    Beverages         4800

Loading the dataset:
sales_data <- data.frame(
  Region = c("North", "South", "East", "West", "North"),
  Store = c(101, 102, 103, 104, 105),
  Product_Category = c("Beverages", "Dairy", "Snacks", "Produce", "Beverages"),
  Sales = c(5000, 4200, 3200, 2800, 4800)
)

View the structure of the dataset:
str(sales_data)

Mean Sales:
mean_sales <- mean(sales_data$Sales)
mean_sales

Standard Deviation of Sales:
sd_sales <- sd(sales_data$Sales)
sd_sales
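As a check on the mean and standard deviation of sales, the sketch below works through the arithmetic step by step for the five stores in the case study. The intermediate names (deviations, manual_sd) are illustrative.

```r
# Sales column from the case study
sales <- c(5000, 4200, 3200, 2800, 4800)

# Mean: total sales divided by the number of stores (20000 / 5)
mean_sales <- mean(sales)

# Deviation of each store from the mean: 1000, 200, -800, -1200, 800
deviations <- sales - mean_sales

# Sample variance divides the sum of squared deviations by n - 1
manual_sd <- sqrt(sum(deviations^2) / (length(sales) - 1))
sd_sales <- sd(sales)

print(mean_sales)
print(deviations)
print(c(manual = manual_sd, builtin = sd_sales))
```

The mean is 4000 dollars and the sample variance is 3,760,000 / 4 = 940,000, so the standard deviation is a little under 970 dollars.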

Summary of Sales Data:
summary(sales_data$Sales)

Sales by Region:
sales_by_region <- aggregate(Sales ~ Region, data = sales_data, FUN = mean)
sales_by_region
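A regional mean is easier to interpret alongside the number of stores behind it. The sketch below is one possible way to pair the two using tapply(); the region_summary layout and its column names are assumptions for illustration, not part of the original slides.

```r
# Case-study dataset, repeated so the example runs on its own
sales_data <- data.frame(
  Region = c("North", "South", "East", "West", "North"),
  Store = c(101, 102, 103, 104, 105),
  Product_Category = c("Beverages", "Dairy", "Snacks", "Produce", "Beverages"),
  Sales = c(5000, 4200, 3200, 2800, 4800)
)

# Mean sales and store count per region
region_mean <- tapply(sales_data$Sales, sales_data$Region, mean)
region_n <- tapply(sales_data$Sales, sales_data$Region, length)

# Combine into one table (illustrative column names)
region_summary <- data.frame(
  Region = names(region_mean),
  Mean_Sales = as.numeric(region_mean),
  Stores = as.integer(region_n)
)
print(region_summary)
```

Only North has two stores here, so the other regional "means" are simply a single store's sales and should be read with that in mind.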

Sales by Product Category:
sales_by_category <- aggregate(Sales ~ Product_Category, data = sales_data, FUN = mean)
sales_by_category
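The objective of analyzing the distribution of sales by product category can also be approached through totals and shares rather than means. The following is a minimal sketch under that assumption; the Share column and the category_total name are introduced here for illustration.

```r
# Case-study dataset, repeated so the example runs on its own
sales_data <- data.frame(
  Region = c("North", "South", "East", "West", "North"),
  Store = c(101, 102, 103, 104, 105),
  Product_Category = c("Beverages", "Dairy", "Snacks", "Produce", "Beverages"),
  Sales = c(5000, 4200, 3200, 2800, 4800)
)

# Total sales per category
category_total <- aggregate(Sales ~ Product_Category, data = sales_data, FUN = sum)

# Each category's share of overall sales
category_total$Share <- category_total$Sales / sum(category_total$Sales)

# Largest category first
category_total <- category_total[order(-category_total$Sales), ]
print(category_total)
```

In this small sample Beverages accounts for 9800 of the 20000 dollars in sales, or 49 percent, because it is the only category sold by two stores.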