Working with Basic Statistical Functions

anjanasharma77573 14 views 9 slides Jun 28, 2024


Slide Content

1. Descriptive Statistics
Descriptive statistics describe the basic features of a data set by summarizing it; the data set may represent an entire population or a sample of it. The main measures of central tendency are:
Mean: The arithmetic average, that is, the sum of the values divided by the number of values.
Mode: The value that appears most often in the data set.
Median: The middle value of the ordered data set, which divides it into two equal halves. With an even number of values, it is the average of the two middle values.
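The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores` list is made-up data chosen only for illustration.

```python
import statistics

# Hypothetical sample data, not taken from the slides.
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # sum of values / number of values
median_value = statistics.median(scores)  # average of the two middle values here
mode_value = statistics.mode(scores)      # most frequent value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Because the list has an even number of values, the median falls between the two middle values (3 and 5) rather than on an observed value.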

Variability
Range: The difference between the highest and lowest values in the data set.
Percentiles, Quartiles and Interquartile Range (IQR)
Percentiles: A percentile is the value below which a given percentage of the observations in a group falls.
Quartiles: The three values that divide the ordered data points into four more or less equal parts, or quarters.
Interquartile Range (IQR): A measure of statistical dispersion based on dividing a data set into quartiles: IQR = Q3 − Q1, where Q1 and Q3 are the first and third quartiles.
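As a rough illustration of range, quartiles and IQR, the sketch below uses `statistics.quantiles` from the standard library. The data are hypothetical, and quartile conventions differ between tools, so other software may report slightly different cut points for the same data.

```python
import statistics

# Hypothetical ordered data, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# n=4 asks for the three cut points that split the data into quarters.
# The default method is "exclusive"; method="inclusive" is the alternative.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: spread of the middle half of the data.
iqr = q3 - q1

print("range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```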

Variance: The average squared difference of the values from the mean; it measures how spread out the data are around the mean.
Standard Deviation: The square root of the variance; it expresses the typical distance between a data point and the mean, in the same units as the data.
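A small sketch of both measures, again using the standard `statistics` module on made-up data. It distinguishes the population formulas (divide by n) from the sample formulas (divide by n − 1), a distinction the slide does not spell out.

```python
import statistics

# Hypothetical data with mean 5, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions: average squared deviation from the mean (divide by n).
pop_variance = statistics.pvariance(values)
pop_std = statistics.pstdev(values)

# Sample versions: divide by n - 1 instead, for data that are a sample
# drawn from a larger population.
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print("population variance / std:", pop_variance, pop_std)
print("sample variance / std:", sample_variance, sample_std)
```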

Correlation
Correlation is one of the major statistical techniques for measuring the relationship between two variables. The correlation coefficient ranges from −1 to +1 and indicates the strength and direction of the linear relationship between two variables. A coefficient greater than zero indicates a positive relationship. A coefficient less than zero indicates a negative relationship. A coefficient of zero indicates that there is no linear relationship between the two variables, although a nonlinear relationship may still exist.
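The sketch below computes the Pearson correlation coefficient by hand, using only the standard library. The helper name `pearson` and all data are hypothetical. The last case shows that a coefficient of zero only rules out a linear relationship: there y is fully determined by x, yet the coefficient is zero.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_positive = pearson(x, [2, 4, 6, 8, 10])  # y rises with x
r_negative = pearson(x, [10, 8, 6, 4, 2])  # y falls as x rises

# y = x squared on a symmetric range: strongly related, but not linearly.
centered = [-2, -1, 0, 1, 2]
r_zero = pearson(centered, [v * v for v in centered])

print(r_positive, r_negative, r_zero)
```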

Probability Distribution
A probability distribution specifies the likelihood of every possible event. In simple terms, an event is a result of an experiment, such as tossing a coin. Events are of two types, independent and dependent.
Independent event: An event is independent when it is not affected by earlier events. For example, if a coin is tossed and the first outcome is heads, the next toss may still come up heads or tails, entirely independently of the first trial.
Dependent event: An event is dependent when its occurrence depends on earlier events. For example, when balls are drawn without replacement from a bag that contains red and blue balls, the chance that the second ball is red depends on the color of the first ball drawn.
The probability that several independent events all occur is found by multiplying the probabilities of the individual events; for dependent events it is calculated using conditional probability.
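The coin and ball examples can be worked through with exact fractions. This is a sketch under stated assumptions: a fair coin, a bag holding 3 red and 2 blue balls (counts invented for illustration), and drawing without replacement.

```python
from fractions import Fraction

# Independent events: two tosses of a fair coin.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads  # multiply the individual probabilities

# Dependent events: two draws without replacement.
# Assumed bag contents (hypothetical): 3 red balls and 2 blue balls.
red, blue = 3, 2
p_first_red = Fraction(red, red + blue)
# Conditional probability: one red ball is already gone from the bag.
p_second_red_given_first_red = Fraction(red - 1, red + blue - 1)
p_two_reds = p_first_red * p_second_red_given_first_red

print("P(two heads):", p_two_heads)
print("P(two reds):", p_two_reds)
```

Treating the two draws as independent would give (3/5) × (3/5) = 9/25, which overstates the chance of two reds compared with the conditional calculation.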

Regression
Regression is a method for determining the relationship between one or more independent variables and a dependent variable. Regression is mainly of two types:
Linear regression: Fits a model that explains the relationship between a numeric response variable and one or more predictor variables.
Logistic regression: Fits a model that explains the relationship between a binary response variable and one or more predictor variables.
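A minimal sketch of the two model types, using only the standard library. `fit_line` is a hypothetical helper that fits a one-predictor linear regression by ordinary least squares. The logistic part does not fit anything: it only shows how a logistic model turns a linear predictor into a probability, with coefficients invented for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Linear regression on hypothetical data that lie exactly on y = 2x + 1.
hours = [1, 2, 3, 4, 5]
scores = [3, 5, 7, 9, 11]
slope, intercept = fit_line(hours, scores)

# Logistic model with assumed (not fitted) coefficients b0 = -3, b1 = 1:
# estimated probability of a binary outcome after 3 hours.
b0, b1 = -3.0, 1.0
p_outcome = sigmoid(b0 + b1 * 3)

print("slope, intercept:", slope, intercept)
print("P(outcome | 3 hours):", p_outcome)
```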

Normal Distribution
The normal distribution defines the probability density function of a continuous random variable. It has two parameters, the mean and the standard deviation, both discussed above; the standard normal distribution is the special case with mean 0 and standard deviation 1. When the distribution of a random variable is unknown, the normal distribution is often assumed. The central limit theorem justifies this in many cases: the mean of a large number of independent observations is approximately normally distributed, whatever the distribution of the individual observations, provided their variance is finite.
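The sketch below uses `statistics.NormalDist` (Python 3.8+) to evaluate a standard normal distribution, then runs a small simulation in the spirit of the central limit theorem. The sample size, number of repetitions and seed are arbitrary choices made for illustration.

```python
import random
import statistics
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)
density_at_mean = std_normal.pdf(0.0)  # height of the bell curve at its peak
prob_below_mean = std_normal.cdf(0.0)  # half of the probability lies below the mean

# Simulation: means of samples drawn from a uniform (non-normal) distribution.
random.seed(0)      # fixed seed so the run is repeatable
sample_size = 30    # assumed sample size
repetitions = 2000  # assumed number of samples
sample_means = [
    statistics.fmean([random.random() for _ in range(sample_size)])
    for _ in range(repetitions)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

print("pdf(0), cdf(0):", density_at_mean, prob_below_mean)
print("mean and std of sample means:", center, spread)
```

The sample means should cluster around 0.5, the mean of the uniform distribution on [0, 1], with a spread close to the theoretical value sqrt(1/12) / sqrt(30), which is about 0.053.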

Bias
In statistical terms, bias arises when a model or sample is not representative of the complete population. It needs to be minimized to get a reliable outcome. The three most common types of bias are:
Selection bias: Occurs when the data chosen for statistical analysis are not selected at random, so that the data are unrepresentative of the whole population.
Confirmation bias: Occurs when the person performing the statistical analysis has a predefined assumption and favors results that support it.
Time interval bias: Caused by intentionally specifying a certain time range to favor a particular outcome.
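Selection bias can be illustrated with a toy simulation. Everything here is an assumption made for the example: the population is simply the integers 1 to 100, the biased sample keeps only values above 50, and the random sample size and seed are arbitrary.

```python
import random
import statistics

# Hypothetical population: the integers 1 through 100.
population = list(range(1, 101))
population_mean = statistics.mean(population)

# Biased selection: only values above 50 are included in the sample.
biased_sample = [v for v in population if v > 50]
biased_mean = statistics.mean(biased_sample)

# Random selection: every member is equally likely to be chosen.
random.seed(1)  # fixed seed so the run is repeatable
random_sample = random.sample(population, 30)
random_mean = statistics.mean(random_sample)

print("population mean:", population_mean)
print("biased sample mean:", biased_mean)
print("random sample mean:", random_mean)
```

The biased sample overstates the population mean by a wide margin, while the random sample is expected to land much closer to it.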