Section 6 - Chapter 2 - Introduction to Statistics Part II
ptaimp
54 slides
Mar 11, 2025
About This Presentation
Section 6 - Chapter 2 - Introduction to Statistics Part II - Presented by Rohan Sharma - The CMT Coach - Chartered Market Technician CMT Level 1 Study Material - CMT Level 1 Chapter Wise Short Notes - CMT Level 1 Course Content - CMT Level 1 2025 Exam Syllabus Visit Site : www.learn.ptaindia.com and www.ptaindia.com
Size: 6.11 MB
Language: en
Added: Mar 11, 2025
Slides: 54 pages
Slide Content
Chapter 2 - Introduction to Statistics Part II Section 6 – Statistics Analysis Presented By : This Content is Copyright Reserved Rights Copyright 2025@PTAIndia
Agenda: Introduction to Statistics Part 2
o Data Visualization
o Correlation
o Linear Regression
o Putting It All Together
o Microsoft Excel Functions Used
Anscombe’s Quartet
Key Facts:
1. Definition: Anscombe’s Quartet consists of four datasets that have nearly identical statistical properties but display distinct distributions when graphed.
2. Creator: Francis Anscombe (1973).
3. Purpose: Demonstrates the importance of data visualization in statistical analysis.
4. Statistical Properties (for all four datasets):
   o Mean of X: 9
   o Mean of Y: 7.5
   o Variance of X: 11
   o Variance of Y: 4.12
   o Correlation coefficient (r): 0.816
   o Linear regression equation: y = 3 + 0.5x
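The shared statistics listed above can be checked with a short sketch. The data values below are the commonly published figures for Anscombe's Quartet; they are not listed in the slides, and the names `QUARTET` and `summarize` are illustrative only.

```python
from math import sqrt

# Commonly published Anscombe's Quartet values (an assumption: not in the slides).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, Pearson r, and the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four datasets print nearly the same numbers, so only a plot reveals how different they are.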
Anscombe’s Quartet: Interpretation of Each Dataset
1. Dataset I:
   o Appears to follow a standard linear relationship.
   o The regression model fits the data well.
2. Dataset II:
   o Clearly follows a non-linear (curved) relationship.
   o A straight-line regression is not appropriate.
3. Dataset III:
   o Mostly linear but includes an outlier.
   o The outlier distorts the regression results.
4. Dataset IV:
   o All points but one share the same X value; a single extreme point drives the correlation.
   o The regression line is misleading because it depends on that one influential point.
Anscombe’s Quartet: Comparison of the Four Datasets
Feature | Dataset I | Dataset II | Dataset III | Dataset IV
Linear Fit | Good | Poor (curved) | Affected by an outlier | Distorted by an extreme point
Outliers | None | None | One influential point | One extreme outlier
Suitable for Linear Regression? | Yes | No | Somewhat | No
Visualization Needed? | Less critical | Very important | Very important | Extremely important
Anscombe’s Quartet: Cheat Sheet
Key Takeaways:
• Summary statistics alone can be misleading.
• Always visualize your data before drawing conclusions.
• Outliers and non-linearity significantly impact regression results.
• Anscombe’s Quartet warns against blind reliance on summary statistics and automated analyses.
Dataset | Visual Pattern | Key Insight
I | Linear, with minimal scatter | Well-behaved dataset, regression valid
II | Non-linear (curved) | Linear regression is misleading
III | Has an outlier | One point strongly affects regression
IV | Single extreme point | One point distorts correlation
Histogram in Data Visualization
Key Facts:
1. Definition: A histogram is a graphical representation of the distribution of numerical data using bars.
2. Purpose: Helps visualize the frequency of data points within specified intervals (bins).
3. X-axis: Represents the range of values (bins).
4. Y-axis: Represents the frequency (count) of occurrences in each bin.
5. Key Features:
   o Shows the shape of the distribution (e.g., normal, skewed, bimodal).
   o Helps identify outliers, trends, and data spread.
   o Useful for understanding distributions before applying statistical models.
Histogram in Data Visualization: Cheat Sheet
Concept | Description
Bins | Intervals that group data points
Skewness | Asymmetry in data distribution (left/right skewed)
Kurtosis | Measures how heavy/light the tails of a distribution are
Uniform Distribution | Bars of roughly equal height
Normal Distribution | Bell-shaped, symmetric around the mean
Bimodal Distribution | Two peaks, indicating two different groups in the data
Right-Skewed | Tail extends to the right (high values less frequent)
Left-Skewed | Tail extends to the left (low values less frequent)
Histogram in Data Visualization: Interpretation of Histograms
1. Symmetric (Normal Distribution)
   o The highest frequency is in the center.
   o Data is evenly distributed around the mean.
2. Right-Skewed Distribution (Positively Skewed)
   o The tail extends to the right (higher values are rare).
   o Example: Income distribution in a population.
3. Left-Skewed Distribution (Negatively Skewed)
   o The tail extends to the left (lower values are rare).
   o Example: Age of retirement (most retire later, few retire very young).
4. Bimodal Distribution
   o Two peaks indicate two dominant groups.
   o Example: Exam scores of two distinct student groups.
5. Uniform Distribution
   o All bins have similar heights, meaning data is evenly spread.
   o Example: Rolling a fair die multiple times.
Histogram in Data Visualization: Comparison of Histograms vs. Other Charts
Feature | Histogram | Bar Chart | Box Plot | Density Plot
Data Type | Numerical | Categorical | Numerical | Numerical
X-Axis | Ranges/Bins | Categories | Single Variable | Continuous Values
Shows Distribution? | Yes | No | Yes | Yes
Shows Outliers? | Limited | No | Yes | Yes
Best For? | Frequency Distribution | Comparing Categories | Data Spread & Outliers | Smoother Data Distribution
Key Takeaways:
✅ Histograms help understand the shape, spread, and patterns of data.
✅ Choice of bin size affects histogram appearance (too few = oversimplified, too many = noisy).
✅ Great for checking normality, skewness, and trends before deeper statistical analysis.
✅ Not ideal for categorical data; use a bar chart instead.
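The effect of bin choice can be illustrated with a minimal sketch that counts values in equal-width bins. The function name `histogram_counts` and the sample values are hypothetical, chosen only for illustration.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value goes into the last bin rather than opening a new one.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical sample: a cluster of small values plus one large value.
SAMPLE = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

if __name__ == "__main__":
    for bins in (2, 4, 11):
        print(bins, histogram_counts(SAMPLE, bins)[1])
```

With 2 bins the cluster's shape disappears, while 11 bins show both the peak around 3 and the isolated value at 12.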
Box-and-Whisker Plot (Boxplot)
Key Facts:
1. Definition: A boxplot is a graphical summary of data distribution showing median, quartiles, and potential outliers.
2. Purpose: Helps visualize spread, central tendency, and variability of numerical data.
3. Components:
   o Box: Represents the interquartile range (IQR: Q1 to Q3).
   o Whiskers: Extend from the box to show variability outside the quartiles.
   o Median Line: Inside the box, marks the middle value.
   o Outliers: Individual points beyond whiskers (potential anomalies).
4. Used For:
   o Identifying skewness and spread of data.
   o Comparing multiple distributions.
   o Spotting outliers.
Box-and-Whisker Plot (Boxplot): Cheat Sheet
Term | Description
Minimum (Lower Whisker) | Smallest data point at or above Q1 − 1.5 × IQR
Q1 (First Quartile, 25%) | 25% of data falls below this value
Median (Q2, 50%) | Middle value of the dataset
Q3 (Third Quartile, 75%) | 75% of data falls below this value
Maximum (Upper Whisker) | Largest data point at or below Q3 + 1.5 × IQR
Interquartile Range (IQR) | Q3 − Q1 (middle 50% of data)
Outliers | Points below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR
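The boxplot terms above can be computed directly, as in the sketch below. It assumes the "inclusive" quartile method of Python's `statistics.quantiles`; other quartile conventions give slightly different values. The function name and sample data are hypothetical.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Return quartiles, IQR, whisker ends, and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions shift Q1 and Q3 slightly.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": inside[0],   # smallest point inside the fences
        "upper_whisker": inside[-1],  # largest point inside the fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical data with one extreme value.
SAMPLE = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

if __name__ == "__main__":
    print(boxplot_summary(SAMPLE))
```

In this sample the upper whisker stops at 9 and the value 100 is reported separately as an outlier.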
Box-and-Whisker Plot (Boxplot): Interpretation of Boxplots
1. Symmetric Distribution:
   o Median is centered in the box.
   o Whiskers are of roughly equal length.
   o Suggests a normal or balanced distribution.
2. Right-Skewed Distribution (Positively Skewed):
   o Median is closer to Q1.
   o Upper whisker is longer.
   o Suggests more high-value outliers.
3. Left-Skewed Distribution (Negatively Skewed):
   o Median is closer to Q3.
   o Lower whisker is longer.
   o Suggests more low-value outliers.
4. Presence of Outliers:
   o Individual dots outside the whiskers indicate extreme values.
   o Could suggest measurement errors or significant variations in data.
Key Takeaways:
✅ Boxplots are great for comparing multiple distributions side by side.
✅ They quickly reveal outliers, skewness, and spread in data.
✅ Unlike histograms, boxplots do not show exact frequency distribution.
✅ Violin plots offer a more detailed alternative to boxplots by displaying density.
Box-and-Whisker Plot (Boxplot): Comparison of Boxplot vs. Other Charts
Feature | Boxplot | Histogram | Violin Plot | Bar Chart
Data Type | Numerical | Numerical | Numerical | Categorical
Shows Distribution? | Yes | Yes | Yes | No
Shows Outliers? | Yes | Limited | Yes | No
Shows Exact Frequency? | No | Yes | No | Yes
Best For? | Comparing distributions, spotting outliers | Understanding frequency distribution | Detailed shape of distribution | Comparing categorical data
Scatterplot
Key Facts:
1. Definition: A scatterplot is a graph that displays individual data points using dots to show relationships between two numerical variables.
2. Purpose: Used to identify patterns, trends, correlations, and potential outliers in data.
3. Axes:
   o X-axis: Independent variable.
   o Y-axis: Dependent variable.
4. Key Features:
   o Reveals correlations (positive, negative, or none).
   o Helps detect outliers.
   o Shows clusters or gaps in data.
Scatterplot: Cheat Sheet
Feature | Description
Positive Correlation | As X increases, Y increases (upward trend).
Negative Correlation | As X increases, Y decreases (downward trend).
No Correlation | No clear pattern in the data points.
Strong Correlation | Points are closely clustered around a trend.
Weak Correlation | Points are widely spread but still follow a trend.
Outliers | Points that are far from the general trend.
Clusters | Groups of points indicating subgroups in the data.
Scatterplot: Interpretation of Scatterplots
1. Strong Positive Correlation:
   o Points form a tight upward trend.
   o Example: Height vs. weight.
2. Strong Negative Correlation:
   o Points form a tight downward trend.
   o Example: Age of a car vs. resale value.
3. Weak Correlation (Positive or Negative):
   o Points loosely follow a trend but are widely spread.
   o Example: Studying hours vs. test scores (if other factors influence performance).
4. No Correlation:
   o Points are randomly scattered with no trend.
   o Example: Shoe size vs. intelligence.
5. Outliers Present:
   o Some points are far away from the general pattern.
   o Example: A single student with extremely high or low test scores.
6. Clusters in Data:
   o Indicates subgroups or different categories in the dataset.
   o Example: Income vs. age showing different groups for students, professionals, and retirees.
Scatterplot: Comparison of Scatterplot vs. Other Charts
Feature | Scatterplot | Line Chart | Bar Chart | Bubble Chart
Data Type | Numerical vs. Numerical | Numerical vs. Numerical (Trend Over Time) | Categorical vs. Numerical | Numerical vs. Numerical (with extra variable)
Shows Relationship? | Yes | Yes (over time) | No | Yes
Shows Trends? | Yes | Yes | No | Yes
Shows Outliers? | Yes | Limited | No | Yes
Best For? | Exploring relationships & correlations | Time-series trends | Comparing categories | Comparing relationships with an extra dimension
Correlation in Financial Markets
Key Facts
1. Definition: Correlation measures the statistical relationship between two financial assets, indicating how they move relative to each other.
2. Correlation Coefficient (r):
   o Ranges from -1 to +1.
   o +1: Perfect positive correlation (assets move in the same direction).
   o 0: No correlation (assets move independently).
   o -1: Perfect negative correlation (assets move in opposite directions).
3. Types of Correlation in Finance:
   o Positive Correlation: Stocks in the same sector (e.g., Apple & Microsoft).
   o Negative Correlation: Stocks vs. safe-haven assets (e.g., S&P 500 & Gold).
   o Zero Correlation: Unrelated assets (e.g., Bitcoin & oil prices).
4. Importance in Investing:
   o Helps with portfolio diversification.
   o Identifies hedging opportunities.
   o Assists in risk management.
5. Commonly Used in:
   o Stocks vs. Bonds: Typically negatively correlated.
   o Cryptocurrency & Stocks: Often weak correlation but varies in crises.
   o Commodities vs. Equities: Gold often negatively correlates with equities.
Correlation in Financial Markets: Cheat Sheet
Correlation Type | r Value Range | Meaning | Example
Perfect Positive | r = 1.0 | Move in the same direction | Nasdaq & S&P 500
Strong Positive | r = 0.70 to 1.0 | Mostly move together | Oil & Energy Stocks
Moderate Positive | r = 0.40 to 0.70 | Some relationship | USD & U.S. Treasury Bonds
Weak Positive | r = 0.10 to 0.40 | Limited connection | Real Estate & Stocks
No Correlation | r = 0 | No consistent relationship | Bitcoin & Natural Gas
Weak Negative | r = -0.10 to -0.40 | Limited inverse relationship | Tech Stocks & Gold
Moderate Negative | r = -0.40 to -0.70 | Often move opposite | Stocks & Bonds
Strong Negative | r = -0.70 to -1.0 | Almost always inverse | USD & Emerging Markets
Perfect Negative | r = -1.0 | Always move in opposite directions | VIX (Volatility Index) & S&P 500
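As a rough illustration of the cheat sheet, the sketch below computes a Pearson correlation coefficient and maps it to the bands in the table. Treating |r| below 0.10 as "no meaningful correlation" and assigning boundary values such as 0.40 to the stronger band are assumptions, since the table leaves both open. The function names and return series are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(r):
    """Map r to the cheat-sheet bands (band edges are an assumption)."""
    size = abs(r)
    if size < 0.10:
        return "no meaningful correlation"
    strength = "weak" if size < 0.40 else "moderate" if size < 0.70 else "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


if __name__ == "__main__":
    # Hypothetical daily returns (in percent) for two assets.
    asset_a = [1.0, -0.5, 2.0, 0.3, -1.2]
    asset_b = [0.8, -0.4, 1.5, 0.1, -1.0]
    r = pearson_r(asset_a, asset_b)
    print(round(r, 3), describe(r))
```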
Correlation in Financial Markets: Interpretation of Correlation in Financial Markets
1. High Positive Correlation (r > 0.70):
   o Assets move together; not good for diversification.
   o Example: Tech stocks (Apple & Google).
2. Moderate Positive Correlation (r = 0.40 to 0.70):
   o Partial dependence; still some diversification benefits.
   o Example: Crude oil & energy sector stocks.
3. Near Zero Correlation (r ≈ 0):
   o No predictable relationship; good for diversification.
   o Example: Bitcoin & S&P 500 (historically, but fluctuates over time).
4. Moderate Negative Correlation (r = -0.40 to -0.70):
   o Helps hedge against losses.
   o Example: Stocks & Bonds in a normal market.
5. Strong Negative Correlation (r below -0.70):
   o Ideal for risk management and hedging strategies.
   o Example: VIX (Volatility Index) & Stock Market (VIX rises when stocks fall).
Correlation in Financial Markets: Comparison of Correlation vs. Other Financial Metrics
Metric | Measures | Range | Best For
Correlation (r) | Relationship between asset movements | -1 to +1 | Portfolio Diversification
Beta (β) | Sensitivity to the overall market | Any value | Risk & Volatility
Volatility | Price fluctuations over time | 0 to ∞ | Risk Management
Sharpe Ratio | Risk-adjusted returns | Any value | Portfolio Efficiency
Covariance | Direction of movement, not strength | Any value | Initial Relationship Analysis
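To make the comparison concrete, here is a minimal sketch of covariance, correlation, and beta computed from return series. It uses the standard definition of beta as the covariance of asset and market returns divided by the variance of market returns; the function names and return values are hypothetical.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, +1]."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


def beta(asset, market):
    """Sensitivity of asset returns to market returns: cov(asset, market) / var(market)."""
    return covariance(asset, market) / covariance(market, market)


if __name__ == "__main__":
    # Hypothetical returns: the asset moves exactly twice as much as the market.
    market = [1.0, -2.0, 3.0, 0.5]
    asset = [2 * m for m in market]
    print("covariance:", covariance(asset, market))    # depends on the units used
    print("correlation:", correlation(asset, market))  # unit-free
    print("beta:", beta(asset, market))
```

Covariance changes when returns are rescaled, while correlation does not, which is why the table lists covariance as showing direction but not strength.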
Correlation in Financial Markets: Key Takeaways
✅ Correlation helps investors balance portfolios by combining assets that behave differently.
✅ Negatively correlated assets reduce risk (e.g., bonds & stocks).
✅ High correlation limits diversification, increasing vulnerability to market downturns.
✅ Correlation changes over time, especially during financial crises.
✅ Understanding correlation is essential for risk management and asset allocation.
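The diversification point can be sketched with the standard two-asset portfolio variance formula, where the correlation r enters through the cross term. This formula is not given in the slides; the function name, weights, and volatilities below are hypothetical.

```python
from math import sqrt


def portfolio_volatility(weight_1, vol_1, vol_2, r):
    """Volatility of a two-asset portfolio; weight_2 is 1 - weight_1."""
    weight_2 = 1 - weight_1
    variance = (
        (weight_1 * vol_1) ** 2
        + (weight_2 * vol_2) ** 2
        + 2 * weight_1 * weight_2 * r * vol_1 * vol_2
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return sqrt(max(variance, 0.0))


if __name__ == "__main__":
    # Hypothetical 50/50 portfolio of two assets, each with 20% volatility.
    for r in (1.0, 0.5, 0.0, -0.5, -1.0):
        print(f"r = {r:+.1f} -> volatility = {portfolio_volatility(0.5, 0.20, 0.20, r):.4f}")
```

Portfolio volatility falls as r decreases, which is the mechanism behind the takeaway that high correlation limits diversification.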
Correlation vs. Causation
Key Facts
1. Correlation: Measures the statistical relationship between two variables (how they move together).
2. Causation: Indicates that one variable directly affects the other.
3. Key Difference: Correlation does not imply causation. Just because two variables move together does not mean one causes the other.
4. Examples in Finance:
   o Correlation: Stock prices and interest rates may move together but are influenced by external factors.
   o Causation: A central bank’s interest rate hike directly affects loan costs, causing businesses to borrow less.
5. Spurious Correlation: When two variables appear related but are actually both driven by a third factor.
   o Example: Ice cream sales and shark attacks increase together (both caused by hot weather).
Correlation vs. Causation: How to Distinguish Correlation from Causation
1. Observe the Data Relationship:
   o Strong correlation does not automatically imply one variable is driving the other.
2. Look for a Logical Explanation:
   o Does a clear mechanism explain why one variable influences the other?
3. Check for Confounding Variables:
   o Is there a third variable affecting both?
   o Example: A rise in the stock market & luxury car sales, both driven by economic growth, not direct causation.
4. Use Time-Series Data:
   o If changes in A consistently precede changes in B, causation is more plausible, though still not proven.
5. Conduct Controlled Experiments:
   o In non-financial fields (medicine, science), controlled experiments confirm causality.
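The time-series check in step 4 can be sketched as a lagged correlation: series A is compared with later values of series B. The names `lagged_correlation`, `series_a`, and `series_b` are hypothetical, and a strong lagged correlation is only suggestive; it does not establish causation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(a, b, lag):
    """Correlate a[t] with b[t + lag]; lag is assumed to be a non-negative integer."""
    if lag == 0:
        return pearson_r(a, b)
    return pearson_r(a[:-lag], b[lag:])


# Hypothetical series in which B simply repeats A one period later.
series_a = [1, 3, 2, 5, 4, 6, 3, 7]
series_b = [0] + series_a[:-1]

if __name__ == "__main__":
    for lag in (0, 1, 2):
        print(lag, round(lagged_correlation(series_a, series_b, lag), 3))
```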
Correlation vs. Causation: Cheat Sheet
Aspect | Correlation | Causation
Definition | Measures relationship between two variables | One variable directly causes the other to change
Direction of Influence | No direction (A → B or B → A or both) | One variable influences the other (A → B)
Proven Relationship? | No, just association | Yes, direct cause-effect
Example in Finance | Stock market & oil prices moving together | Interest rate hike causing lower loan demand
Example in Health | People who exercise more tend to weigh less | Eating excess calories causes weight gain
Can Be Spurious? | Yes | No
Proven By? | Statistical analysis (correlation coefficient) | Experiments, controlled studies, logical reasoning
Correlation vs. Causation: Comparison of Correlation vs. Causation vs. Coincidence
Feature | Correlation | Causation | Coincidence
Relationship Type | Statistical association | Direct cause-effect | Random occurrence
Example | Stock market & GDP growth move together | Interest rate cuts lead to more borrowing | Number of movies featuring cats & stock market returns both rise
Proof Needed? | Statistical correlation coefficient | Logical explanation, experiments | No pattern or link
Common Mistake? | Assuming one causes the other | Ignoring correlation | Believing unrelated events are connected
Linear Regression
Key Facts
1. Definition: Linear regression is a statistical method for modeling the relationship between a dependent variable (Y) and one or more independent variables (X).
2. Equation of Simple Linear Regression: Y = a + bX + e, where a is the intercept, b is the slope, and e is the error (residual) term.
3. Types of Linear Regression:
   o Simple Linear Regression: One independent variable.
   o Multiple Linear Regression: Multiple independent variables.
4. Assumptions:
   o Linearity: Relationship between X and Y is linear.
   o Independence: Data points are independent.
   o Homoscedasticity: Variance of residuals is constant.
   o No Multicollinearity: Independent variables in multiple regression are not highly correlated.
   o Normality of Residuals: Errors follow a normal distribution.
5. Applications:
   o Finance: Stock price prediction, risk modeling.
   o Economics: Demand forecasting, GDP estimation.
   o Marketing: Sales forecasting, customer behavior analysis.
   o Healthcare: Disease progression modeling.
Linear Regression: Cheat Sheet
Term | Description
Intercept | Value of Y when X = 0
Slope | How much Y changes per unit increase in X
R-Squared | Goodness of fit (share of the variance in Y that the model explains)
Adjusted R-Squared | R-Squared adjusted for the number of predictors in multiple regression
P-value | Tests statistical significance of predictors (p < 0.05 is the usual threshold)
Residuals | Differences between actual and predicted values
Multicollinearity | High correlation among independent variables (causes instability)
Overfitting | Model learns noise instead of real trends (a risk with too many predictors)
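The intercept, slope, residuals, and R-squared from the cheat sheet can be computed by ordinary least squares, as in the minimal sketch below. The function names and data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys, intercept, slope):
    """Actual minus predicted value for each observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in Y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals(xs, ys, intercept, slope))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical observations scattered around an upward-sloping line.
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    a, b = fit_line(xs, ys)
    print("intercept:", round(a, 3), "slope:", round(b, 3))
    print("R-squared:", round(r_squared(xs, ys, a, b), 4))
    print("residuals:", [round(e, 3) for e in residuals(xs, ys, a, b)])
```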
Linear Regression: Comparison of Linear Regression vs. Other Models
Feature | Linear Regression | Logistic Regression | Decision Tree | Neural Network
Output Type | Continuous (numeric) | Binary/Categorical | Discrete or continuous | Discrete or continuous
Relationship | Linear | Non-linear | Non-linear | Complex patterns
Interpretability | High | Moderate | Low | Very Low
Computational Cost | Low | Low | Medium | High
Handles Outliers Well? | No | No | Yes | Yes
Handles Multicollinearity Well? | No | Yes | Yes | Yes
Linear Regression: Interpretation of Linear Regression Outputs
1. Slope
   o If positive, Y increases as X increases.
   o If negative, Y decreases as X increases.
2. Intercept
   o The expected value of Y when X = 0.
   o Sometimes not meaningful (e.g., predicting salary when years of experience = 0).
3. Coefficient of Determination (R-Squared)
   o Measures how well the independent variable(s) explain the variance in Y.
   o R² = 1 → Perfect fit (rare in real-world data).
   o R² = 0 → The model explains none of the variance in Y.
   o Rule of thumb: R² > 0.70 → Strong fit. R² = 0.30 to 0.70 → Moderate fit. R² < 0.30 → Weak fit.
4. P-Value
   o If p < 0.05, the predictor is statistically significant.
   o If p ≥ 0.05, the predictor may not be meaningful.
5. Residuals & Homoscedasticity
   o Residuals should be randomly distributed.
   o A funnel shape suggests heteroscedasticity (violates assumptions).
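The rule of thumb for R-squared, together with the standard adjustment for the number of predictors, can be sketched as follows. The adjusted R-squared formula is the usual textbook one and does not appear in the slides; the function names and inputs are hypothetical.

```python
def fit_label(r2):
    """Apply the rule-of-thumb bands for R-squared."""
    if r2 > 0.70:
        return "strong fit"
    if r2 >= 0.30:
        return "moderate fit"
    return "weak fit"


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors (standard formula)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


if __name__ == "__main__":
    # Hypothetical model: R-squared of 0.80 from 11 observations and 1 predictor.
    r2 = 0.80
    print(fit_label(r2))
    print(round(adjusted_r_squared(r2, n_obs=11, n_predictors=1), 4))
```

Adding predictors never lowers plain R-squared, so the adjusted version is the fairer number when comparing models with different numbers of predictors.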
Linear Regression: Key Takeaways
✅ Linear regression is a powerful, interpretable model for predicting numerical values.
✅ Best used when variables have a linear relationship and assumptions hold.
✅ Multiple regression extends it to multiple predictors but requires checking for multicollinearity.
✅ Compared to non-linear models, it is computationally efficient but may not capture complex relationships.
Next: Chapter 3 - Introduction to Probability (Section 6 – Statistics Analysis)