Introduction to Regression for SECOND year MBA.pptx
nishantsapra3
13 views
30 slides
Sep 15, 2024
Slide 1 of 30
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
About This Presentation
Intro to regression
Size: 725.8 KB
Language: en
Added: Sep 15, 2024
Slides: 30 pages
Slide Content
Energy Modelling Using Simple-E Joint SPC-APEC Regional Workshop on Energy Statistics and Modeling for the SDG7 and the COP21 INDC Energy Targets 14-18 March 2016 Nuku’alofa, Kingdom of Tonga MICHAEL SINOCRUZ Asia Pacific Energy Research Centre
What do we want to know? 2 Energy Demand How much will be the energy demand in 20xx or outlook period? Which is the most energy consuming sector? How much oil, or gas or coal will be needed to supply the energy requirements in 20xx outlook period? How much will be the electricity demand for the next number of years? Power generation How many new power plants are needed or what type of power plant will be needed to meet the electricity demand? Is the indigenous supply enough or how much will be imported?
Regression Analysis 3 Regression analysis is concerned with the study of the relationship between one variable called the explained (or dependent) variable and one or more other variables called explanatory (or independent) variables . Reference: Damodar Gujarati, Essentials of Econometrics (second edition), McGraw-Hill Regression Analysis - enables us to describe a straight line that best fits a series of ordered pairs (x, y). The equation for a straight line, known as a linear equation w/c is expressed as: y = a + bx Where y = the dependent variable x = the independent variable a = the y-intercept b = the slope of the regression line
Regression Analysis 4 Regression Line or the Least Square Regression Line = is a mathematical procedure to identify the linear equation that best fits a set of ordered pairs by finding the values of a, the y-intercept and b, the slope. The goal of the least square method is to minimize the total squared error between the values of y (actual value) and ŷ (predicted value). a or the y- intercept = is the point where the line crosses the y-axis b or the slope of the straight line = is the ratio of the rise of the line over the run of the line. If b = 0, it means there is no relationship between the independent and dependent variables
Regression Analysis - Determination Coefficient 5 Y=a* X+b X Y=a*X+b X Determination coefficient = 0.6-0.7 Not fitting Determination coefficient = 0.8-0.9 Fitting Y Y Determination coefficient = (Correlation coefficient) 2
Regression Analysis - Ordinary Least Squares Method 6 Y X (x, y) Error Involves the use of statistical procedures to estimate mathematically the average relationships between the dependent and independent variables The Ordinary Least Squares (OLS) Method minimizes the difference between the observed and the estimated value. Requires that the sum of the square of the errors for the best fit function be at a minimum Y = a + bX Source: Prof. Wali Del Mundo
Regression Analysis - Ordinary Least Squares Method 7
Regression Analysis 8 Age (x) Repair Bill (y) xy x 2 y 2 (vehicle) (Peso) 5.5 1,200 6,600 30.25 1,440,000 10.1 900 9,090 102.01 810,000 3.2 450 1,440 10.24 202,500 4.5 750 3,375 20.25 562,500 2.5 200 500 6.25 40,000 _______ _________ _______ ________ _________ ∑ 25.8 3,500 21,005 169 3,055.000 y = a + bx
Regression Analysis 10 Another way of measuring the strength of a relationship is with the Coefficient of Determination, r 2 . This represents the percentage of the variation n∑xy – (∑x) (∑y) r = ------------------------------------------------------- √ [n ∑x 2 – (∑x) 2 ] [n(∑y 2 ) – (∑y) 2 ] r = 0.634 r 2 = (0.634) 2 = 0.402(100) = 40.2 percent
Regression Analysis 11 In other words, 40.2 percent of the variation in the repair bill is explained by the age of the vehicle. If r 2 = 1, all the variation in y is explained by the variable x. If r 2 = 0, none of the variation in y is explained by the variable x. NOTE : Just because a relationship between two variables is statistically significant doesn’t necessarily mean that a causal relationship truly exists. The mathematical relationship could be due to pure coincidence. Always use your best judgment when making these decisions.
END USE DEMAND 101 (A) Indicators should be calculated at the most disaggregated end-use level possible in order to represent each activity level. Specific energy activity Disaggregation International Standard Industrial Classification (ISIC) End use Energy Industry Sector Industry sub-sector
END USE DEMAND 101 (B) Transport Sector Transport mode Passenger Freight Road Rail Air Water End use Energy Vehicle type (light vehicles i.e. cars, SUVs), by fuel Residential/Commercial Sectors End use by activity (space heating and cooling, appliance use, lighting) End use by equipment type or energy source
Regression Analysis using Simple-E 14 SEE (Simple Econometric Simulation System) is an Add-In application for Microsoft Excel. It exploits all the advantages of the native spreadsheet functions as well as the open interfaces with other Windows applications There are three processes involved in Simple-E, from data input (worksheet) to simulation (worksheet) namely; 1) Model Check; 2) Model Solve, and 3) Simulation. The following diagram shows the basic concept and the relationship between these processes within the three worksheets:
Regression Analysis using Simple-E 15
Introduction to SEE – Main Menu 16 Button to start the Main Menu Button to start the simulation of the whole model. Button to create See working sheets in a new file . Button to create See working sheets in the current file . See working sheets Source: Sichao Kan
17 Introduction to SEE – Data Sheet Define the Code Name for all the variables and input data. Free area Usually we put the comments of the Code Name here. For example, their meanings , units, and the sources where you get the data, etc … Put the Code Name of the dependent variable y here. Pay attention to that the code name should be exactly the same as what you have input in the “data” sheet Input time series here. It ends with the year (or month, day, etc…) till which you want to forecast. Input the data for the variables here. Source: Sichao Kan
18 Free area Usually the comments of the Code Names are put here. For example, their meanings , units, and the sources of the data, etc… “Option Type” includes ①the form of relationship between Y and X1, X2,… (equal, linear (OLS), Double-log, Semi-log, etc…), and ②how you want Y to change with time (Linear trend, Growth trend, etc…) Input the Code Names of the dependent variables here. Pay attention so that the code names should be exactly the same as what have input in the “data” sheet. Introduction to SEE – Data Sheet Build your model on the left half of the “model” sheet. Put the Code Name of the dependent variable y here. Pay attention to that the code names should be exactly the same as what have been input in the “data” sheet
19 lag1.Xj : the value of Xj of one year before; Dummy values : To neglect abnormal value of designated years . In SEE, the dummy value of a certain year is denoted by “ dum.year ” For example: dum.1997 Introduction to SEE – Typical Function Forms Source: Sichao Kan
20 Growth Rate and Elasticity Source: Sichao Kan For double-log function ln(Y) = a + b*ln (X), the slope coefficient ‘b’ is equal to the average elasticity coefficient. This is because Growth Rate Elasticity
21 Source: Sichao Kan Once click the “All through” button in the Main Menu and if there are no bugs in your model, the simulation results (the model outputs) will be displayed in the “simulation” sheet automatically. Equation in original form Introduction to SEE – Simulation Sheet
22 Source: Sichao Kan Introduction to SEE – How good is your estimation? Basic check Symbol (+ or -) of independent variables Check by parameters R-squared T-value DW-value etc … Others Elasticity Trend etc…
23 Source: Sichao Kan Introduction to SEE – Model Sheet Check the fitness of your model on the right half of your model. ① R-squared ③ Durbin-Watson testing value ② T-value Model equation Parameters for testing the fitness of the model Notice: After building the model, go to the “main menu” and click the “All through” button. The equation of the model and the parameters for testing the fitness of the model will be displayed on the right half of the “model”.
24 Parameters for testing the fitness of your model (estimation) ( 1) R R-Squared, 0 Explained variance / Total variance 1, (The larger the better) ( 2) AR Adjusted R-Squared, AR 1, (The larger the better) (3) SD SD = ( e 2 /(n-k))1/2 , e = Residual, n = Sample size, k = No. of independent variables (4) t-value t 2 : Significant 2 t 1 : Admissible to use t 1 : Insignificant (5) DW Durbin Watson Statistics, 1 < DW < 3 DW = 2 : No serial correlation DW : Positive correlation DW 4 : Negative correlation (6) Dh Durbin h Statistics with lag, Dh 2 (7) Rho Coefficient of serial correlation, Rho 1 (8) DF Degree of Freedom, DF > 1 (The lager the better) (9) F F-Statistics, F > 0 (The larger the better) (10) RSS Residual Sum of Square, RSS > 0 (The smaller the better) (11) YX Correlation Coefficient between Y and X’s, YX 1 (12) XX Correlation Coefficient between X’s, XX 0.95
25 Introduction to SEE – How to start model building with SEE? Before start SEE you need to Formulate the question of interest Specify variables Collect Data Then with SEE Input the data in the “ data sheet ” Build your model in the “ model sheet ” Test the fitness of your estimation by checking the parameters on the far right side of the “ model sheet ” The prediction results are given in the “ simulation sheet ” After… You can do any analysis you like with the output data
26 Introduction to SEE – Functions Option Type (Useful) “$LS” or Blank cell -- Simple E. executes regression based on Ordinary Least Square (Regression Analysis). XX $LS YY X1 Type Y YY = a * XX + b
27 Introduction to SEE – Functions “=“ or “$EQ” -- Direct Equation: The variable in “Y” is defined directly by the formula in “X.” “ XX = YY X1 Type Y YY = XX
28 Introduction to SEE – Functions “$DL” – Double Log: Simple E. executes regression after transforming the variables of both sides to log format. XX $DL YY X1 Type Y Log(YY) = a * log(XX) + b
29 Introduction to SEE – Functions $CA— Constant Adjustment: Simple E adjust between regression equation and the latest actual value. Y X Y=a*X+b Y=a*X+b+c
30 Introduction to SEE – Functions $TL— Linear Trend, estimated by serial number $TG— Growth Trend, estimated by average actual growth rate Y X Y=a*X+b 1 2 3 4 5 6 7 8 9 10 11 12 10.0% 10.0% $TG $TL