Term_Project_-_Forecasting_and_Predictive_Analytics.pptx

RanaAsh1 4 views 77 slides Aug 24, 2024

About This Presentation

This project aims to study the relationship between weather conditions and aerial bombing operations during World War II.


Slide Content

Weather and WWII Air Mission Analysis: The relationship between weather conditions and aerial bombing missions in WWII, 1939-1945
Term Project - Forecasting and Predictive Analytics
Made by Rana Ashraf Mohamed
Supervised by Dr. Ghada Tolan and Eng. Ahmed Fouad

Table of contents
01 Project Brief
02 Data Overview
03 Data Cleaning
04 Data Visualization
05 Time Series Analysis of USA and BURMA war
06 Summary

01 Project Brief The Battle of Britain

Project Brief
This project aims to study the relationship between weather conditions and aerial bombing operations during World War II. By integrating two datasets (one detailing the daily weather conditions at various global stations, the other documenting aerial bombing missions), this analysis aims to uncover how meteorological factors influenced wartime air operations from 1939 to 1945.
Weather Dataset: This dataset provides detailed records of weather conditions during World War II. It includes data on precipitation, snowfall, temperatures, wind speeds, and the presence of thunderstorms or other adverse weather conditions.
Bombing Operations Dataset: This dataset records the specifics of aerial bombing missions conducted by the U.S., Royal Air Force, and other Allied air forces. It encompasses data on the date, location, and details of each mission.

02 Data Overview

Data Overview Data Loading and Inspection

Data Overview Data Information
Dataset: Summary of Weather
Feature | Description
MaxTemp | Maximum temperature in degrees Celsius
MinTemp | Minimum temperature in degrees Celsius
MeanTemp | Mean temperature in degrees Celsius
PoorWeather | Tracks severe weather events
YR | Year of observation
MO | Month of observation
DA | Day of observation
PRCP | Total precipitation in millimeters
DR | Duration of precipitation in hours
SPD | Wind speed in kilometers per hour
MAX | Maximum wind speed in kilometers per hour
MIN | Minimum wind speed in kilometers per hour
MEA | Mean wind speed in kilometers per hour
Snowfall | Amount of snowfall in millimeters
SND | Snow depth in millimeters
FT | Frost occurrence
FB | Fog occurrence
FTI | Freezing rain or ice occurrence
ITH | Indicator of thunderstorm occurrence
PGT | Peak gust time
TSHDSBRSGF | Code for types of poor weather conditions
SD3 | Snow depth at 3 p.m.
RHX | Relative humidity maximum percentage
RHN | Relative humidity minimum percentage
RVG | Visibility in meters
WTE | Encodes specific weather types not covered by other columns

Data Overview Data Information
Dataset: Location
Feature | Description
WBAN | Weather Bureau Army Navy (WBAN) identifier: a unique numeric code assigned to a weather station, used by various meteorological agencies
NAME | The name of the weather station
STATE/COUNTRY ID | An abbreviation or code for the state or country where the station is located
Lat | A string representing latitude in degrees and minutes with cardinal direction
LON | A string representing longitude in degrees and minutes with cardinal direction
Elevation | The height of the station's location above sea level
Latitude | A numeric value in decimal degrees representing latitude
Longitude | A numeric value in decimal degrees representing longitude

Data Overview Data Information
Dataset: Operations
Feature | Description
Mission ID | Unique identifier for the mission
Mission Date | Date of the mission
Theater of Operations | The area where the mission was conducted
Country | Country of the operation
Air Force | The Air Force branch involved in the mission
Unit ID | The unit within the Air Force carrying out the mission
Aircraft Series | The model of the aircraft used in the mission
Callsign | Radio callsign
Mission Type | Describes the nature of the mission (e.g., bombing, reconnaissance)
Takeoff Base | Base of departure
Takeoff Location | Location of departure
Takeoff Latitude | Latitude of takeoff location in decimal
Takeoff Longitude | Longitude of takeoff location in decimal
Target ID | Unique numeric code assigned to the mission target
Target Country | Country of target
Target City | City of target
Target Type | The specific type of the target (e.g., military base, industrial site)
Target Industry | The industry related to the target (e.g., manufacturing)
Target Priority | The strategic priority of the target
Target Latitude | Latitude of target location in decimal
Target Longitude | Longitude of target location in decimal
Altitude | Flight altitude in hundreds of feet
Airborne Aircraft | Number of aircraft airborne
Attacking Aircraft | Number of aircraft attacking
Bombing Aircraft | The number of aircraft that dropped bombs on the target
Aircraft Returned | Number of aircraft returned
Aircraft Failed | Number of aircraft failed
Aircraft Damaged | Number of aircraft damaged
Aircraft Lost | Number of aircraft lost
High Explosives | The total weight of high explosives used in the mission
High Explosives Type | Type of high explosives
High Explosives Weight | The weight of high explosives in pounds
Incendiary Devices | The total quantity of incendiary devices
Fragmentation Devices Type | Type of fragmentation devices
Fragmentation Devices Weight | The weight of fragmentation devices in pounds
Total Weight | The total weight of all ordnance (high explosives, incendiary, and fragmentation devices)
Time Over Target | The duration the mission was conducted over the target area
Bomb Damage Assessment | Evaluation of the effectiveness of the bombing

May 7, 1945 03 Data Cleaning

Data Cleaning Null Values

Data Cleaning Null Values Handling We start by removing every column in which more than 50% of the entries are null.
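The 50% rule above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the helper name `drop_sparse_columns` and the toy values are assumptions made up for the example.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Return a copy of df without columns whose null fraction exceeds threshold."""
    null_fraction = df.isna().mean()  # fraction of null entries per column
    keep = null_fraction[null_fraction <= threshold].index
    return df[keep].copy()


# Toy frame standing in for the operations dataset (values are made up).
toy = pd.DataFrame({
    "Mission ID": [1, 2, 3, 4],
    "Callsign": [None, None, None, "A1"],        # 75% null, so it is dropped
    "Altitude": [120.0, np.nan, 150.0, 130.0],   # 25% null, so it is kept
})
cleaned = drop_sparse_columns(toy)
print(list(cleaned.columns))
```

Computing the null fraction once with `isna().mean()` keeps the threshold explicit and easy to change.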

Data Cleaning Null Values Handling - Less than 20%
In handling missing data, an effective strategy is to treat columns differently depending on the percentage of null values they contain. For columns where fewer than 20% of the values are missing, the approach depends on the data type of the column.
Numerical columns: Imputed using the mean of the existing values. This keeps the column mean unchanged, which limits distortion of the feature when only a small share of values is missing.
Categorical columns: Imputed using the mode, the most frequent value observed in the column. Missing values are replaced with a category that already exists in the data, so no new category is introduced.
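A possible pandas version of this mean/mode imputation is sketched below. The function name `impute_low_null_columns`, the 20% default, and the toy data are illustrative assumptions; the slides do not show the original implementation.

```python
import numpy as np
import pandas as pd


def impute_low_null_columns(df: pd.DataFrame, max_null_fraction: float = 0.2) -> pd.DataFrame:
    """Fill columns with fewer than max_null_fraction nulls: mean for numeric, mode otherwise."""
    out = df.copy()
    null_fraction = out.isna().mean()
    for col in out.columns:
        frac = null_fraction[col]
        if frac == 0 or frac >= max_null_fraction:
            continue  # nothing to fill, or too sparse for this rule
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Toy data (made up): one null in six rows is about 16.7%, below the 20% cutoff.
toy = pd.DataFrame({
    "Total Weight": [10.0, 20.0, np.nan, 30.0, 40.0, 50.0],
    "Mission Type": ["bombing", "bombing", None, "recon", "bombing", "recon"],
    "Callsign": [None, None, None, "A", "B", "C"],  # 50% null, left untouched
})
imputed = impute_low_null_columns(toy)
print(imputed)
```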

Data Cleaning Null Values Handling - Greater than 20%
Dropping Irrelevant Columns: I dropped columns that contribute little to the analysis, such as “Target ID” and “Target Priority”.
Dropping Overlapping Columns: I removed “Target Industry” because it overlaps with “Target Type”.
“Country” Column: I observed that the letter abbreviations in the 'Country' column correlate with those in the 'Air Force' column. However, some rows still have null values in both the 'Country' column and the 'Air Force' column.

Data Cleaning Null Values Handling - Greater than 20% “Country” Column

Data Cleaning Null Values Handling - Greater than 20% “Country” Column : I filled missing values in the 'Air Force' column based on the 'Country' column and vice versa.
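One way to fill each column from the other is to learn the pairing from rows where both values are present. The sketch below assumes that approach; the helper name `cross_fill` and the toy values are placeholders, not taken from the dataset or the original code.

```python
import pandas as pd


def cross_fill(df: pd.DataFrame, a: str = "Country", b: str = "Air Force") -> pd.DataFrame:
    """Fill nulls in column a from column b and vice versa, using pairs seen in complete rows."""
    out = df.copy()
    pairs = out.dropna(subset=[a, b])
    # Most frequent partner value for each key, learned from complete rows only.
    a_to_b = pairs.groupby(a)[b].agg(lambda s: s.mode().iloc[0])
    b_to_a = pairs.groupby(b)[a].agg(lambda s: s.mode().iloc[0])
    out[b] = out[b].fillna(out[a].map(a_to_b))
    out[a] = out[a].fillna(out[b].map(b_to_a))
    return out


# Toy rows (made up). The last row has both values missing and stays null.
toy = pd.DataFrame({
    "Country": ["USA", "USA", None, "GREAT BRITAIN", None],
    "Air Force": ["5 AF", None, "RAF", "RAF", None],
})
filled = cross_fill(toy)
print(filled)
```

Rows where both columns are null cannot be recovered this way and need a separate decision.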

Data Cleaning Null Values Handling - Greater than 20% “Altitude” and “Attacking Aircraft” Columns: I couldn't find any correlation with other columns, so I filled the missing values with the column mean.

Data Cleaning Removing Redundant Columns

Data Cleaning Merging Dataframes

July 20, 1944 04 Data Visualization

Data Visualization

05 Time Series Analysis of USA and BURMA war

Time Series Analysis of USA and BURMA war What is a Time Series?
A time series is a sequence of data points collected at regular time intervals. It is inherently time-dependent and is used to analyze trends and patterns over time. Many time series exhibit seasonal trends, where certain patterns repeat at specific intervals; others show no seasonality.
To determine if a time series is stationary, three basic criteria are considered:
Constant Mean: The average value of the time series remains consistent over time.
Constant Variance: The variance (spread or dispersion) of the data points should not change significantly across different time periods.
Autocovariance: The covariance between the time series and its lagged versions should not depend on time.

Time Series Analysis of USA and BURMA war I decided to analyze the time series data related to the USA and Burma conflict because it spans the longest period in the dataset. During this war, the USA conducted bombing missions on Kathalu city in Burma from 1942 to 1945.

Time Series Analysis of USA and BURMA war As observed in the previous plots, our time series exhibits seasonal variation: the mean temperature tends to be higher in summer and lower in winter, and this pattern repeats each year. Now, let's assess the stationarity of the time series using the following methods:
Plotting Rolling Statistics: We apply a rolling window to compute the rolling mean and variance. This lets us visually inspect whether statistical properties such as the mean and variance remain constant over time, which would indicate stationarity.
Augmented Dickey-Fuller Test: This test produces a Test Statistic and Critical Values at different significance levels. If the Test Statistic is less than the Critical Value, we reject the null hypothesis of a unit root and conclude that the time series is stationary.
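The rolling-statistics check can be sketched with pandas as shown here. The series is synthetic (a sine-shaped seasonal cycle plus noise) and the 30-day window is an assumed value; neither comes from the project's data.

```python
import numpy as np
import pandas as pd

# Synthetic daily "mean temperature" with a yearly cycle (made-up numbers).
rng = np.random.default_rng(0)
dates = pd.date_range("1943-01-01", periods=365, freq="D")
seasonal = 25 + 5 * np.sin(2 * np.pi * np.arange(365) / 365)
mean_temp = pd.Series(seasonal + rng.normal(0, 1, 365), index=dates, name="MeanTemp")

window = 30  # assumed window length in days
rolling_mean = mean_temp.rolling(window=window).mean()
rolling_var = mean_temp.rolling(window=window).var()

# A rolling mean that wanders over a wide range is a visual hint of non-stationarity.
drift = rolling_mean.max() - rolling_mean.min()
print(f"rolling mean range: {drift:.2f}")
```

Plotting `mean_temp`, `rolling_mean`, and `rolling_var` on one chart gives the kind of visual inspection described above; the plotting call is omitted to keep the sketch free of extra dependencies.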

Time Series Analysis of USA and BURMA war Plotting Rolling Statistics

Time Series Analysis of USA and BURMA war Augmented Dickey-Fuller Test The test statistic is -1.4, while the critical values at the 1%, 5%, and 10% levels are -3.439, -2.865, and -2.569 respectively. Since the test statistic is greater (less negative) than all of these critical values, we cannot reject the null hypothesis of a unit root, so we treat the time series as non-stationary.
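The decision rule can be written out explicitly using the numbers reported on this slide. The function name `adf_decision` is a hypothetical helper; in practice the statistic and critical values would typically come from a library routine such as `adfuller` in statsmodels, which is an assumption about tooling rather than something the slides state.

```python
def adf_decision(test_statistic: float, critical_values: dict) -> dict:
    """For each significance level, report whether the unit-root null is rejected."""
    return {level: test_statistic < cv for level, cv in critical_values.items()}


# Values reported on the slide.
critical_values = {"1%": -3.439, "5%": -2.865, "10%": -2.569}
decision = adf_decision(-1.4, critical_values)

# The series is treated as stationary only if the null is rejected at some level.
is_stationary = any(decision.values())
print(decision, is_stationary)
```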

Time Series Analysis of USA and BURMA war As previously discussed, non-stationarity in a time series can arise from two main factors:
Trend: Variations in the mean over time. Achieving a constant mean is essential for stationarity.
Seasonality: Periodic variations at specific times. Consistent variation is necessary for stationarity.
To address the trend and achieve a constant mean:
Differencing method: One of the most common methods. The idea is to take the difference between the time series and a shifted copy of itself.
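First-order differencing can be illustrated on a tiny made-up series with a steady upward trend. The values below are invented for the example; subtracting the series shifted by one step removes the trend and leaves a constant.

```python
import pandas as pd

# Made-up values with a linear trend of +1.5 per step.
series = pd.Series([20.0, 21.5, 23.0, 24.5, 26.0], name="MeanTemp")

# Difference between the series and the series shifted by one step.
diff_shift = (series - series.shift(1)).dropna()

# pandas offers the same operation directly.
diff_builtin = series.diff().dropna()

print(diff_shift.tolist())
```

The first value has no predecessor, so the differenced series is one element shorter than the original.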

Time Series Analysis of USA and BURMA war Differencing

Time Series Analysis of USA and BURMA war Differencing From the previous plot we can observe that the mean and variance of the differenced series appear constant. The test statistic is less than the critical value at the 1% significance level, so we reject the unit-root null hypothesis at that level and treat the differenced series as stationary.

Time Series Analysis of USA and BURMA war Forecasting a Time Series Our chosen forecasting method is ARIMA (AutoRegressive Integrated Moving Average).
AR (Auto-Regressive): Uses lagged values of the dependent variable (p). For instance, if p=3, we use x(t-1), x(t-2), and x(t-3) to predict x(t).
I (Integrated): Represents the number of non-seasonal differences (d). A first-order difference corresponds to d=1. The model order used in this project is (1,0,1), i.e. d=0, which is consistent only if the model is fit to the already-differenced series.
MA (Moving Average): Uses lagged forecast errors in the prediction equation (q).

Time Series Analysis of USA and BURMA war Forecasting a Time Series The parameters (p, d, q) define the ARIMA model. To determine appropriate values for p and q, we rely on two key plots:
Autocorrelation Function (ACF): Measures the correlation between a time series and its lagged versions.
Partial Autocorrelation Function (PACF): Measures the correlation between a time series and its lagged versions after removing the correlation already explained by earlier lags.

Time Series Analysis of USA and BURMA war Forecasting a Time Series

Time Series Analysis of USA and BURMA war Forecasting a Time Series We will use (1,0,1) as the (p, d, q) parameters for our ARIMA model and proceed with the prediction:
Choosing p: The lag value where the PACF chart first intersects the upper confidence interval. In our case, p=1.
Choosing q: The lag value where the ACF chart first intersects the upper confidence interval. In our case, q=1.

06 Summary

Summary During our study, we focused on analyzing time series data related to military operations and weather conditions. We ensured the data was clean and organized for thorough analysis. Our initial exploration involved creating visualizations to understand attack frequencies, geographic patterns, and seasonal changes in mean temperatures. We then applied specific techniques to address time series characteristics, such as checking for stationarity and using differencing methods. Employing ARIMA modeling, we manually selected parameters to forecast mean temperatures, which allowed us to predict future trends based on historical data. This approach not only revealed historical insights but also provided valuable tools for making informed decisions across different fields, showcasing the utility of time series analysis in predictive analytics.

Thanks! Do you have any questions? The Battle of Leyte Gulf