WEATHER PATTERN ANALYSIS AND PREDICTION NAME: SUBMISSION DATE:
AIM:
IMPLEMENT THE K-NN ALGORITHM
CONDUCT AN EXPLORATORY DATA ANALYSIS
IMPLEMENT K-MEANS CLUSTERING
NUMBER OF RECORDS IN THE DATASET: 730
TOTAL NUMBER OF FEATURES IN THE DATASET: 7
KEY COLUMNS INCLUDED: WEATHER CONDITION, DEW POINT, HUMIDITY, PRESSURE, TEMPERATURE, VISIBILITY, WIND DIRECTION
TOOL USED FOR ANALYSIS: MICROSOFT EXCEL
EXPLORATORY DATA ANALYSIS (EDA):
MAXIMUM AND MINIMUM VALUES OF THE FEATURES
AVERAGE AND MEDIAN VALUES
VARYING WEATHER TRENDS ACROSS MONTHS
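The three EDA views above (maximum and minimum, average and median, monthly trends) can be sketched in a few lines of Python. This is only an illustration: the project itself used Microsoft Excel, and the `records` values and the `summarise` helper below are hypothetical, not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (month number, temperature in °C).
records = [
    (1, 18.0),
    (1, 20.0),
    (6, 34.0),
    (6, 36.0),
    (6, 35.0),
]

def summarise(values):
    """Return the minimum, maximum, mean and median of a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

temperatures = [temp for _, temp in records]
temperature_summary = summarise(temperatures)

# Group temperatures by month to expose the trend across months.
by_month = defaultdict(list)
for month, temp in records:
    by_month[month].append(temp)
monthly_mean = {month: mean(temps) for month, temps in sorted(by_month.items())}

print(temperature_summary)
print(monthly_mean)
```

The same `summarise` call could be repeated for each numeric column (dew point, humidity, pressure, visibility).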
METHODOLOGY
K-NN CLASSIFICATION FOR RAIN PRESENCE: MOST PREDICTIONS ARE CLASSIFIED AS 0 (NO RAIN). THE MODEL DOES NOT PREDICT 1 (RAIN) DUE TO CLASS IMBALANCE IN THE DATASET.
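A minimal sketch of why K-NN rarely predicts the minority class: the prediction is a majority vote among the k nearest neighbours, so a lone rain sample is outvoted as soon as k is greater than 1. The sample points, the two features chosen and the values of k are assumptions for illustration; the project ran K-NN in Orange, and its exact settings are not stated.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (humidity %, temperature °C) samples; 0 = no rain, 1 = rain.
train_points = [(40, 34), (42, 33), (45, 35), (50, 31), (55, 30), (90, 24)]
train_labels = [0, 0, 0, 0, 0, 1]

# A humid, cool query that sits right next to the single rain sample.
query = (88, 25)
pred_k1 = knn_predict(train_points, train_labels, query, k=1)
pred_k3 = knn_predict(train_points, train_labels, query, k=3)

print(pred_k1)  # 1: the single nearest neighbour is the rain sample
print(pred_k3)  # 0: the rain sample is outvoted by two no-rain neighbours
```

In practice the features would usually be scaled before computing distances; that step is left out here to keep the sketch short.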
K-MEANS CLUSTERING: EACH DATA POINT IS COLOUR CODED BASED ON ITS CLUSTER. TEMPERATURE, HUMIDITY AND PRESSURE INFLUENCE CLUSTER FORMATION.
RESULTS
K-NN CLASSIFICATION RESULTS:
PERFORMANCE: The model achieved 98% accuracy but was unable to predict "Rain". There were only 2 instances of rain in the dataset, so the high accuracy mainly reflects the dominance of the "No Rain" class. The F1-score for "Rain" is zero due to this imbalance.
KEY INSIGHTS: The classifier is highly effective for "No Rain" predictions but fails to generalise to "Rain" because there is too little data for rainy conditions.
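The gap between accuracy and F1-score can be reproduced with a short calculation. The label lists below are an illustrative assumption (100 test samples, 2 of them rain, every prediction "No Rain"), chosen to echo the reported figures; the actual test split used in the project is not stated. `rain_metrics` is a hypothetical helper.

```python
def rain_metrics(y_true, y_pred):
    """Return accuracy plus precision, recall and F1 for the rain class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when the class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Assumed test set: 98 no-rain samples, 2 rain samples, all predicted as no rain.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100

accuracy, precision, recall, f1 = rain_metrics(y_true, y_pred)
print(accuracy)  # 0.98
print(f1)        # 0.0
```

A classifier that never predicts rain scores 98% accuracy here while missing every rainy sample, which is why per-class F1 is the more informative number for an imbalanced dataset.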
K-MEANS CLUSTERING RESULTS:
CLUSTER FORMATION: The data was grouped into three clusters based on Temperature (°C), Humidity (%), and Pressure (hPa). It effectively segmented the weather data into meaningful groups.
The clusters represent distinct weather patterns:
One cluster may represent high humidity and low temperature.
Another cluster may represent moderate weather conditions.
The third cluster may represent dry conditions with low humidity and higher temperatures.
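The clustering step can be sketched as a plain K-means loop (alternating assignment and centroid update) over Temperature, Humidity and Pressure. The six sample points and the choice of initial centroids are hypothetical; the project's clustering was done in Orange, whose initialisation and preprocessing are not described in the slides.

```python
from math import dist

def kmeans(points, initial_centroids, iterations=10):
    """Run a fixed number of K-means iterations; return (labels, centroids)."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical (temperature °C, humidity %, pressure hPa) samples.
points = [
    (18, 90, 1015), (19, 88, 1014),  # cool and humid
    (26, 60, 1010), (27, 62, 1011),  # moderate
    (36, 30, 1005), (37, 28, 1004),  # hot and dry
]

# Assumed initialisation: one starting centroid taken from each group.
labels, centroids = kmeans(points, [points[0], points[2], points[4]])
print(labels)     # [0, 0, 1, 1, 2, 2]
print(centroids)
```

The features are used in raw units here, so humidity dominates the distance; standardising each column first is a common way to give temperature and pressure comparable weight.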
INSIGHTS AND LEARNINGS:
The region experiences predominantly warm temperatures and dry conditions, with occasional high humidity and rainfall events.
Visibility issues are common, possibly due to factors like smoke, haze, or heavy rain.
Rain is relatively infrequent, but when it occurs, it significantly impacts humidity and visibility.
Temperature and Dew Point variations suggest seasonal patterns. Lower temperatures and dew points might correspond to cooler months, while higher values may indicate summer months.
Rain Presence, Humidity, and Visibility likely show strong seasonal trends, with rainier periods correlating with higher humidity and lower visibility.
K-NN Classification and K-means Clustering, along with Orange, helped in analysis and problem solving.
CHALLENGES FACED IN COMPUTATION:
COPYING THE DATA AND CREATING CHARTS AND GRAPHS WAS CHALLENGING.
APPLYING CLASSIFICATIONS IN ORANGE WAS A DIFFICULT TASK.
FINDING ANSWERS BY APPLYING FORMULAS IN EXCEL REQUIRED THE MOST TIME.
RECOMMENDATIONS FOR FUTURE PROJECTS:
THE EXPERIENCE GAINED FROM THIS PROJECT WILL BE VERY USEFUL. HOWEVER, THE PROJECT WAS A BIT COMPLEX AND LENGTHY TO COMPLETE. OVERALL, THIS WAS A VERY GOOD LEARNING EXPERIENCE.
CONCLUSION
MAIN FINDINGS OF THE PROJECT:
Distinct weather clusters.
Weather conditions influencing visibility.
Dynamics of temperature and humidity.
USAGE OF AI AND DS IN SIMILAR PROBLEMS:
Predictive analysis.
Clustering and pattern discovery.
Decision support systems.
BROADER IMPLICATIONS OF THE RESULTS:
1. Climate Monitoring And Environmental Health
2. Agricultural Applications
3. Urban Planning And Infrastructure
4. Disaster Preparedness And Management
5. Public Awareness And Policy Making
REFERENCES USED FOR THE PROJECT
TOOLS AND SOFTWARE USED: GOOGLE CHROME, MICROSOFT EXCEL, MICROSOFT POWERPOINT, ORANGE, CHATGPT
WEBSITES REFERRED: IIT-M DATA SCIENCE AND AI COURSE VIDEOS, GeeksforGeeks WEBSITE