Titanic Passenger Dataset, IIT PROJECT, DATA SCIENCE AND AI PROJECT
aravindhraj470
8 views
15 slides
Oct 24, 2025
About This Presentation
Best IIT DATA SCIENCE PROJECT
Size: 314.09 KB
Language: en
Slide Content
Titanic Passenger Data And Survival Prediction NAME: SUBMISSION DATE:
🛳️ Titanic Survival Analysis Project
1. Project Overview
The objective of this project is to analyze a simplified Titanic passenger dataset to understand survival patterns. Using Exploratory Data Analysis (EDA), the K-Nearest Neighbors (KNN) classification algorithm, and K-Means clustering, we aim to identify the key factors that influenced survival and to group passengers by their characteristics.
Total Passenger Attendance
The approximate breakdown is:
Total Passengers: ≈ 1,317
    First Class: ≈ 324
    Second Class: ≈ 284
    Third Class: ≈ 709
Total Crew: ≈ 885 to 913
Total People On Board: ≈ 2,200 to 2,224
(Note: Exact figures can vary slightly between historical sources due to different ways of classifying staff, musicians, and last-minute cancellations.)
Survival Data
Overall survival rate: 38%
Female survival rate: 74%
Male survival rate: 19%
Passengers in 1st class had the highest survival rate (62%).
Children (under 15 years) had a higher chance of survival (about 55%) than adults.
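As a rough illustration of how per-group survival rates like these can be computed, here is a minimal Python sketch. The records are hypothetical toy data, not the project's dataset, and the helper name `survival_rate_by` is an assumption made for this example.

```python
from collections import defaultdict

# Hypothetical toy records of (sex, survived); NOT the project's dataset.
passengers = [
    ("female", 1), ("female", 1), ("female", 0), ("female", 1),
    ("male", 0), ("male", 0), ("male", 1), ("male", 0),
]

def survival_rate_by(records):
    """Return {group: fraction survived} for (group, survived) pairs."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for group, survived in records:
        totals[group] += 1
        survivors[group] += survived
    return {group: survivors[group] / totals[group] for group in totals}

# Rate within each group, plus the rate over all records.
rates = survival_rate_by(passengers)
overall = sum(survived for _, survived in passengers) / len(passengers)
```

Each rate is computed within its own group, so rates for different groups are independent of one another and need not add up to 100%.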
Pictorial Representation of Total Survival Rate
Survival rate: 38%
Death rate: 62%
Pictorial Representation of Survival Rate By Gender
Male survival rate: 19%
Female survival rate: 74%
Each rate is the share of survivors within that gender, so the two figures are not expected to sum to 100%.
KNN Classification - Passenger Survival Data
K-Means Clustering - Passenger Survival Data
KNN Algorithm
What it shows: This plot demonstrates how the K-Nearest Neighbors (KNN) algorithm classifies passengers as either survived (blue) or not survived (red) based on their features (such as age, sex, class, and fare).
Explanation: Each point represents a passenger. The blue and red points show the two survival outcomes:
Blue (1): Passenger survived
Red (0): Passenger did not survive
The colored background shows the decision regions produced by the KNN algorithm: one region predicts survival and the other predicts non-survival, and the decision boundary is where the two regions meet.
When a new passenger is introduced, the model looks at the 5 nearest neighbors (k=5). If most of them survived, the new passenger is predicted to survive; otherwise, the prediction is non-survival.
✅ Insight: KNN shows how passengers with similar characteristics tended to share the same survival outcome. It uses these past patterns to make predictions for new or unseen passengers.
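The majority-vote rule described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the workflow used to produce the plot (the references list Orange as the tool). The two features (age and fare, scaled to 0 to 1), the data points, and the function name `knn_predict` are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    """Predict a label by majority vote among the k nearest training points."""
    # Euclidean distance from the query to every labeled training point.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical, already-scaled features (age, fare); NOT the project's data.
train_X = [
    (0.10, 0.90), (0.20, 0.80), (0.30, 0.85), (0.25, 0.70),  # survived (1)
    (0.70, 0.10), (0.80, 0.20), (0.60, 0.15), (0.75, 0.05),  # did not survive (0)
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

new_passenger = (0.30, 0.75)
prediction = knn_predict(train_X, train_y, new_passenger, k=5)
```

With k=5, four of the five nearest neighbors of `new_passenger` are survivors, so the vote predicts survival. Because KNN relies on distances, features on very different scales (such as age and fare) are usually rescaled first, which is why the toy features here are assumed to be pre-scaled.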
K-Means Clustering (k = 3)
What it shows: This plot illustrates how K-Means clustering groups Titanic passengers into 3 clusters based on their characteristics, without using the survival label.
Explanation: Each cluster is shown with a different color (e.g., red, orange, gray). The ‘X’ marks represent the centroids (centers) of the clusters. Passengers grouped in the same cluster have similar traits such as age, class, and fare.
✅ Insight: K-Means reveals natural groupings among passengers. For example:
One cluster might represent wealthy adults in 1st class (higher survival).
Another might represent young passengers or families.
Another could be lower-class passengers with lower fares (lower survival).
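To make the grouping procedure concrete, here is a minimal sketch of the standard K-Means iteration (Lloyd's algorithm) with k = 3, written in plain Python. It is not the implementation behind the plot (the references list Orange as the tool). The scaled (age, fare) points, the choice of starting centroids, and the function name `kmeans` are assumptions made for illustration. Note that no survival label appears anywhere in the code.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clustering has converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical, already-scaled features (age, fare); NOT the project's data.
points = [
    (0.60, 0.90), (0.65, 0.85), (0.70, 0.95),  # older passengers, high fares
    (0.10, 0.30), (0.15, 0.35), (0.05, 0.25),  # young passengers
    (0.50, 0.05), (0.55, 0.10), (0.45, 0.08),  # adults, low fares
]
# Assumed starting centroids: one point taken from each visible group.
labels, centroids = kmeans(points, [points[0], points[3], points[6]])
```

The returned `centroids` play the role of the ‘X’ marks in the plot. In practice the starting centroids are usually chosen at random, so different runs can yield different clusters.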
📌 Overall Analysis:
KNN classification is a supervised learning method: it predicts survival based on labeled data.
K-Means clustering is an unsupervised learning method: it groups passengers without knowing whether they survived.
Together, these techniques give us a deeper understanding of how passenger characteristics influenced survival on the Titanic.
REFERENCES USED FOR THE PROJECT
TOOLS AND SOFTWARE USED: Google Chrome, Microsoft Excel, Microsoft PowerPoint, Orange, ChatGPT
WEBSITES REFERRED: IIT-M Data Science and AI course videos, GeeksforGeeks website