The Titanic - machine learning from disaster

MostafaNizam 1,053 views 35 slides Jun 29, 2019

About This Presentation

• Historical context to understand "What does the data mean?"
• Learn one data set well, and then apply different algorithms and modelling tools.
• This is a true event and everybody knows about the Titanic.
• All the information is available on the internet, and the data is verified.


Slide Content

Assignment on
Advanced Data Mining and Machine
Learning
Course Code: CSE-5208
Jagannath University
Submitted to:
Dr. Md Masbaul Alam Polash
Associate Professor
Dept. of Computer Science & Engineering
Jagannath University
Submitted by:
Mostafa Nizam
M.Sc. in CSE (2nd Semester)
7th Batch
Date of Submission: 28-06-2019

The Titanic: Machine Learning from
Disaster

Titanic: Machine Learning from
Disaster
Why we picked this project:
Historical context to understand "What does the data mean?"
Learn one data set well, and then apply different algorithms and modelling
tools.
This is a true event and everybody knows about the Titanic.
All the information is available on the internet, and the data is verified.

April 1912
The Titanic Disaster

About Titanic:
The ship sank in the North Atlantic Ocean over one hundred years ago,
but almost everybody in the world today knows the name of the
Titanic. In 1912, the Titanic was the biggest ship ever built. It was 269
meters long, and everyone thought that the ship was also very safe. The ship
could carry more than 3,000 passengers and had many decks, a
swimming pool, a library, Turkish baths, and excellent restaurants and
bars. The Titanic left Southampton, on the south coast of England,
on April 10, 1912.
First class: 322 passengers
Second class: 275 passengers
Third class: 712 passengers
Crew: 898 members

Survived:
First class: 60% survived, 130 died
Second class: 42% survived, 166 died
Third class: 25% survived, 536 died
Crew: 24% survived, 685 died

Overview
We take the data from https://www.kaggle.com/c/titanic/data
We will use the R programming language.
The data has been split into two groups:
training set (train.csv)
test set (test.csv)
The training set should be used to build your machine learning models. For
the training set, we provide the outcome (also known as the “ground
truth”) for each passenger. Your model will be based on “features” like
passengers’ gender and class.
The test set should be used to see how well your model performs on
unseen data. For the test set, we do not provide the ground truth for each
passenger. It is your job to predict these outcomes. For each passenger in
the test set, use the model you trained to predict whether or not they
survived the sinking of the Titanic.
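
As a minimal sketch of the split described above, the two files can be read with base R's read.csv. The rows below are made-up stand-ins (not real Kaggle records) so the snippet runs on its own; with the downloaded files one would call read.csv("train.csv") and read.csv("test.csv") instead, assuming they sit in the working directory.

```r
# Made-up stand-ins for train.csv and test.csv (hypothetical rows, not Kaggle data).
train_csv <- "PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,0,2,male,35"

test_csv <- "PassengerId,Pclass,Sex,Age
5,3,male,34
6,2,female,47"

# With the real files: train <- read.csv("train.csv"); test <- read.csv("test.csv")
train <- read.csv(text = train_csv, stringsAsFactors = FALSE)
test  <- read.csv(text = test_csv,  stringsAsFactors = FALSE)

# The training set carries the ground truth; the test set does not.
has_truth_train <- "Survived" %in% names(train)
has_truth_test  <- "Survived" %in% names(test)
print(c(train = has_truth_train, test = has_truth_test))

# The only column missing from the test set is the outcome to predict.
missing_in_test <- setdiff(names(train), names(test))
print(missing_in_test)
```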

Data Dictionary
Variable | Definition | Key
survival | Survival | 0 = No, 1 = Yes
pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd
sex | Sex |
Age | Age in years |
sibsp | # of siblings / spouses aboard the Titanic |
parch | # of parents / children aboard the Titanic |
ticket | Ticket number |
fare | Passenger fare |
cabin | Cabin number |
embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton
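
The keys in the data dictionary can be turned into labelled factors in R, which makes later tables easier to read. This is only a sketch on a made-up data frame; the capitalised column names (Survived, Pclass, Embarked) are assumed to match the Kaggle CSV headers, and the step itself is an illustration rather than something shown on the slides.

```r
# Hypothetical rows using the coded values from the data dictionary.
train <- data.frame(
  Survived = c(0, 1, 1, 0),
  Pclass   = c(3, 1, 3, 2),
  Embarked = c("S", "C", "Q", "S"),
  stringsAsFactors = FALSE
)

# Map each code to its documented meaning.
train$Survived <- factor(train$Survived, levels = c(0, 1),
                         labels = c("No", "Yes"))
train$Pclass   <- factor(train$Pclass, levels = c(1, 2, 3),
                         labels = c("1st", "2nd", "3rd"), ordered = TRUE)
train$Embarked <- factor(train$Embarked, levels = c("C", "Q", "S"),
                         labels = c("Cherbourg", "Queenstown", "Southampton"))

print(train)
```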

Variable Notes
pclass: A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower
age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5
sibsp: The dataset defines family relations in this way...
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)
parch: The dataset defines family relations in this way...
Parent = mother, father
Child = daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore parch=0 for them.

Install R and RStudio

1st Prediction
See how many passengers survived and died (0 = died, 1 = survived).
Proportion table in percentage.
Form a hypothesis: assume everyone died, and submit to Kaggle.com.

View “Train” Data

Data Structure of ‘Train’ Data
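
The screenshot for this slide is not preserved. As a rough sketch, base R's str() and head() are one common way to inspect a data frame's structure; the data frame below is a made-up stand-in for the training data, not the real file.

```r
# Hypothetical stand-in for the training data.
train <- data.frame(
  PassengerId = 1:4,
  Survived    = c(0L, 1L, 1L, 0L),
  Pclass      = c(3L, 1L, 3L, 2L),
  Sex         = c("male", "female", "female", "male"),
  Age         = c(22, 38, NA, 35),
  stringsAsFactors = FALSE
)

# Column names, types, and the first few values of each column.
str(train)

# The first rows, as one would see them in a spreadsheet view.
print(head(train, 3))

# Count missing values per column, since Age is often incomplete.
missing_per_column <- colSums(is.na(train))
print(missing_per_column)
```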

See how many passengers survived and
died (0 = died, 1 = survived).

Proportion table in percentage
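
A minimal sketch of the count and percentage tables, using base R's table() and prop.table(). The outcome vector is made up for illustration; with the real data it would be the Survived column of the training set.

```r
# Hypothetical outcomes (0 = died, 1 = survived), not Kaggle data.
survived <- c(0, 1, 1, 0, 0, 1, 0, 0)

# How many died and how many survived.
counts <- table(survived)
print(counts)

# The same split expressed as percentages.
pct <- round(100 * prop.table(counts), 1)
print(pct)
```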

Barplot: Data visualization

Form a hypothesis: assume everyone died,
and submit to Kaggle.com.
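
The "everyone died" hypothesis amounts to a submission file in which every passenger in the test set gets Survived = 0. The sketch below assumes the two-column layout (PassengerId, Survived) used for Titanic submissions; the ids and the output file name are made up, and the file is written to a temporary directory so nothing is overwritten.

```r
# Hypothetical test set: only the ids matter for this baseline.
test <- data.frame(PassengerId = 1001:1005)

# Predict 0 (died) for every passenger.
submission <- data.frame(
  PassengerId = test$PassengerId,
  Survived    = rep(0, nrow(test))
)

# Write the file that would be uploaded (file name is an assumption).
out_file <- file.path(tempdir(), "all_died.csv")
write.csv(submission, out_file, row.names = FALSE)

print(submission)
```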

Leaderboard

Data Visualization
How many survived?
Passengers travelling in different classes
How many survived gender-wise?
Age distribution in the Titanic
How many parents and children were traveling?
What was the fare most people paid for the Titanic?
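
The questions above can each be answered with a base R plot. The following is a sketch on a made-up data frame, not the charts from the slides; column names are assumed to follow the Kaggle CSV headers, and the plots are written to a temporary PDF so the code also runs without a display.

```r
# Hypothetical stand-in for the training data.
train <- data.frame(
  Survived = c(0, 1, 1, 0, 0, 1, 0, 1),
  Pclass   = c(3, 1, 3, 2, 3, 1, 3, 2),
  Sex      = c("male", "female", "female", "male",
               "male", "female", "male", "female"),
  Age      = c(22, 38, 26, 35, NA, 54, 2, 27),
  Parch    = c(0, 0, 0, 0, 0, 0, 1, 0),
  Fare     = c(7.25, 71.28, 7.93, 13.00, 8.46, 51.86, 21.08, 11.13)
)

plot_file <- file.path(tempdir(), "titanic_plots.pdf")
pdf(plot_file)

# How many survived?
barplot(table(train$Survived), names.arg = c("Died", "Survived"),
        main = "How many survived?")

# Passengers travelling in different classes.
barplot(table(train$Pclass), main = "Passengers by class",
        xlab = "Ticket class")

# How many survived gender-wise? Rows are outcomes, columns are sexes.
by_sex <- table(train$Survived, train$Sex)
barplot(by_sex, beside = TRUE, legend.text = c("Died", "Survived"),
        main = "Survival by gender")

# Age distribution (hist() ignores the missing age).
hist(train$Age, main = "Age distribution", xlab = "Age in years")

# How many parents and children were travelling?
barplot(table(train$Parch), main = "Parents / children aboard",
        xlab = "parch")

# What fare did most people pay?
hist(train$Fare, main = "Fare distribution", xlab = "Passenger fare")

dev.off()
```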

How many survived?

Passengers travelling in different
classes

How many survived gender-wise?

Age distribution in the Titanic

How many parents and children
were traveling?

What was the fare most people
paid for the Titanic?

2nd Prediction
Summary of Gender
Proportion in Percentage
Assume all females survived, then submit the 2nd prediction to
Kaggle.com
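
A sketch of the gender-based prediction: first the survival percentage within each sex on the training data, then a rule that marks every female in the test set as a survivor and every male as not. All rows are made up for illustration, and the column names are assumed to follow the Kaggle CSV headers.

```r
# Hypothetical training rows, not Kaggle data.
train <- data.frame(
  Sex      = c("female", "female", "female", "female",
               "male", "male", "male", "male"),
  Survived = c(1, 1, 1, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Row-wise percentages: what share of each sex died (0) or survived (1).
sex_rates <- round(100 * prop.table(table(train$Sex, train$Survived),
                                    margin = 1), 1)
print(sex_rates)

# Hypothetical test rows.
test <- data.frame(
  PassengerId = 2001:2003,
  Sex         = c("male", "female", "female"),
  stringsAsFactors = FALSE
)

# Rule: all females survived, all males died.
test$Survived <- ifelse(test$Sex == "female", 1, 0)

submission <- test[, c("PassengerId", "Survived")]
print(submission)
```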

Summary of Gender

Proportion in Percentage

Submit to Kaggle.com

Leaderboard

How can we know whether a passenger survived or
died? (Verified with a Google search)

Google Search

View Result

Takeaways
Data Science requires a lot of data engineering before it can
succeed
Domain knowledge is key
This workflow can be applied to most data problems
RStudio is pretty cool too.

Thank you