Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth in Data Science Whilst Migrating to Spark+Cloud all at the Same Time with Matt Fryer
databricks
23 slides
Jun 08, 2017
About This Presentation
In the last year, Hotels.com has begun its journey to becoming an algorithmic business. Matt will talk about the team's experience of exponential growth in data science algorithms while, at the same time, migrating from SAS / SQL to Spark as the core underlying architecture and from on-premise to the cloud, transforming the capability of the data science function. He will also highlight the key enablers that have made this successful, including CEO support, the internal concept of organic intelligence, and how Databricks has helped make this happen, as well as the pitfalls on the journey.
Slide Content
Confidential -do not distribute
Hotels.com’s journey to becoming an Algorithmic Business
Matthew Fryer
VP, Chief Data Science Officer
Part of Expedia, Inc. family
385,000 properties
89 countries
39 languages
>27m Hotels.com Rewards Members
Home of Captain Obvious
Billions of Recommendations, based on real-time Data per day
Hotels.com
Data Science
Engineering
Front End Development
“Artificial Intelligence Will Be
Travel’s Next Big Thing”
Barry Diller
Chairman & Senior Executive,
Expedia, Inc.
The 3 Ms are disruptive technologies:
Mobile
Messaging / NLP
Machine Learning
Our overall ecosystem
Core Elements of our Data Science Cloud Platform
Databricks Unified Platform
Maestro – Our Internally Developed Platform on AWS (EMR, Spark, RStudio, IntelliJ, SBT, Jupyter, Zeppelin, Unit / QA, Metastore, Apache Airflow, Keras, TensorFlow)
Proof of Concept on Google Cloud, Beam, Spark & TensorFlow
Databricks Unified Platform
Chart is in 1 hour blocks, y axis = number of 32 core instances
• Key asset to the success of data science at Hotels.com
• Key in driving up data scientist productivity / efficiency / flexibility
• Makes our data science lifecycle much easier and faster to operate, improving speed to market
• Reliable and secure; facilitates ‘highly elastic’ workflows that exploit cost-effective spot instances on AWS
ALPs – Algorithm Lifecycle Pipeline Service
Reference: The Influence of Visuals in Online Hotel Research and Booking Behaviour
Images are an important factor while choosing a hotel
[Bar chart: factors other than price/location, by share of respondents rating them Very Important or Important (x axis 0% to 80%). Factors listed: Loyalty Program, Reviews, Hotel Brand, Star Rating, Destination Info, Images, Hotel Info]
Computer Vision problems we try to tackle
Near Duplicate Detection
Scene Classification
Image Ranking
Tagged as Bathroom
GPUs quickly became key; it took a large effort to optimize using Keras + TensorFlow (Inception v3 + ResNet)
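[Chart: training time in days (log scale, 1 to 1000) by hardware configuration: 12-CPU; 1-GPU; 1-GPU + limited cache; 16-GPU + limited cache; 16-GPU + full cache. Data labels as extracted: 493, 67, 20, 7, 4. Legend: CIFAR2, Expedia Small]
[Chart: training time in days (axis 0 to 20): 16-GPU + full cache vs. Optimized. Data labels as extracted: 15, 2.5]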
Near Duplicate Detection: Real world examples
Non-Duplicates – probability 100%
Non-Duplicates – probability 95.91%
Duplicates – probability 97.98%
Duplicates – probability 98.43%
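The slides do not say how the duplicate probabilities above are produced. As a minimal sketch only, assuming each image has already been reduced to a feature vector (for example by a convolutional network), one way to turn a pair of vectors into a duplicate probability is to pass their cosine similarity through a logistic function. The function names and the `steepness` and `midpoint` values are hypothetical and chosen purely for illustration; they are not taken from the talk.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def duplicate_probability(a, b, steepness=20.0, midpoint=0.9):
    """Map similarity to a probability with a logistic curve.

    steepness and midpoint are illustrative assumptions, not tuned values.
    """
    sim = cosine_similarity(a, b)
    return 1.0 / (1.0 + math.exp(-steepness * (sim - midpoint)))


def classify_pair(a, b, threshold=0.5):
    """Return a label and the probability of that label for one image pair."""
    p = duplicate_probability(a, b)
    if p >= threshold:
        return "Duplicates", p
    return "Non-Duplicates", 1.0 - p


if __name__ == "__main__":
    # Made-up feature vectors standing in for two image pairs.
    print(classify_pair([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(classify_pair([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

In practice the mapping from similarity to probability would be learned from labeled pairs rather than fixed by hand.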
Using the model: Real world examples
ROOM/BATHROOM
EXTERIOR/HOTEL
INTERIOR/SEATING_LOBBY
ROOM/LIVING_ROOM
ROOM/GUESTROOM
FACILITIES/DINING
INTERIOR/SEATING_LOBBY
FACILITIES/POOL
Accuracy & Confusion Matrix
• After many long-winded manual iterations of regularization and hyperparameter tuning
• We achieved good accuracy and low confusion between classes
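To make the two metrics on this slide concrete, here is a minimal sketch of how accuracy and a confusion matrix can be computed for a scene classifier. The class names are borrowed from the labels shown earlier in the deck, but the true and predicted labels below are made-up illustration data, not results from the talk.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative data only: six images, one bathroom mistaken for a guestroom.
LABELS = ["ROOM/BATHROOM", "ROOM/GUESTROOM", "FACILITIES/POOL"]
Y_TRUE = ["ROOM/BATHROOM", "ROOM/BATHROOM", "ROOM/GUESTROOM",
          "ROOM/GUESTROOM", "FACILITIES/POOL", "FACILITIES/POOL"]
Y_PRED = ["ROOM/BATHROOM", "ROOM/GUESTROOM", "ROOM/GUESTROOM",
          "ROOM/GUESTROOM", "FACILITIES/POOL", "FACILITIES/POOL"]

if __name__ == "__main__":
    for label, row in zip(LABELS, confusion_matrix(Y_TRUE, Y_PRED, LABELS)):
        print(label, row)
    print("accuracy:", accuracy(Y_TRUE, Y_PRED))
```

Off-diagonal entries count images of one class predicted as another, so "low confusion" corresponds to small off-diagonal counts.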
Optimizing the photo order for improved customer experiences
Original Model
Reference: Radisson Blu Edwardian Berkshire Hotel, London
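The deck does not describe how the model chooses a photo order. As a rough sketch, assuming an image-ranking model assigns each photo a single score where higher means more appealing, reordering a gallery reduces to a stable sort by that score. The function name, file names, and scores below are hypothetical.

```python
def order_photos(photos, scores):
    """Return photos sorted by descending score.

    Python's sort is stable, so photos with equal scores keep their
    original relative order.
    """
    if len(photos) != len(scores):
        raise ValueError("photos and scores must have the same length")
    ranked = sorted(zip(photos, scores), key=lambda pair: pair[1], reverse=True)
    return [photo for photo, _ in ranked]


if __name__ == "__main__":
    # Made-up gallery and scores for illustration.
    gallery = ["lobby.jpg", "bathroom.jpg", "guestroom.jpg", "pool.jpg"]
    model_scores = [0.40, 0.15, 0.90, 0.40]
    print(order_photos(gallery, model_scores))
```

A production ranking would likely also weigh factors such as scene variety near the top of the gallery; this sketch covers only the score-based ordering step.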
Finding the right hotel in our marketplace is core to our customers' needs.
As an example, different user segments like to stay in different locations
[Map of London areas: Kensington, Bloomsbury, Heathrow, Canary Wharf, Paddington, Westminster, London City Airport, Chelsea, Battersea, Wimbledon, Wembley, City of London]