Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
We share our slides about Apache Tez, delivered as a lightning talk at the Warsaw Hadoop User Group: http://www.meetup.com/warsaw-hug/events/218579675
Configuration              Plan                                      Wallclock time (sec)  Improvement
Hive on MapReduce on Avro  3 MapReduce jobs                          636                   -
Hive on Tez on Avro        Map => Map => Reduce => Reduce => Reduce  268                   2.4x
Hive on Tez on ORC Snappy  Map => Map => Reduce => Reduce => Reduce  203                   3.1x
SELECT user_id, count(*) AS cnt
FROM stream
JOIN user ON stream.user_id = user.id
JOIN track ON stream.track_id = track.id
WHERE ...
GROUP BY user_id
ORDER BY cnt DESC
LIMIT 1
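The configurations compared in these slides differ in execution engine and storage format. As a rough sketch only, and not necessarily the setup used for the benchmark, switching a Hive session to Tez and preparing an ORC copy of a table with Snappy compression might look like the following. The table name stream_orc_snappy is hypothetical; hive.execution.engine and the orc.compress table property are standard Hive settings.

```sql
-- Run subsequent queries in this session on Tez instead of MapReduce.
SET hive.execution.engine=tez;

-- Hypothetical table name: an ORC copy of the stream table,
-- compressed with Snappy (ORC's default codec is ZLIB).
CREATE TABLE stream_orc_snappy
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY")
AS SELECT * FROM stream;

-- To go back to the MapReduce baseline for comparison:
-- SET hive.execution.engine=mr;
```

The same query can then be run against the converted tables under each engine setting to compare wallclock times.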
Configuration                  Plan                                Wallclock time (sec)  Improvement
Hive on MapReduce on ORC ZLIB  6 MapReduce jobs                    519                   -
Hive on Tez on ORC ZLIB        Map => Map => Map => Reduce => Reduce  259                2x
Hive on Tez on ORC Snappy      Map => Map => Map => Reduce => Reduce  209                2.5x