Introduction to Apache Mahout, Mahout's algorithms, and a comparison between Mahout and R
Added: Mar 08, 2016
Slides: 11 pages
Apache Mahout
By: Puneet Gupta, M.Tech (Future Studies and Planning)
Apache Mahout?
A mahout is a person who drives an elephant as its master. The name reflects the project's close association with Apache Hadoop, which uses an elephant as its logo.
Apache Mahout started in 2008 as a subproject of Apache Lucene. In 2010, Mahout became a top-level Apache project.
Apache Mahout?
Apache Mahout is an open-source project.
Mahout is a Java library implementing machine learning techniques:
• Recommendation
• Clustering
• Classification
What can we do?
• Currently, Mahout mainly supports three use cases:
  – Recommendation: takes users' behavior and, from it, tries to find items users might like.
  – Clustering: takes, e.g., text documents and groups them into clusters of topically related documents.
  – Classification: learns from existing categorized documents what documents of a specific category look like, and can then assign unlabeled documents to the (hopefully) correct category.
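To make the clustering use case concrete, here is a minimal k-means sketch in plain Java. k-means is one well-known clustering algorithm; this is not Mahout's API. The class name KMeansSketch, the toy 2-D vectors standing in for documents, and seeding the centroids with the first k points are all assumptions chosen only to keep the example small and deterministic. The algorithm alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it, stopping when no point changes cluster.

```java
import java.util.Arrays;

/** Minimal k-means sketch in plain Java (hypothetical class, not Mahout's implementation). */
final class KMeansSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Returns the cluster index of each point.
     * Assumption: centroids are seeded with the first k points (naive, but deterministic).
     */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c])
                            < squaredDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // keep the old centroid if its cluster is empty
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Toy 2-D feature vectors standing in for documents.
        double[][] docs = {
            {1, 1}, {1, 2}, {2, 1},
            {8, 8}, {9, 8}, {8, 9}
        };
        System.out.println(Arrays.toString(cluster(docs, 2, 100))); // prints [0, 0, 0, 1, 1, 1]
    }
}
```

The three points near the origin end up in one cluster and the three near (8, 8) in the other. A real document pipeline would first turn text into term-weight vectors (for example TF-IDF), which this sketch skips.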
Why Mahout?
• Mahout is not the only machine learning framework:
  – Weka
  – R
• Why do we prefer Mahout?
  – Apache License
  – Good community
  – Good documentation
  – Scalable
    • Based on Hadoop (not mandatory!)
BIG DATA
Why do we need a scalable framework?
Algorithms
• Recommendation
  – User-based Collaborative Filtering
  – Item-based Collaborative Filtering
  – Slope One Recommenders
  – Singular Value Decomposition
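The Slope One recommender in the list above can be sketched in a few lines of plain Java. This is a hedged illustration of the weighted Slope One scheme, not Mahout's own implementation: the class SlopeOneSketch, the in-memory map of ratings, and the sample users and items are hypothetical. For each item i the user has rated, it computes the average difference dev(target, i) between ratings of the target item and of item i over all users who rated both, adds that deviation to the user's rating of i, and averages these estimates weighted by how many users co-rated each pair. Map.of requires Java 9 or later.

```java
import java.util.HashMap;
import java.util.Map;

/** Weighted Slope One sketch in plain Java (hypothetical class, not Mahout's implementation). */
final class SlopeOneSketch {

    /**
     * Predicts the rating {@code user} would give {@code target}.
     * Returns NaN if the user is unknown or shares no co-rated item with the target.
     */
    static double predict(Map<String, Map<String, Double>> ratings, String user, String target) {
        Map<String, Double> userRatings = ratings.get(user);
        if (userRatings == null) {
            return Double.NaN; // unknown user
        }
        double weightedSum = 0.0;
        int totalWeight = 0;
        for (Map.Entry<String, Double> rated : userRatings.entrySet()) {
            String item = rated.getKey();
            if (item.equals(target)) {
                continue;
            }
            // Average deviation dev(target, item) over users who rated both items.
            double diffSum = 0.0;
            int coRaters = 0;
            for (Map<String, Double> other : ratings.values()) {
                Double targetRating = other.get(target);
                Double itemRating = other.get(item);
                if (targetRating != null && itemRating != null) {
                    diffSum += targetRating - itemRating;
                    coRaters++;
                }
            }
            if (coRaters == 0) {
                continue; // no evidence linking this item to the target
            }
            double deviation = diffSum / coRaters;
            // Estimate from this item, weighted by the number of co-raters.
            weightedSum += (rated.getValue() + deviation) * coRaters;
            totalWeight += coRaters;
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("alice", Map.of("item1", 5.0, "item2", 3.0, "item3", 2.0));
        ratings.put("bob", Map.of("item1", 3.0, "item2", 4.0));
        ratings.put("carol", Map.of("item2", 2.0, "item3", 5.0));
        System.out.println(predict(ratings, "bob", "item3")); // 10/3, about 3.33
    }
}
```

With the sample data, item1 contributes 3 + (−3) = 0 with weight 1 and item2 contributes 4 + 1 = 5 with weight 2, so Bob's predicted rating for item3 is (0 + 10) / 3 ≈ 3.33. This version recomputes deviations on every call for readability; a practical recommender would precompute the item-to-item deviation table once.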