Data science and analytics, computer science

abishakathiresan1712 19 views 11 slides Aug 23, 2024
Slide 1
Slide 1 of 11
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11

About This Presentation

Map reduce and Hadoop, analytics for unstructured data


Slide Content

Data Science & Analytics Submitted by k.Abisha II Msc cs Nadar Saraswathi college of arts and science

Map Reduce & Hadoop Map Reduce: A programming model for processing large data sets in parallel Breaks down data into smaller chunks(splitting) Processes data in parallel(mapping) Combines output(reduce) Can be implemented on various platforms, including Hadoop. Map: Breaks down the input data into smaller chunks, processes them in parallel, and produces key-value pairs as output. Reduce: Aggregates the output from the map phase, combines the key-value pairs, and produces the final output

Hadoop: An open-source framework that implements the Map Reduce model. Distributed storage (HDFS) and processing (Map Reduce engine). Scalable, flexible, and fault-tolerant. Includes additional components like YARN, Pig, Hive and Hbase . Hadoop Distributed File System (HDFS): Stores data across the cluster, providing high availability and scalability. Map Reduce engine: Executes the Map Reduce programming model on the Hadoop cluster. YARN(Yet Another Resource Negotiator): Manages resources and schedules jobs (including Map Reduce) on the cluster.

Key benefits: Scalability: Handles large data sets by distributing processing across the cluster. Flexibility: Supports various data formats and processing tasks. Fault tolerance: Automatically handles node failures and re-executes tasks as needed. Cost-effective: Uses commodity hardware and open-source software

Key differences: Map Reduce is a programming model, while Hadoop is a framework that implements it. Map Reduce can be used outside of Hadoop, while Hadoop is tightly coupled with Map Reduce Hadoop provides additional features like distributed storage, resource management, and supporting tools. Map Reduce is a programming model for parallel processing, while Hadoop is a framework that implements Map Reduce and provides additional features for big data processing.

Analytics for unstructured data Unstructured data, such as text, images, and videos, requires specialized analytics techniques in data science and analytics. Here are some key aspects: Challenges: Lack of predefined schema or format. High dimensionality and volume. Difficulty in extracting insights

Techniques: Text Analytics: Natural Language processing(NLP) Sentiment Analysis Topic Modeling Named Entity Recognition Image and Video Analytics: Computer vision Object Detection Facial Recognition Image Classification

Audio Analytics: Speech Recognition Music Information Retrieval Audio Classification Application: Customer Insights: Sentiment analysis for social media monitoring. Customer feedback analysis. Content Analysis: Image classification for self-driving cars. Video analysis for surveillance Predictive Maintenance: Audio analysis for equipment failure detection.

Thank you!!!