Fundamentals of Apache Hadoop in Big Data

AshwinKumarR7 · 10 slides · May 11, 2024

About This Presentation

Apache Hadoop is an open-source framework that empowers you to manage big data processing and storage. Hadoop distributes large datasets across clusters of computers, enabling parallel processing for faster results.


Slide Content

Apache Hadoop By Ashwin Kumar R

What is Big Data? Big Data is a collection of data that is huge in volume and growing exponentially over time. It is data of such size and complexity that no traditional data management tool can store or process it efficiently.

Why do we need big data analytics? Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits, and happier customers.

What is Hadoop? Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a virtually limitless number of concurrent tasks or jobs.

Use of Hadoop? Apache Hadoop is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes. Instead of using one large computer to store and process the data, Hadoop clusters many computers together to analyze massive datasets in parallel, which is faster.
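To illustrate the parallel-processing idea, here is a minimal sketch of Hadoop's MapReduce model as plain Python. The function names and the two-line sample input are made up for the example; in a real Hadoop job, each map and reduce call would run on a different machine in the cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for each word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Illustrative input; a real job would read file splits from HDFS.
lines = ["big data needs big tools", "hadoop handles big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts["big"])   # prints 3
print(counts["data"])  # prints 2
```

This is the classic word-count example: the same map/shuffle/reduce structure scales to petabytes because each phase can be spread across many nodes.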

What is DFS? A Distributed File System (DFS), as the name suggests, is a file system spread across multiple file servers or locations. It lets programs access and store remote files as if they were local, so programmers can reach files from any machine on the network.

What is HDFS? HDFS is designed to reliably store very large files across the machines of a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file.
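The block layout described above can be sketched in a few lines of Python. The 128 MiB block size and replication factor of 3 are common HDFS defaults used here for illustration, and the round-robin replica placement is a simplification; real HDFS placement is rack-aware.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MiB, a common HDFS default
REPLICATION = 3                  # default replication factor

def plan_blocks(file_size, datanodes):
    """Split a file of file_size bytes into blocks and assign replicas.

    Returns a list of (block_index, block_length, replica_nodes).
    Placement is naive round-robin, purely for illustration.
    """
    blocks = []
    offset, index = 0, 0
    while offset < file_size:
        length = min(BLOCK_SIZE, file_size - offset)  # last block may be shorter
        replicas = [datanodes[(index + r) % len(datanodes)]
                    for r in range(REPLICATION)]
        blocks.append((index, length, replicas))
        offset += length
        index += 1
    return blocks

nodes = ["dn1", "dn2", "dn3", "dn4"]           # hypothetical DataNodes
plan = plan_blocks(300 * 1024 * 1024, nodes)   # a 300 MiB file
print(len(plan))                     # prints 3 (blocks of 128 + 128 + 44 MiB)
print(plan[-1][1] // (1024 * 1024))  # prints 44 (the shorter last block)
```

Note how only the final block is smaller than the configured block size, exactly as the slide states, and how each block ends up on three different nodes so the file survives node failures.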

Advantages. Scalable: Hadoop is a highly scalable storage platform, because it can store and distribute very large data sets across hundreds of inexpensive servers operating in parallel. Cost effective: Hadoop also offers a cost-effective storage solution for businesses' exploding data sets. Also: flexible, fast, and resilient to failure.

Disadvantages. Security concerns. Vulnerable by nature. Not fit for small data. Potential stability issues. General limitations.

Thank you !