Terminologies Used in Big Data Environments, G. Sumithra, II M.Sc. (Computer Science), Bon Secours College for Women

6,453 views 20 slides Mar 29, 2019




SUBMITTED BY

NAME: G. SUMITHRA
CLASS: II M.Sc., COMPUTER SCIENCE
BATCH: 2017-2019
INCHARGE STAFF: Ms. M. FLORENCE DAYANA


Contents
Terminologies and Their Types
In-Memory Analytics
In-Database Processing
Symmetric Multiprocessor System (SMP)
Massively Parallel Processing
Difference Between Parallel and Distributed Systems
Shared Nothing Architecture
Advantages of a “Shared Nothing Architecture”
CAP Theorem Explained
CAP Theorem


Terminologies and Their Types
To get a good handle on the big data environment, it helps to be familiar with a few key terminologies in this arena.
The terminologies covered here are:
In-Memory Analytics
In-Database Processing
Symmetric Multiprocessor System (SMP)
Massively Parallel Processing
Difference Between Parallel and Distributed Systems
Shared Nothing Architecture
CAP Theorem Explained


In-Memory Analytics
Data access from non-volatile storage such as a hard disk is a slow process.
The more data that has to be fetched from hard disk or secondary storage, the slower the process gets.
One way to combat this challenge is to pre-process and store data.
Example: cubes, aggregate tables, query sets, etc.
The drawback is that whenever different or additional data is needed, the initial process of pre-computing and storing data, or fetching it from secondary storage, has to be repeated.
This problem has been addressed using in-memory analytics.
All the relevant data is stored in Random Access Memory (RAM), or primary storage, which eliminates the need to read it from hard disk.
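
As a minimal sketch of the idea above (not taken from the slides), the Python below contrasts a pre-computed aggregate table with ad hoc queries over records held entirely in RAM. The `sales` records, the `precomputed_by_region` table, and the helper `total_by` are hypothetical names made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales records, held entirely in RAM (primary storage).
sales = [
    {"region": "north", "product": "pen", "amount": 10},
    {"region": "north", "product": "ink", "amount": 25},
    {"region": "south", "product": "pen", "amount": 15},
    {"region": "south", "product": "ink", "amount": 5},
]

# Pre-processing approach: an aggregate table answers one fixed question.
precomputed_by_region = {"north": 35, "south": 20}


def total_by(records, key):
    """Aggregate 'amount' by any field, scanning the in-memory records."""
    totals = defaultdict(int)
    for row in records:
        totals[row[key]] += row["amount"]
    return dict(totals)


# In-memory approach: any grouping can be computed on demand.
by_region = total_by(sales, "region")
by_product = total_by(sales, "product")
```

The aggregate table can only answer the per-region question it was built for, while the in-memory records can also be grouped by product without any new pre-computation step.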


Advantages of In-Memory Analytics
Faster access
Rapid deployment
Better insights
Minimal IT involvement


In-Database Processing
In-database processing is also called in-database analytics.
Typically, data from various enterprise Online Transaction Processing (OLTP) systems is first cleaned up and stored in a data warehouse.
Example of cleaning up: de-duplication, scrubbing, etc.
The huge datasets are then exported to analytical programs for complex and extensive computations.
With in-database processing, the database itself runs these computations, which removes the export step and saves time.
Leading database vendors offer this feature to large businesses.
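
The difference between exporting data and computing inside the database can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `orders` table and its rows are invented, and SQLite stands in for an enterprise warehouse.

```python
import sqlite3

# Throwaway database for the demo; a real warehouse would live on a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("asha", 120.0), ("asha", 80.0), ("ravi", 50.0)],
)

# Export approach: pull every row out, then compute in the client program.
rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
exported_totals = {}
for customer, amount in rows:
    exported_totals[customer] = exported_totals.get(customer, 0.0) + amount

# In-database approach: the database computes; only the results leave it.
in_db_totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
conn.close()
```

Both approaches give the same totals, but the second one moves two result rows out of the database instead of every order row.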


Symmetric Multiprocessor System (SMP)
In SMP, a single common main memory is shared by two or more identical processors.
The processors have full access to all I/O devices and are controlled by a single operating system instance.
SMP systems are tightly coupled multiprocessor systems.
Each processor has its own high-speed memory, called cache memory, and the processors are connected using a system bus.
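
A loose software analogy for the shared main memory of SMP is a set of threads inside one process: all of them see the same memory and run under one operating system instance. The sketch below is an analogy only; Python threads are not processors, and the names `shared_counter` and `worker` are hypothetical.

```python
import threading

shared_counter = {"value": 0}   # one memory location visible to all workers
lock = threading.Lock()         # coordinates access to the shared memory


def worker(increments):
    """Each worker updates the same shared memory location."""
    for _ in range(increments):
        with lock:
            shared_counter["value"] += 1


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the memory is shared, the workers need a lock to avoid overwriting each other's updates, which is the kind of coordination that tight coupling requires.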


Massively Parallel Processing
Massively Parallel Processing (MPP) refers to the coordinated processing of a program by a number of processors working in parallel.
Each processor has its own operating system and dedicated memory.
The processors work on different parts of the same program.
In contrast, SMP works with the processors sharing the same operating system and the same memory.
SMP is also referred to as tightly coupled multiprocessing.
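
The MPP idea of splitting one job across processors that each have dedicated memory can be simulated in a few lines of Python. This is a sequential simulation under assumptions, not real parallel hardware: the `Node` class, the `split` helper, and the summing job are all hypothetical.

```python
class Node:
    """One simulated MPP node with its own private (dedicated) memory."""

    def __init__(self, name):
        self.name = name
        self.local_data = []      # not visible to any other node

    def load(self, chunk):
        self.local_data = list(chunk)

    def partial_sum(self):
        return sum(self.local_data)


def split(data, parts):
    """Deal the data out round-robin into `parts` chunks."""
    return [data[i::parts] for i in range(parts)]


data = list(range(1, 101))                  # the whole job: sum 1..100
nodes = [Node(f"node-{i}") for i in range(4)]

for node, chunk in zip(nodes, split(data, len(nodes))):
    node.load(chunk)                        # each node gets a different part

# On real MPP hardware these partial sums would be computed in parallel.
partials = [node.partial_sum() for node in nodes]
total = sum(partials)                       # a coordinator combines results
```

Each node works only on its own part of the data, and the coordinator needs nothing from a node except its partial result.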


Difference Between Parallel and Distributed Systems
Aspect                      Parallel system                          Distributed system
Memory                      Tightly coupled system; shared memory    Weakly coupled system; distributed memory
Control                     Global clock control                     No global clock control
Processor interconnection   Order of Tbps                            Order of Gbps
Main focus                  Performance; scientific computing        Performance (cost and scalability); reliability/availability; information/resource sharing


Parallel System


Distributed System


Shared Nothing Architecture
The three most common types of architecture for multiprocessor, high-transaction-rate systems are:
Shared Memory (SM)
Shared Disk (SD)
Shared Nothing (SN)


Shared Memory (SM)
A common central memory is shared by multiple processors.
Shared Disk (SD)
Multiple processors share a common collection of disks while having their own private memory.
Shared Nothing (SN)
Neither memory nor disk is shared among multiple processors.


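Advantages of a “Shared Nothing Architecture”
Fault isolation:
A shared nothing architecture provides the benefit of isolating faults.
A fault in a single node is contained and confined to that node exclusively, and is exposed only through messages (or the lack of them).
Scalability:
Assume instead that the disk is a shared resource.
The controller and the disk bandwidth are then shared as well, and access must be synchronized to keep the shared state consistent.
Different nodes have to take turns to access the critical data.
This limits how many nodes can be added to a distributed shared disk system, thus compromising scalability; a shared nothing architecture avoids this contention.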
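
Fault isolation in a shared nothing design can be illustrated with a small simulation in which every node owns a private store and is reachable only through messages. The sketch is built on assumptions: the `Node` and `Cluster` classes, the CRC32-based partitioning, and the `alive` flag are hypothetical choices made for the example.

```python
import zlib


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self._store = {}     # private memory/disk, never touched by other nodes
        self.alive = True

    def handle(self, message):
        """The only way in is a message; a failed node simply does not answer."""
        if not self.alive:
            return None
        op, key, value = message
        if op == "put":
            self._store[key] = value
            return "ok"
        return self._store.get(key)


class Cluster:
    def __init__(self, size):
        self.nodes = [Node(i) for i in range(size)]

    def owner(self, key):
        # Deterministic hash partitioning: each key belongs to exactly one node.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def put(self, key, value):
        return self.owner(key).handle(("put", key, value))

    def get(self, key):
        return self.owner(key).handle(("get", key, None))


cluster = Cluster(3)
keys = [f"user{i}" for i in range(12)]
for k in keys:
    cluster.put(k, k.upper())

failed = cluster.nodes[0]
failed.alive = False          # simulate a fault in one node

still_served = [k for k in keys if cluster.get(k) is not None]
lost = [k for k in keys if cluster.get(k) is None]
```

After the simulated fault, only the keys owned by the failed node stop being served; every other node keeps answering because it depends on no shared memory or disk.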


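CAP Theorem Explained
The CAP theorem is also called Brewer’s theorem.
It applies to a distributed computing environment, that is, a collection of interconnected nodes that share data.
It states that such a system cannot provide all three of the following guarantees at the same time; at best, it can provide two of them:
Consistency
Availability
Partition tolerance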


CAP Theorem
Consistency
It implies that every read fetches the last write.
Availability
It implies that reads and writes always succeed.
Each non-failing node will return a response in a reasonable amount of time.
Partition tolerance
It implies that the system will continue to function when a network partition occurs.
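
The trade-off between consistency and availability during a network partition can be shown with a toy two-replica store. This is a deliberately simplified sketch, not a real replication protocol: the `Replica` and `TwoNodeStore` classes and the `"CP"`/`"AP"` mode flag are hypothetical.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Two replicas of one value; `mode` is "CP" or "AP"."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False    # True means the replicas cannot talk

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                return False        # refuse the write: gives up availability
            replica.value = value   # AP: accept locally, replicas diverge
            return True
        replica.value = other.value = value   # healthy: replicate to both
        return True

    def read(self, replica):
        return replica.value


cp = TwoNodeStore("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
cp_write_ok = cp.write(cp.a, "v2")     # rejected during the partition
cp_read = cp.read(cp.b)                # still the last successful write

ap = TwoNodeStore("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap_write_ok = ap.write(ap.a, "v2")     # accepted during the partition
ap_read = ap.read(ap.b)                # stale: does not see "v2"
```

In the CP store the write fails, so the system is consistent but not fully available; in the AP store the write succeeds, but a read from the other replica no longer fetches the last write.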


THANK YOU