ai based computer basic learning Lecture about Bigdata.ppt

ALAMGIRHOSSAIN256982 18 views 21 slides May 31, 2024



Slide Content

Introduction to Big Data
Muhammad Asim Khan

Topics
Scope: Big Data & Analytics
Topics:
Foundation of Data Analytics and Data Mining
Hadoop/MapReduce Programming and Data Processing & BigTable/HBase/Cassandra
Graph Database and Graph Analytics

What’s Big Data?
No single definition; here is one from Wikipedia:
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."

Big Data: 3V’s

Volume (Scale)
Data Volume
44x increase from 2009 to 2020
From 0.8 zettabytes to 35 ZB
Data volume is increasing exponentially
Exponential increase in collected/generated data
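
As a quick check of the figures above, the following sketch multiplies the 2009 volume by the stated 44x factor and derives the yearly growth rate such an increase would imply. The variable names are chosen here for illustration, and the constant yearly rate is an assumption (smooth compound growth over the 11 years from 2009 to 2020), not something the slide states.

```python
# Figures taken from the slide: 0.8 ZB in 2009 and a 44x increase by 2020.
start_zb = 0.8
growth_factor = 44
years = 2020 - 2009  # 11 years

# Projected 2020 volume: 0.8 * 44 = 35.2 ZB, which the slide rounds to 35 ZB.
end_zb = start_zb * growth_factor

# Assumption: growth compounds at a constant yearly rate.
yearly_factor = growth_factor ** (1 / years)

print(f"Projected 2020 volume: {end_zb:.1f} ZB")
print(f"Implied yearly growth: {(yearly_factor - 1) * 100:.0f}%")
```

Under that assumption the data volume grows by roughly 41% per year, which is what "increasing exponentially" amounts to in concrete terms.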

12+ TBs of tweet data every day
25+ TBs of log data every day
? TBs of data every day
2+ billion people on the Web by end 2011
30 billion RFID tags today (1.3B in 2005)
4.6 billion camera phones worldwide
100s of millions of GPS-enabled devices sold annually
76 million smart meters in 2009… 200M by 2014

Maximilien Brice, © CERN
CERN’s Large Hadron Collider (LHC) generates 15 PB a year

The Earthscope
The Earthscope is the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyzes seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more.
(http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)

Variety (Complexity)
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
  Social Network, Semantic Web (RDF), …
Streaming Data
  You can only scan the data once
A single application can be generating/collecting many types of data
Big Public Data (online, weather, finance, etc.)
To extract knowledge, all these types of data need to be linked together
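
The "scan the data once" constraint on streaming data can be made concrete with a one-pass computation. The sketch below uses Welford's method to keep a running mean and variance without storing the stream. This technique is swapped in here purely as an illustration and is not taken from the slides; `RunningStats` and `sensor_stream` are hypothetical names, and the readings are made-up values.

```python
class RunningStats:
    """Mean and variance computed in a single pass (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        # Each value is seen exactly once and is not stored afterwards.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one value has arrived.
        return self._m2 / self.count if self.count else 0.0


def sensor_stream():
    # Hypothetical readings standing in for a real data stream.
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)

print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream runs, which is the property a single-scan setting requires.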

A Single View to the Customer
Customer
Social Media
Gaming
Entertain
Banking
Finance
Our Known History
Purchase
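
One way to picture a single view of the customer is to group records from separate systems under a shared customer key. The sketch below is only an illustration: the three sources, their field names, the `customer_id` join key, and the `build_single_view` helper are all hypothetical, and real systems rarely share a clean identifier like this.

```python
from collections import defaultdict

# Hypothetical records from three separate systems, keyed by a shared customer ID.
purchases = [{"customer_id": 1, "item": "laptop"}, {"customer_id": 2, "item": "phone"}]
social_posts = [{"customer_id": 1, "text": "Loving my new laptop"}]
bank_events = [{"customer_id": 2, "event": "card payment"}]


def build_single_view(**sources):
    """Group records from every source under the customer they belong to."""
    view = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            customer_id = record["customer_id"]
            # Keep everything except the join key itself.
            details = {k: v for k, v in record.items() if k != "customer_id"}
            view[customer_id][source_name].append(details)
    return view


single_view = build_single_view(
    purchases=purchases, social_media=social_posts, banking=bank_events
)
for customer_id, sources in sorted(single_view.items()):
    print(customer_id, dict(sources))
```

Each customer ends up with one entry that lists what is known about them per source, which is the linking step the Variety slide calls for.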

Velocity (Speed)
Data is being generated fast and needs to be processed fast
Online Data Analytics
Late decisions mean missed opportunities
Examples
E-Promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you
Healthcare monitoring: sensors monitor your activities and body; any abnormal measurements require immediate reaction
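
The healthcare monitoring example can be sketched as a check that runs on every reading as it arrives, rather than in a later batch job. The code below flags a reading that deviates sharply from the average of the last few readings. The `monitor` function, its `window` and `tolerance` parameters, and the heart-rate values are assumptions made for illustration and are not clinical thresholds.

```python
from collections import deque


def monitor(readings, window=5, tolerance=0.2):
    """Yield (index, value) for readings that deviate from the recent average.

    window and tolerance are illustrative parameters, not clinical thresholds.
    """
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            # React immediately, before the next reading arrives.
            if abs(value - baseline) > tolerance * baseline:
                yield index, value
                continue  # keep the abnormal value out of the baseline
        recent.append(value)


# Hypothetical heart-rate samples in beats per minute.
heart_rate = [72, 74, 71, 73, 72, 75, 130, 74, 73]
alerts = list(monitor(heart_rate))
print(alerts)
```

Because `monitor` is a generator, an alert becomes available as soon as the abnormal reading is consumed, which mirrors the "immediate reaction" requirement.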

Real-time/Fast Data
Progress and innovation are no longer hindered by the ability to collect data
But by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion
Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Mobile devices (tracking all objects all the time)
Sensor technology and networks (measuring all kinds of data)

Real-Time Analytics/Decision Requirement
Customer
Influence Behavior
Product Recommendations that are Relevant & Compelling
Friend Invitations to join a Game or Activity that expands business
Preventing Fraud as it is Occurring & preventing more proactively
Learning why Customers Switch to competitors and their offers; in time to Counter
Improving the Marketing Effectiveness of a Promotion while it is still in Play

Some Make it 4V’s

Harnessing Big Data
OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
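
The difference between the first two workloads can be seen with a small in-memory database. The sketch below uses Python's standard `sqlite3` module as a stand-in for a DBMS; the `orders` table and its contents are hypothetical. It contrasts OLTP-style small transactions with an OLAP-style aggregate query and does not attempt to model RTAP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small transactions, each touching a few rows.
new_orders = [("north", 120.0), ("south", 80.0), ("north", 40.0)]
for region, amount in new_orders:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )

# OLAP-style work: one read-only query that scans and aggregates many rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

In practice the two workloads usually run on separate systems (an operational database and a data warehouse); a single SQLite file is used here only to keep the example self-contained.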

The Model Has Changed…
The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data

What’s driving Big Data
-Ad-hoc querying and reporting
-Data mining techniques
-Structured data, typical sources
-Small to mid-size datasets
-Optimizations and predictive analytics
-Complex statistical analysis
-All types of data, and many sources
-Very large datasets
-More of a real-time

The Evolution of Business Intelligence
BI Reporting, OLAP & Data Warehouse: Business Objects, SAS, Informatica, Cognos, other SQL Reporting Tools
Interactive Business Intelligence & In-memory RDBMS: QlikView, Tableau, HANA
Big Data: Batch Processing & Distributed Data Store: Hadoop/Spark; HBase/Cassandra
Big Data: Real Time & Single View: Graph Databases
Timeline: 1990’s, 2000’s, 2010’s
Axes: Speed, Scale

Big Data Analytics
Big data is more real-time in nature than traditional DW applications
Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
Shared nothing, massively parallel processing, scale out architectures are well-suited for big data apps
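
A shared-nothing design can be sketched in a few lines: each node owns a disjoint slice of the data, computes on that slice alone, and only the small partial results are combined. In the sketch below, threads stand in for independent machines, and `partition`, `local_sum`, the hash-by-key scheme, and the sample records are assumptions made for illustration rather than how any particular product works.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition(records, node_count):
    """Assign each record to exactly one node by hashing its key."""
    nodes = [[] for _ in range(node_count)]
    for key, value in records:
        nodes[crc32(key.encode()) % node_count].append((key, value))
    return nodes


def local_sum(node_records):
    """Work done by one node using only the data it owns."""
    totals = Counter()
    for key, value in node_records:
        totals[key] += value
    return totals


# Hypothetical (key, value) records, e.g. page views per site.
records = [("a.com", 3), ("b.com", 5), ("a.com", 2), ("c.com", 1), ("b.com", 4)]
nodes = partition(records, node_count=3)

# Threads stand in for independent machines in this sketch.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(local_sum, nodes))

# Each key lives on one node, so merging is a simple union of partial results.
result = Counter()
for partial in partials:
    result.update(partial)
print(dict(result))
```

Scaling out in this picture means raising `node_count` and adding workers, since no node needs to read another node's data.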

Big Data Technology