CS8091_BDA_Unit_IV_Stream_Computing
May 24, 2021
About This Presentation
Stream Computing Platforms
Slide Content
Slide 1
CS8091 / Big Data Analytics
III Year / VI Semester
Slide 2
UNIT IV - STREAM MEMORY
Introduction to Streams Concepts - Stream Data Model and
Architecture - Stream Computing, Sampling Data in a Stream -
Filtering Streams - Counting Distinct Elements in a Stream -
Estimating Moments - Counting Oneness in a Window - Decaying
Window - Real Time Analytics Platform (RTAP) Applications - Case
Studies - Real Time Sentiment Analysis, Stock Market Predictions.
Using Graph Analytics for Big Data: Graph Analytics.
Slide 3
Stream Computing
A high-performance computer system that analyzes multiple data streams from many sources.
Stream computing means pulling in streams of data, processing the data, and streaming it back out as a single flow.
It uses software algorithms that analyze the data in real time as it streams in, to increase speed and accuracy in data handling and analysis.
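The "pull in many streams, process, stream back out as a single flow" idea can be sketched in plain Python. This is only an illustration, not a stream computing platform: the event layout `(timestamp, source, value)`, the sensor names, and the threshold are all hypothetical, and each input stream is assumed to be already ordered by timestamp.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical event layout: (timestamp, source name, value).
Event = Tuple[float, str, float]


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge several timestamp-ordered streams into one ordered flow."""
    return heapq.merge(*streams, key=lambda event: event[0])


def flag_high_values(events: Iterable[Event], threshold: float):
    """Process each event as it arrives and emit it with an alert flag."""
    for timestamp, source, value in events:
        yield timestamp, source, value, value > threshold


# Two made-up input streams, each already ordered by timestamp.
sensor_a = [(1.0, "a", 20.5), (3.0, "a", 31.0)]
sensor_b = [(2.0, "b", 19.0), (4.0, "b", 35.5)]

single_flow = list(flag_high_values(merge_streams(sensor_a, sensor_b), threshold=30.0))
for record in single_flow:
    print(record)
```

Because both functions are generators, each event is handled as soon as it is available rather than after all inputs have been collected.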
Slide 4
Stream Computing
Slide 5
Stream Computing
Stream computing delivers real-time analytic processing on constantly changing data in motion.
It allows all data to be captured and analyzed, all the time, just in time.
Slide 6
Stream Computing
Stream computing analyzes data before you store it.
Analyze data that is in motion (Velocity)
Process any type of data (Variety)
Streams is designed to scale to process any volume of data, from terabytes to zettabytes per day.
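One way to picture "analyze before you store" is a summary that is updated once per arriving value and never keeps the raw data. The sketch below uses Welford's online algorithm for mean and variance; the class name `RunningStats` and the sample readings are illustrative assumptions, not part of any platform named in these slides.

```python
class RunningStats:
    """Mean and population variance over a stream, in constant memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new value into the summary (Welford's update)."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.count if self.count else 0.0


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:  # made-up readings
    stats.update(reading)  # each value is analyzed, then discarded

print(stats.count, stats.mean, stats.variance)
```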
Slide 7
Stream Computing
Store less
Analyze more
Make better decisions, faster
Slide 8
Stream Computing
Data Stream processing platforms:
Many of these are open source solutions.
These platforms facilitate the construction of real-time applications, in particular message-oriented or event-driven applications which support ingress of messages or events at a very high rate, transfer to subsequent processing, and generation of alerts.
Slide 9
Stream Computing
Data Stream processing platforms:
These platforms are mostly focused on supporting event-driven data flow through nodes in a distributed system or within a cloud infrastructure platform.
The Hadoop ecosystem covers a family of projects that fall under the umbrella of infrastructure for distributed computing and large data processing.
Slide 10
Stream Computing
Data Stream processing platforms:
Hadoop includes a number of components, and below is the list of components:
MapReduce, a distributed data processing model and execution environment that runs on large clusters of commodity machines.
Hadoop Distributed File System (HDFS), a distributed file system that runs on large clusters of commodity machines.
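The MapReduce processing model mentioned above can be illustrated on a single machine. The sketch below is plain Python, not the Hadoop API: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical labels for the three conceptual stages, and a real cluster would run the map and reduce calls in parallel across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["stream data stream", "batch data"]))
```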
Slide 11
Stream Computing
Data Stream processing platforms:
Hadoop includes a number of components, and below is the list of components:
ZooKeeper, a distributed, highly available coordination service, providing primitives such as distributed locks that can be used for building distributed applications.
Pig, a data flow language and execution environment for exploring very large datasets. Pig runs on HDFS and MapReduce clusters.
Hive, a distributed data warehouse.
Slide 12
Stream Computing
Data Stream processing platforms:
Hadoop was developed to support processing large sets of structured, unstructured, and semi-structured data, but it was designed as a batch processing system.
Slide 13
Stream Computing
Data Stream processing platforms - SPARK:
Apache Spark is a more recent framework that combines an engine for distributing programs across clusters of machines with a model for writing programs on top of it.
It is aimed at addressing the needs of the data scientist community, in particular in support of the Read-Evaluate-Print Loop (REPL) approach for playing with data interactively.
Slide 14
Stream Computing
Data Stream processing platforms - SPARK:
Spark maintains MapReduce's linear scalability and fault tolerance, but extends it in three important ways:
First, rather than relying on a rigid map-then-reduce format, its engine can execute a more general directed acyclic graph (DAG) of operators. This means that in situations where MapReduce must write out intermediate results to the distributed file system, Spark can pass them directly to the next step in the pipeline.
Slide 15
Stream Computing
Data Stream processing platforms - SPARK:
Spark maintains MapReduce's linear scalability and fault tolerance, but extends it in three important ways:
Second, it complements this capability with a rich set of transformations that enable users to express computation more naturally.
Third, Spark supports in-memory processing across a cluster of machines, thus not relying on the use of storage for recording intermediate data, as in MapReduce.
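The idea of chaining operators so that intermediate results flow straight to the next step, in memory, can be sketched with Python generators. This is an analogy only and does not use the Spark API; `run_pipeline` and the three operators are hypothetical names, and the sketch shows a linear chain rather than a general DAG.

```python
from typing import Callable, Iterable, Iterator


def parse(records: Iterable[str]) -> Iterator[int]:
    """Transformation 1: turn raw text records into integers."""
    return (int(record) for record in records)


def keep_even(numbers: Iterable[int]) -> Iterator[int]:
    """Transformation 2: filter."""
    return (n for n in numbers if n % 2 == 0)


def square(numbers: Iterable[int]) -> Iterator[int]:
    """Transformation 3: map."""
    return (n * n for n in numbers)


def run_pipeline(source: Iterable, *operators: Callable) -> Iterable:
    """Chain operators; each consumes the previous one's output lazily,
    so no intermediate result is written to storage."""
    stream = source
    for operator in operators:
        stream = operator(stream)
    return stream


result = list(run_pipeline(["1", "2", "3", "4"], parse, keep_even, square))
print(result)
```

Nothing is computed until `list(...)` pulls values through the chain, which loosely mirrors how a lazily evaluated operator graph avoids materializing each stage.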
Slide 16
Stream Computing
Data Stream processing platforms - SPARK:
Spark supports integration with a variety of tools in the Hadoop ecosystem.
It can read and write data in all of the data formats supported by MapReduce.
It can read from and write to NoSQL databases like HBase and Cassandra.
It is well suited for real-time processing and analysis, supporting scalable, high-throughput, and fault-tolerant processing of live data streams.
Slide 17
Stream Computing
Data Stream processing platforms - SPARK:
Spark Streaming represents a continuous stream of data as a discretized stream (DStream).
On the input side, Spark Streaming receives live input data streams through a receiver and divides the data into micro-batches, which are then processed by the Spark engine to generate the final stream of results in batches.
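The micro-batching step described above can be sketched without Spark. The function below is a minimal illustration, assuming events arrive as `(timestamp, value)` pairs in timestamp order; `micro_batches` is a hypothetical helper, not part of the Spark Streaming API, and intervals with no events are simply skipped.

```python
from typing import Iterable, Iterator, List, Tuple


def micro_batches(
    events: Iterable[Tuple[float, int]], interval: float
) -> Iterator[Tuple[float, List[int]]]:
    """Split a timestamp-ordered stream into batches of `interval` seconds.

    Yields (batch start time, values in that batch).
    """
    batch: List[int] = []
    batch_index = None
    for timestamp, value in events:
        index = int(timestamp // interval)
        if batch_index is None:
            batch_index = index
        if index != batch_index:
            # The interval has rolled over: hand the finished batch downstream.
            yield batch_index * interval, batch
            batch, batch_index = [], index
        batch.append(value)
    if batch:
        yield batch_index * interval, batch


events = [(0.2, 1), (0.7, 2), (1.1, 3), (2.5, 4), (2.9, 5)]  # made-up data
batches = list(micro_batches(events, interval=1.0))

# Each batch is then processed as a unit, here with a simple sum.
batch_sums = [(start, sum(values)) for start, values in batches]
print(batch_sums)
```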
Slide 18
Stream Computing
Data Stream processing platforms - SPARK:
Spark Streaming uses a small, deterministic batch interval (in seconds) to separate the stream into processable units.
The size of the interval dictates throughput and latency: the larger the interval, the higher both the throughput and the latency.
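The trade-off can be made concrete with a toy cost model. Everything here is an assumption made for illustration, not a measurement of Spark: each batch is taken to pay a fixed scheduling overhead plus a constant cost per record, and the numbers passed in are invented.

```python
from typing import Tuple


def batch_metrics(
    interval_s: float,
    arrival_rate: float,
    per_batch_overhead_s: float,
    per_record_cost_s: float,
) -> Tuple[float, float]:
    """Return (processing capacity in records/s, worst-case latency in s)
    for one batch interval under the toy model described above."""
    records = arrival_rate * interval_s
    processing_time = per_batch_overhead_s + records * per_record_cost_s
    # Larger batches amortize the fixed overhead over more records.
    capacity = records / processing_time
    # A record arriving at the start of an interval waits for the whole
    # interval and then for the batch to be processed.
    worst_case_latency = interval_s + processing_time
    return capacity, worst_case_latency


# Hypothetical workload: 1000 records/s, 50 ms per batch, 0.1 ms per record.
small = batch_metrics(0.5, 1000, 0.05, 0.0001)
large = batch_metrics(2.0, 1000, 0.05, 0.0001)
print("0.5 s interval:", small)
print("2.0 s interval:", large)
```

Under these assumptions the longer interval gives both a higher capacity and a higher worst-case latency, which is the direction of the trade-off stated on the slide.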
Slide 19
Stream Computing
Data Stream processing platforms - SPARK:
Since the Spark core framework exploits main memory (as opposed to Storm, which uses ZooKeeper), its mini-batch processing can appear as fast as the “one-at-a-time processing” adopted in Storm, despite the fact that the RDD units are larger than Storm tuples.
Slide 20
Stream Computing
Data Stream processing platforms - SPARK:
The benefit of the mini-batch is higher throughput in the internal engine through reduced data shipping overhead, such as lower overhead for the ISO/OSI transport layer header, which allows the threads to concentrate on computation.
Spark was written in Scala, but it comes with libraries and wrappers that allow the use of R or Python.
Slide 21
Stream Computing
Data Stream processing platforms - Storm:
Storm is a distributed real-time computation system for processing large volumes of high-velocity data.
It makes it easy to reliably process unbounded streams of data and has a relatively simple processing model owing to the use of powerful abstractions.
Slide 22
Stream Computing
Data Stream processing platforms - Storm:
A spout is a source of streams in a computation.
Typically, a spout reads from a queuing broker, such as RabbitMQ or Kafka, but a spout can also generate its own stream or read from somewhere like the Twitter streaming API.
Spout implementations already exist for most queuing systems.
Slide 23
Stream Computing
Data Stream processing platforms - Storm:
A bolt processes any number of input streams and produces any number of new output streams.
Bolts are event-driven components and cannot be used to read data; that is what spouts are designed for.
Most of the logic of a computation goes into bolts, such as functions, filters, streaming joins, streaming aggregations, talking to databases, and so on.
Slide 24
Stream Computing
Data Stream processing platforms - Storm:
A topology is a DAG of spouts and bolts, with each edge in the DAG representing a bolt subscribing to the output stream of some other spout or bolt.
A topology is an arbitrarily complex multistage stream computation; topologies run indefinitely when deployed.
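The spout, bolt, and topology vocabulary can be mimicked in a few lines of plain Python. This is a single-process analogy, not the Storm API: the `Topology` class, its methods, and the sensor readings are all hypothetical, and a bolt is modeled as a function that takes one tuple and returns the tuples it emits.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


class Topology:
    """A tiny in-process DAG: each edge is a bolt subscribing to a source."""

    def __init__(self) -> None:
        self.bolts: Dict[str, Callable] = {}
        self.subscribers: Dict[str, List[str]] = defaultdict(list)

    def add_bolt(self, name: str, func: Callable, source: str) -> None:
        self.bolts[name] = func
        self.subscribers[source].append(name)

    def emit(self, source: str, tup) -> None:
        """Deliver one tuple to every bolt subscribed to `source`."""
        for name in self.subscribers[source]:
            for out in self.bolts[name](tup):
                self.emit(name, out)

    def run(self, spout_name: str, spout: Iterable) -> None:
        for tup in spout:
            self.emit(spout_name, tup)


alerts = []
counts = defaultdict(int)


def hot_filter(reading):  # bolt: filter
    return [reading] if reading[1] > 30 else []


def alert_sink(reading):  # bolt: side effect, emits nothing
    alerts.append(reading)
    return []


def counter(reading):  # bolt: streaming aggregation
    counts[reading[0]] += 1
    return []


topology = Topology()
topology.add_bolt("hot_filter", hot_filter, source="readings")
topology.add_bolt("alert_sink", alert_sink, source="hot_filter")
topology.add_bolt("counter", counter, source="readings")
# The list stands in for a spout; a real spout would be unbounded.
topology.run("readings", [("s1", 25), ("s2", 34), ("s1", 31)])
print(alerts, dict(counts))
```

Here the `readings` source fans out to two bolts, and `alert_sink` subscribes to the output of `hot_filter`, so the subscriptions form a small DAG.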
Slide 25
Stream Computing
Data Stream processing platforms - Storm:
Trident provides a set of high-level abstractions in Storm that were developed to facilitate programming of real-time applications on top of Storm infrastructure.
It supports joins, aggregations, grouping, functions, and filters. In addition to these, Trident adds primitives for doing stateful incremental processing on top of any database or persistence store.
Slide 26
Stream Computing
Data Stream processing platforms - KAFKA:
Kafka is an open source message broker project developed under the Apache Software Foundation and written in Scala and Java.
The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Slide 27
Stream Computing
Data Stream processing platforms - KAFKA:
A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
In order to support high availability and horizontal scalability, data streams are partitioned and spread over a cluster of machines.
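Key-based partitioning of a stream can be sketched as follows. This is not the Kafka client: the functions are hypothetical, and CRC32 is swapped in as a stand-in hash purely for illustration (Kafka's default partitioner uses a different hash function). The point is only that messages with the same key land in the same partition, so their relative order is kept.

```python
import zlib
from typing import Iterable, List, Tuple


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from the key's hash (CRC32 used as a stand-in)."""
    return zlib.crc32(key) % num_partitions


def partition_stream(
    messages: Iterable[Tuple[str, str]], num_partitions: int
) -> List[List[Tuple[str, str]]]:
    """Spread (key, value) messages over partitions, preserving per-key order."""
    partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]
    for key, value in messages:
        index = partition_for(key.encode("utf-8"), num_partitions)
        partitions[index].append((key, value))
    return partitions


messages = [  # made-up messages
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]
partitions = partition_stream(messages, num_partitions=3)
for index, partition in enumerate(partitions):
    print(index, partition)
```

In a cluster, each partition could then be hosted on a different machine, which is what allows the stream to scale horizontally.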
Slide 28
Stream Computing
Data Stream processing platforms - KAFKA:
Kafka depends on ZooKeeper from the Hadoop ecosystem for coordination of processing nodes.
Kafka is mainly used where applications need very high throughput for message processing while meeting low-latency, high-availability, and high-scalability requirements.
Slide 29
Stream Computing
Data Stream processing platforms - Flume:
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.
Slide 30
Stream Computing
Data Stream processing platforms - Flume:
While Flume and Kafka can both act as the event backbone for real-time event processing, they have different characteristics.
Flume is better suited to cases where one needs to support data ingestion and simple event processing.
Slide 31
Stream Computing
Data Stream processing platforms - Amazon Kinesis:
Amazon Kinesis is a cloud-based service for real-time data processing over large, distributed data streams.
Amazon Kinesis can continuously capture and store terabytes of data per hour from hundreds of thousands of sources such as website clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events.
Slide 32
Stream Computing
Data Stream processing platforms - Amazon Kinesis:
Kinesis allows integration with Storm, as it provides a Kinesis Storm Spout that fetches data from a Kinesis stream and emits it as tuples.
The inclusion of this Kinesis component into a Storm topology provides a reliable and scalable stream capture, storage, and replay service.
Tags
kncet
kncet it
kafka
flume
storm