Big Data Analytics Orientation

080msdsa024yatru 31 views 37 slides May 05, 2024


Big Data Analytics 1
Big Data Analytics
Orientation
Basanta Joshi, PhD
Asst. Prof., Department of Electronics and Computer Engineering
Deputy Director, Center for Applied Research and Development
Member, Laboratory for ICT Research and Development (LICT)
Institute of Engineering
[email protected]
http://www.basantajoshi.com.np
https://scholar.google.com/citations?user=iocLiGcAAAAJ
https://www.researchgate.net/profile/Basanta_Joshi2

Big Data Analytics 2
About me
Current Affiliation
•Assistant Professor, Department of Electronics and
Computer Engineering, Pulchowk Campus
•https://pcampus.edu.np/
•Deputy Director, Center for Applied Research &
Development (CARD) https://card.ioe.edu.np/
•Member, Laboratory for ICT Research and Development
(LICT) http://lict.ioe.edu.np/
Education
•Bachelor of Electronics and Communication
Engineering, IOE 2005
•MSc in Information and Communication Engineering,
IOE, 2008
•Doctor of Engineering, Osaka Sangyo University, Japan
2013
Industry Experience
•Senior Software Engineer, D2hawkeye, Nepal
•Research Consultant, LogPoint, Nepal
•System Administration and Web development, Japan
•IT Consultant In Various National & International
Projects
Interests and Research Area
•3D Reconstruction/ Motion Tracking
•Nepali Language Processing
•Video Analytics
•Network Analytics
•Medical Data Analytics
•Edge Analytics
•Use of big data analytics in the above-mentioned areas

Big Data Analytics 3
Fill a survey form
MSDSA Students BDA Status Survey Form
https://forms.gle/ripvzKHwq8k8NqnA8

Big Data Analytics 4
Big data world then

Big Data Analytics 5
Big data world then

Big Data Analytics 6
Big data world then
The ‘Data Lake’ of Antiquity
www.extentia.com

Big Data Analytics 7
Big data world then
Big Data Analytics by Vikram Neerugatti

Big Data Analytics 8
Big Data Statistics 2020
•Every person will generate 1.7 megabytes of data every second. The time users spend on social media is about 65 minutes.
•Facebook has gained around 2.7 billion monthly active users and generates 4 petabytes of data per day.
•Facebook stated that 3.3 billion people were using at least one of the company's core products (Facebook, WhatsApp, Instagram, or Messenger) each month.
•YouTube currently counts 2 billion monthly active users, and 500 hours of content are uploaded to the platform every minute.
•In 2019, Amazon had 150 million mobile users.
•Twitter users send more than 528,780 tweets every minute.
•Over 2.5 quintillion bytes of data are generated worldwide every day.
•The total amount of data created, captured, copied, and consumed globally reached 64.2 zettabytes in 2020 and is forecast to increase rapidly over the next five years, growing to more than 180 zettabytes by 2025.
•By 2021, insight-driven businesses are predicted to take $1.8 trillion annually from their less-informed peers.
•Data-driven organizations are 23 times more likely to acquire customers than their peers.
•Businesses are spending $187 billion on big data and analytics in 2019.
•91.6% of firms worldwide confirm an increased pace of investment in big data in 2019.
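The storage forecast quoted above (64.2 zettabytes in 2020, more than 180 zettabytes by 2025) implies a compound annual growth rate that is easy to check. The snippet below is a minimal sketch of that arithmetic; the function name `implied_annual_growth` is made up for illustration, and because the slide says "more than 180", the result should be read as a lower bound.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate that takes `start` to `end`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end, and years must all be positive")
    return (end / start) ** (1 / years) - 1


# Figures quoted on the slide, in zettabytes.
ZB_2020 = 64.2
ZB_2025 = 180.0  # "more than 180", so treat this as a lower bound

growth = implied_annual_growth(ZB_2020, ZB_2025, years=5)
print(f"Implied growth: at least {growth:.1%} per year")
```

Under these figures the global data volume grows by roughly 23% per year, which means it nearly triples over the five-year span.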

Big Data Analytics 9
Big Data Statistics 2021
•In 2020, every person generated 1.7 megabytes of data every second. The time users spend on social media is about 145 minutes.
•Facebook has gained around 2.9 billion monthly active users and generates 4 petabytes of data per day.
•Facebook stated that 3.58 billion people were using at least one of the company's core products (Facebook, WhatsApp, Instagram, or Messenger) each month.
•As of August 2020, YouTube counted 2 billion monthly active users, and 500 hours of content were uploaded to the platform every minute.
•In 2020, Amazon had 150 million mobile users.
•Twitter users send more than 529,020 tweets every minute.
•Over 2.5 quintillion bytes of data are generated worldwide every day.
•The total amount of data created, captured, copied, and consumed globally reached 64.2 zettabytes in 2020 and is forecast to increase rapidly over the next five years, growing to more than 180 zettabytes by 2025.
•By 2021, insight-driven businesses are predicted to take $1.8 trillion annually from their less-informed peers.
•Using big data, Netflix saves $1 billion per year on customer retention.
•Worldwide spending on big data and business analytics (BDA) solutions is forecast to reach $215.7 billion this year (2021), an increase of 10.1% over 2020.
•97.2% of organizations are investing in big data and AI.
https://techjury.net/blog/big-data-statistics/#gref
https://www.statista.com/topics/1464/big-data/
December, 2021

Big Data Analytics 10
Big Data Statistics 2022/23
1. Each day, Google processes 8.5 billion searches. (Source: Oberlo)
2. WhatsApp users exchange up to 65 billion messages daily. (Source: Connectiva Systems)
3. 95% of businesses cite the need to manage unstructured data as a problem for their business. (Source: Statista)
4. 45% of businesses worldwide are running at least one of their big data workloads in the cloud. (Source: ZDNet)
5. 80-90% of the data we generate today is unstructured. (Source: CIO)
6. The market for big data analytics in banking is set to reach $62.10 billion by 2025. (Source: KR Elixir)
7. Big data in healthcare could be worth $71.6 billion by 2027. (Source: GlobeNewswire)
8. According to big data stats, cyber scams went up 400% at the start of the pandemic. (Source: Reed Smith)
9. Data creation will grow to more than 180 zettabytes by 2025. (Source: Statista)
10. Today it would take a person approximately 181 million years to download all the data from the internet. (Source: Unicorn Insights)

Big Data Analytics 11
Big Data Statistics 2022/23
11. The demand for composite data analytics professionals will grow by 31% by 2030. (Source: Forbes)
12. Internet users spent a total of 1.2 billion years online. (Source: Digital)
13. Social media accounts for 33% of the total time spent online. (Source: Global Web Index)
14. Facebook has almost two billion daily active users. (Source: Datareportal)
15. Twitter users send over 870 million tweets per day. (Source: Internet Live Stats)
16. 97.2% of organizations are investing in big data and AI. (Source: New Vantage)
17. Big data will grow at a 12% CAGR by 2026. (Source: Market Data Forecast)
18. The software sector will bring in the highest revenue by 2027. (Source: Statista)
19. The number of IoT devices could rise to 41.6 billion by 2025. (Source: IDC)
20. Worldwide spending on big data analytics solutions will be worth over $274.3 billion in 2022. (Source: Business Wire)
21. The ratio between unique and replicated data will be 1:10 by 2024. (Source: IDC)
22. Data science jobs will increase by around 28% by 2026. (Source: Towards Data Science)

Big Data Analytics 12
Big Data Statistics 2021
https://financesonline.com/big-data-statistics/
December, 2021

Big Data Analytics 13
What is Big Data ?
•Big Data is the next generation of data warehousing.
No single definition; here is one from Wikipedia:
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
•The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."

Big Data Analytics 14
Describing Data Size
https://twitter.com/paolopisani/
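The statistics slides mix megabytes, petabytes, zettabytes, and "quintillion bytes", so a small converter helps keep the scales straight. This is a minimal sketch that assumes decimal (SI) prefixes, where each step is a factor of 1000; the helper name `human_size` is hypothetical and is not taken from the slides.

```python
# Decimal (SI) prefixes: each unit is 1000 times the previous one.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_size(num_bytes: float) -> str:
    """Format a byte count using the largest unit that keeps the value under 1000."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit available.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {UNITS[-1]}"  # not reached; kept for completeness


# Figures quoted on the statistics slides.
print(human_size(2.5e18))   # "2.5 quintillion bytes" generated per day
print(human_size(4e15))     # Facebook's daily data volume
print(human_size(64.2e21))  # global data volume in 2020
```

Read this way, 2.5 quintillion bytes per day is 2.5 exabytes per day, still several orders of magnitude below the yearly zettabyte totals.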

Big Data Analytics 15
Big Data challenges

Big Data Analytics 16
Applications

Big Data Analytics 17
Technologies for Big data

Big Data Analytics 18
Big Data Analytics

Big Data Analytics 19
Big Data Analytics Use Cases

Big Data Analytics 21
Big data Landscape

Big Data Analytics 22
Big data Landscape

Big Data Analytics 23
Big data Landscape

Big Data Analytics 24
Big data Landscape

Big Data Analytics 27
Lambda Architecture
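The Lambda Architecture is usually described as three cooperating parts: a batch layer that recomputes views from an append-only master dataset, a speed layer that covers events the last batch run has not yet seen, and a serving layer that merges both at query time. The toy page-view counter below is only a single-process sketch of that idea; the class `LambdaCounter` and its method names are hypothetical, and a real deployment would run each layer on separate distributed systems.

```python
from collections import Counter
from typing import List


class LambdaCounter:
    """Toy page-view counter illustrating batch, speed, and serving roles."""

    def __init__(self) -> None:
        self.master: List[str] = []              # append-only master dataset
        self.batch_view: Counter = Counter()     # recomputed from scratch
        self.realtime_view: Counter = Counter()  # events since the last batch run

    def ingest(self, page: str) -> None:
        """New events go to the master dataset and to the speed layer."""
        self.master.append(page)
        self.realtime_view[page] += 1

    def run_batch(self) -> None:
        """Batch layer: rebuild the view from all data, then reset the speed layer."""
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, page: str) -> int:
        """Serving layer: merge the batch view with the real-time view."""
        return self.batch_view[page] + self.realtime_view[page]


counter = LambdaCounter()
counter.ingest("home")
counter.ingest("home")
counter.run_batch()      # batch view now holds both events
counter.ingest("home")   # only the speed layer has seen this one
print(counter.query("home"))
```

The point of the split is that the batch view can always be rebuilt from the master dataset, so mistakes in the faster, approximate speed layer are corrected on the next batch run.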

Big Data Analytics 28
Hadoop Ecosystem

Big Data Analytics 29
Stream Processing Architecture

Big Data Analytics 30
Course Agenda

Big Data Analytics 31
Course Objectives
•To give an overview of big data and the latest trends in big data analytics
•To introduce the technologies for handling big data
•To introduce Hadoop and the components of the Hadoop platform
•To perform basic exploration of large, complex datasets and understand scalable big data analysis
•To apply big data tools to advanced analytics disciplines such as predictive analytics, data mining, text analytics, and statistical analysis

Big Data Analytics 32
Syllabus
•Fundamentals of Big Data Analytics (6 hours)
•Big Data and the V's of Big Data, Handling and Processing Big Data, The Big Data landscape, Big Data Analytics, Examples of real-world big data problems

•Technologies for Handling Big Data (8 hours)
•GFS, HDFS, Google Bigtable, Introduction to Hadoop, functioning of Hadoop, MapReduce, RDDs, Cloud Computing for big data

•Understanding Big Data Technology Foundations (12 hours)
•Big data stack, i.e. data source layer, ingestion layer, storage layer, processing layer, security layer, visualization layer, visualization approaches, etc.
•Architectural design patterns and programming models used for Real-World Applications, Lambda Architecture, etc.
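MapReduce, listed above under Technologies for Handling Big Data, splits a job into a map phase that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce phase that aggregates each group. The word count below is a minimal in-memory sketch of that programming model in plain Python; it does not use Hadoop's API, and the function names are chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "big analytics"])
print(counts)
```

In a real cluster the map and reduce functions run in parallel on different machines and the shuffle moves data over the network; the three-step structure stays the same.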

Big Data Analytics 33
Syllabus
•Understanding Big Data Ecosystem (12 hours)
•Hadoop and its ecosystem, Introduction and Experimentation with Apache Flume, Apache Kafka, Apache Zookeeper, Apache Spark, Apache Mesos, Apache Kudu, etc.
•Amazon and Google Cloud Platform, Microsoft Azure, Amazon Kinesis, etc.

•Using Big Data for Analytics (10 hours)
•Basic approaches to querying and exploring big data
•New Databases for Big Data Analytics - Classification, Characteristics and Comparison: Apache HBase, Apache Hive, Apache Cassandra, etc.
•Descriptive, Diagnostic, Predictive, Prescriptive Analytics, Stream Analytics and Location Analytics
•Case studies for big data analytics

•Machine Learning with Big Data (12 hours)
•Introduction to parallel, distributed and scalable machine learning.
•Using processing layer tools (Apache Spark, Apache Mahout) to train, evaluate, and validate basic predictive models.
•Case studies for application of machine learning in big data.

Big Data Analytics 34
Teaching Methodology
•Lecture notes will be available on Google Classroom
CLASSROOM CODE : rnsrtvb
•Students should submit assignments, including programming assignments, in Google Classroom
•Students should upload their presentations to Google Classroom

Big Data Analytics 35
Marking Scheme
S.N.  Course Code  Course Title        Credit  Internal  External  Total
                   Big Data Analytics  4       40        60        100
•Students will be evaluated and Internal Marks will be given by the course teacher
•Marks distribution (out of 40 Internal Marks)
–Attendance 20% => 8
–Student Performance (Assignments + Presentation) 20% => 8
–Projects 20% => 8
–Quiz + Assessment 40% => 16
•Students have to appear in the Final exam to obtain External Marks
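The marks on the right of each "=>" follow from applying the percentage to the 40 internal marks in the table. The snippet below is a small sketch of that arithmetic; the names `WEIGHTS` and `internal_marks` are illustrative only.

```python
from typing import Dict

INTERNAL_TOTAL = 40  # internal marks, from the marking-scheme table

# Percentage weights as listed on the slide.
WEIGHTS: Dict[str, int] = {
    "Attendance": 20,
    "Student Performance (Assignments + Presentation)": 20,
    "Projects": 20,
    "Quiz + Assessment": 40,
}


def internal_marks(weights: Dict[str, int], total: int) -> Dict[str, float]:
    """Convert percentage weights into marks out of `total`."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must add up to 100%")
    return {name: total * pct / 100 for name, pct in weights.items()}


marks = internal_marks(WEIGHTS, INTERNAL_TOTAL)
for name, value in marks.items():
    print(f"{name}: {value:g}")
```

The four components add up to the full 40 internal marks, and the remaining 60 marks come from the final exam.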

Big Data Analytics 36
Teaching Methodology
•Lecture notes will be available on Google Classroom
Students have already been invited to the Google Classroom
•Students should submit assignments, including a project, in Google Classroom
•Students should upload their presentations to Google Classroom
•Presentations and projects will be done in groups of two

Big Data Analytics 37
References
•Tom White. Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, O'Reilly Media, Fourth Edition, 2015
•Nathan Marz and James Warren. Big Data: Principles and best practices of scalable realtime data systems, Manning Publications, First Edition, 2015
•Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira. Hadoop Application Architectures: Designing Real-World Big Data Applications, O'Reilly Media, First Edition, 2015
•Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. Learning Spark: Lightning-Fast Big Data Analysis, O'Reilly Media, First Edition, 2015
•Nataraj Dasgupta. Practical Big Data Analytics: Hands-on techniques to implement enterprise analytics and machine learning using Hadoop, Spark, NoSQL and R, Packt Publishing, 2018
•http://index-of.co.uk/Big-Data-Technologies/

Big Data Analytics 38
Thank you !!!