MongoDB.pdf

44 views 61 slides Oct 04, 2023

About This Presentation

mongodb fundamentals


Slide Content

MongoDB

Technical Overview

Learnizo Global LLP

Agenda

Welcome and Introductions
The World We Live In
MongoDB Technical Overview
Use Case Discussion

Demo

MongoDB - It's About the Data
[Infographic (Mashable Infographics): the world's information is doubling every two years, with volumes measured in zettabytes; comparisons shown include 57.5 billion 32GB Apple iPads and 200 billion HD movies.]

MongoDB, It’s for the Developers


Ease of Use and Suitability Rank Highly
Over 40% of users see a 50% or greater
increase in developer productivity


MongoDB Brings It All Together

Agile Development
* Iterative
* Continuous

Volume of Data
* Trillions of records
* Hundreds of millions of queries per second

Hardware Architectures
* Cloud computing
* Commodity servers

MongoDB Use Cases

[Customer logos: SAP, Viacom, Buddy Media, Shutterfly, craigslist, Wordnik, Loggly, and others]

10gen: The Creators of MongoDB
* Founded in 2007
  - Dwight Merriman, Eliot Horowitz
  - DoubleClick, Oracle, MarkLogic, HP
* $31M+ in funding
  - Flybridge, Sequoia, Union Square
* Worldwide expanding team
  - 150+ employees
  - NY, CA and UK

Agenda

* MongoDB Technical Overview
* Use Case Discussion
* Demo

Why was MongoDB built?

[Diagram: the NoSQL landscape, grouped by family: key-value, graph database (e.g. Neo4j), document-oriented (e.g. CouchDB), column family (e.g. Cassandra, HBase)]

Traditional Architecture

* Relational
  - Hard to map to the way we code
    * Complex ORM frameworks
  - Hard to evolve quickly
    * Rigid schema is hard to change, necessitates migrations
  - Hard to scale horizontally
    * Joins, transactions make scaling by adding servers hard

RDBMS Limitations

[Timeline diagram: project start, launch, then workarounds added at +30 days, +90 days and +6 months: denormalize data model, custom caching layer, custom sharding]

MongoDB

Built from the start to solve the
scaling problem

Consistency, Availability, Partitioning
- (can’t have it all)

Configurable to fit requirements

Theory of NoSQL: CAP

Many nodes

Nodes contain replicas of partitions of data

Consistency
- all replicas contain the same version of data

Availability
- system remains operational on failing nodes

Partition tolerance
- multiple entry points
- system remains operational on system split

CAP Theorem: satisfying all three at the same time is impossible

Supported languages

[Driver logos, including Python and Scala]

http://www.mongodb.org/display/DOCS/Drivers

ACID (CP)
* Atomicity
* Consistency
* Isolation
* Durability

BASE (AP)
* Basically Available
* Soft-state
* Eventually consistent

MongoDB is easy to use

[Code screenshot: a SQL transaction shown for comparison; only fragments such as NULL, and COMMIT; are legible]

Scalability & Performance

As simple as possible,
but no simpler

Depth of functionality

Representing & Querying Data

Schema design
* RDBMS: join

[Diagram: MySQL tables]

Schema design

* MongoDB: embed and link
* Embedding is the nesting of objects and arrays inside a BSON document (prejoined). Links are references between documents (client-side follow-up query).
* Embed for "contains" relationships and one-to-many; link to avoid duplication of data and for many-to-many
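The difference between embedding and linking can be sketched without a database. The following is a minimal illustration in plain JavaScript, where ordinary objects stand in for BSON documents; the helper names (`embeddedComments`, `linkedComments`) and the sample data are assumptions made for this sketch, not part of any MongoDB API.

```javascript
// Plain JavaScript objects stand in for BSON documents; no MongoDB server is involved.

// Embedded: comments are nested inside the post ("prejoined").
const embeddedPost = {
  _id: 1,
  title: "Spirited Away",
  comments: [{ author: "Fred", text: "Best Movie Ever" }],
};

// Linked: the post stores only comment ids; comments live in their own collection.
const linkedPost = { _id: 1, title: "Spirited Away", comments: [274] };
const commentsCollection = [
  { _id: 274, post_id: 1, author: "Fred", text: "Best Movie Ever" },
  { _id: 275, post_id: 2, author: "Fred", text: "Hypothetical other comment" },
];

// Reading embedded data needs no second lookup.
function embeddedComments(post) {
  return post.comments;
}

// Reading linked data needs a follow-up lookup, performed here on the client side.
function linkedComments(post, comments) {
  return comments.filter((c) => post.comments.includes(c._id));
}
```

Embedding returns everything in one read; linking keeps each comment in a single place at the cost of the extra lookup.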

Schema design

MongoDB

[Diagram: a posts document with fields title, slug, body, published, created and updated, plus embedded comments with fields author, email, body and created. Legible sample values: 'MongoDB', 'Eliot Horowitz', 'Dwight Merriman', false, true; email addresses are redacted in the source.]

MongoDB
Collection

JSON Document

Embedding & Linking


Documents

Collections contain documents

Documents can contain other documents
and/or
Documents can reference other documents

Flexible/powerful ability to relate data

Flexible Schema


CRUD

Create
- db.collection.insert( <document> )
- db.collection.save( <document> )
- db.collection.update( <query>, <update>, { upsert: true } )

Read
- db.collection.find( <query>, <projection> )
- db.collection.findOne( <query>, <projection> )

Update
- db.collection.update( <query>, <update>, <options> )

Delete
- db.collection.remove( <query>, <justOne> )

Documents

var p = { author: "roger",
          date: new Date(),
          title: "Spirited Away",
          avgRating: 9.834,
          tags: [ "Tezuka", "Manga" ] }

> db.posts.save(p)

Linked vs Embedded Documents

Embedded:

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 ...",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [
    {
      author : "Fred",
      date : "Sat Jul 26 2010...",
      text : "Best Movie Ever"
    }
  ],
  avgRating: 9.834 }

Linked:

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PD...",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [ 6, 274, 1135, 1298, 2245, 5623 ],
  avg_rating: 9.834 }

comments
{ _id : 274,
  movie_id : ObjectId("4c4ba5c0672c..."),
  author : "Fred",
  date : "Sat Jul 24 2010 20:51:...",
  text : "Best Movie Ever" }
{ _id : 275,
  movie_id : ObjectId("3d5ffc88..."),
  author : "Fred",
  ...

Querying
> db.posts.find()

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ] }

Note:
- _id is unique, but can be anything you'd like

Query Operators

* Conditional Operators
  - $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
  - $lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find( { tags: { $exists: true } } )

// find posts matching a regular expression
> db.posts.find( { author: /^rog*/i } )

// count posts by author
> db.posts.find( { author: "roger" } ).count()

Atomic Operations

* $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

> comment = { author: "fred",
              date: new Date(),
              text: "Best Movie Ever" }

> db.posts.update( { _id: "..." },
                   { $push: { comments: comment } } );

Arrays

* $push - append
* $pushAll - append array
* $addToSet and $each - add if not contained, add list
* $pop - remove last
* $pull - remove all occurrences/criteria
  * { $pull : { field : { $gt: 3 } } }
* $pullAll - removes all occurrences of each value
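The array operators above differ mainly in how they treat duplicates and match criteria. The sketch below emulates their effect on plain JavaScript arrays; the function names are hypothetical helpers written for this illustration, not driver or shell calls, and each returns a new array instead of modifying a stored document.

```javascript
// In-memory emulation of array update semantics (hypothetical helpers, not MongoDB API).

// $push: always append, even if the value is already present.
function push(arr, value) {
  return [...arr, value];
}

// $addToSet: append only if the value is not already contained.
function addToSet(arr, value) {
  return arr.includes(value) ? [...arr] : [...arr, value];
}

// $pop: remove the last element.
function pop(arr) {
  return arr.slice(0, -1);
}

// $pull: remove every element that matches a criterion.
function pull(arr, predicate) {
  return arr.filter((v) => !predicate(v));
}

// $pullAll: remove every occurrence of each listed value.
function pullAll(arr, values) {
  return arr.filter((v) => !values.includes(v));
}
```

For example, `pull([1, 3, 5, 7], (v) => v > 3)` mirrors the `{ $pull : { field : { $gt: 3 } } }` case by dropping 5 and 7.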

Indexes

// Create index on any field in a document
> db.posts.ensureIndex( { author: 1 } )

// Index nested documents
> db.posts.ensureIndex( { "comments.author": 1 } )
> db.posts.find( { "comments.author": "Fred" } )

// Index on tags (array values)
> db.posts.ensureIndex( { tags: 1 } )
> db.posts.find( { tags: "Manga" } )

// geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location" : { $near : [22,42] } } )

Aggregation/Batch Data Processing

* Map/Reduce can be used for batch data processing
  - Currently being used for totaling, averaging, etc.
  - Map/Reduce is a big hammer
* Simple aggregate functions available
* (2.2) Aggregation Framework: simple, fast
  - No JavaScript needed, runs natively on server
  - Filter or select only matching sub-documents or arrays via new operators
* MongoDB Hadoop Connector
  - Useful for Hadoop integration
  - Massive batch processing jobs

Deployment & Scaling

Why Replicate?

Data Redundancy

Automatic Failover / High Availability
Distribution of read load

Disaster recovery

Replica Sets

One primary, many secondaries
- Automatic replication to all secondaries
  * Different delays may be configured
- Automatic election of new primary on failure
- Writes go to the primary; reads can go to secondaries

Priority of secondary can be set
- Hidden for administration/back-ups
- Lower score for less powerful machines

Election of new primary is automatic
- Majority of replica set must be available
- Arbiters can be used

Many configurations possible (based on use case)
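The majority requirement for elections comes down to simple arithmetic. As a rough sketch, assuming every member (including any arbiter) has one vote, the check looks like this; `canElectPrimary` is a hypothetical helper for illustration and not a MongoDB function.

```javascript
// Hypothetical helper: can a replica set with `votingMembers` voting members
// elect a primary when only `reachable` of them can see each other?
function canElectPrimary(votingMembers, reachable) {
  // A strict majority is more than half of all voting members.
  const majority = Math.floor(votingMembers / 2) + 1;
  return reachable >= majority;
}
```

Under this rule a three-member set survives the loss of one member, while a two-member set cannot elect a primary once either member is down, which is why an arbiter is often added as a tie-breaking third vote.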

Replica Sets

[Diagram sequence: writes go to the primary; asynchronous replication to the secondaries; automatic leader election after a failure]

Sharding

[Diagram sequence: a single key range 0..100 is split into key ranges 0..30 and 31..100, which are then split further and spread across shards]

Sharding Administration

Splitting data into chunks
- Automatic
- Existing data can be manually "pre-split"

Migration of chunks/balancing between servers
- Automatic
- Can be turned off/chunks can be manually moved

Shard key
- Must be selected by you
- Very important for performance!

Each shard is really a replica set
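Range-based routing and chunk splitting can be illustrated with a small in-memory table. This is a simplified sketch: the shard names, the `routeByKey` and `splitChunk` helpers, and the use of inclusive integer ranges (chosen to match the 0..30 and 31..100 ranges in the diagrams) are all assumptions for illustration and do not reflect how MongoDB stores chunk metadata.

```javascript
// Hypothetical chunk table using the key ranges from the diagrams (0..30 and 31..100).
const chunks = [
  { min: 0, max: 30, shard: "shardA" },
  { min: 31, max: 100, shard: "shardB" },
];

// Return the shard whose chunk contains the given shard-key value, or null if none does.
function routeByKey(key, chunkTable) {
  const chunk = chunkTable.find((c) => key >= c.min && key <= c.max);
  return chunk ? chunk.shard : null;
}

// Split one chunk at `splitPoint` (integer keys assumed); the upper half goes to `newShard`.
function splitChunk(chunk, splitPoint, newShard) {
  return [
    { min: chunk.min, max: splitPoint, shard: chunk.shard },
    { min: splitPoint + 1, max: chunk.max, shard: newShard },
  ];
}
```

A query that includes the shard key can be sent to a single shard this way; a query without it has to be sent to every shard, which is one reason the choice of shard key matters so much for performance.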

Full Deployment

[Diagram: queries are routed to shards by key range]

SaaS solution providing instrumentation and visibility
into your MongoDB systems

Now Introducing

MMS

MongoDB Monitoring Service

MongoDB Monitoring Service is a cloud-based monitoring and alerting
solution for all MongoDB deployments. Get Started with MMS

3,500+ customers signed up and using service

Agenda

Welcome and Introductions
MongoDB and the New Frontier
MongoDB Technical Overview
Use Case Discussion

Demo


Queries

Importing Data into Mongodb

- mongoimport --db test --collection restaurants --file dataset.json

Exporting Data from MongoDB

- mongoexport --db test --collection newcolln --out myexport.json

Queries

* MongoDB query operation: [annotated query diagram; the one legible label is "cursor modifier"]
* Query in SQL: [diagram]

Create/Insert Queries

* db.collection.insert()
db.inventory.insert(
  { item: "ABC1",
    details: { model: "14Q3", manufacturer: "XYZ Company" },
    stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
    category: "clothing" } )

Find Queries

db.collection.find()

db.inventory.find( {} )

db.inventory.find( { type: "snacks" } )
db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )

db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )

db.inventory.find( { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )

Update Queries

* To modify fields, including fields of a subdocument, use $ update operators such as $set
db.inventory.update( { item: "MNO2" },
  { $set: { category: "apparel", details: { model: "14Q3", manufacturer: "XYZ Company" } },
    $currentDate: { lastModified: true } },
  false, true )

* false (third argument, upsert): do not insert a new document when nothing matches
* true (fourth argument, multi): update all matching documents

Queries

Aggregation
- SQL query
  SELECT state, SUM(pop) AS totalPop FROM zipcodes GROUP BY state HAVING totalPop >= (10*1000*1000)

- MongoDB
  db.zipcodes.aggregate( [ { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
                           { $match: { totalPop: { $gte: 10*1000*1000 } } } ] )
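To see what the two pipeline stages compute, the same group-then-filter logic can be written over an in-memory array. This is only an emulation in plain JavaScript: the sample rows are invented for illustration, and `groupByState` and `matchMinPop` are hypothetical helpers, not part of the aggregation framework.

```javascript
// Hypothetical sample rows; the field names (state, pop) follow the zipcodes example.
const zipcodes = [
  { state: "NY", pop: 6000000 },
  { state: "NY", pop: 5000000 },
  { state: "VT", pop: 600000 },
];

// $group stage: sum `pop` per `state` (SQL: GROUP BY state, SUM(pop)).
function groupByState(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.state, (totals.get(doc.state) || 0) + doc.pop);
  }
  return [...totals].map(([state, totalPop]) => ({ _id: state, totalPop }));
}

// $match stage after grouping: keep groups at or above the threshold (SQL: HAVING).
function matchMinPop(groups, minPop) {
  return groups.filter((g) => g.totalPop >= minPop);
}

const result = matchMinPop(groupByState(zipcodes), 10 * 1000 * 1000);
```

Because the filter runs after the grouping stage, it plays the role of HAVING; placing a filter before the grouping stage would instead act like WHERE.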

Indexes

db.collection.find( { field: 'value' } ).explain()
db.collection.ensureIndex( { title: 1 } );
db.collection.dropIndex("index name");
db.mycoll.ensureIndex( { 'address.coord': '2d' } )

db.mycoll.find( { "address.coord": { $near: [70, 40], $minDistance: 0.05 } } )

Indexed and Nonindexed Search

> db.mycoll.find({"name" : "Tov Kosher Kitchen"}).pretty().explain()
{
  "cursor" : "BtreeCursor name_1",
  "isMultiKey" : false,
  "n" : 1,
  "nscannedObjects" : 1,
  "nscanned" : 1,
  "nscannedObjectsAllPlans" : 1,...
}
> db.mycoll.find({"cuisine" : "Jewish/Kosher"}).pretty().explain()
{
  "cursor" : "BasicCursor",
  "isMultiKey" : false,
  "n" : 316,
  "nscannedObjects" : 25359,
  "nscanned" : 25359,...
}

Note: the first query uses the index on name (BtreeCursor) and examines 1 document; the second has no index to use (BasicCursor) and scans 25359 documents to return 316.

Thank You