MongoDB Sharding

RobWalters6 · 468 views · 42 slides · Mar 21, 2017

About This Presentation

MongoDB Sharding deck used at the Hartford MongoDB Users Group meeting


Slide Content

Sharding. Robert Walters, Solutions Architect (Robert.walters@mongodb.com)

Agenda: When to Scale; How to Scale; Sharding Concepts; MongoDB Sharding; Shard Keys; Hashed Shard Keys

Contrast with Replication. In an earlier module, we discussed replication. This should never be confused with sharding. Replication is about high availability and durability: taking your data and constantly copying it, and being ready to have another machine step in to field requests.

Sharding is Concerned with Scale. What happens when a system is unable to handle the application load? It is time to consider scaling. There are two types of scaling to consider: vertical scaling and horizontal scaling.

How to Scale

MongoDB – Scale Out vs. Scale Up (chart comparing price against scale for each approach)

When to Scale

Disk I/O Limitations

Working Set Exceeds Physical Memory (diagram: the working set, indexes plus data, outgrows RAM)

Sharding Concepts

Shards and Shard Keys (diagram: each shard owns a shard key range)

Range-based Sharding (diagram: shard key values "Bagpipes", "Iceberg", and "Snow Cone" route to the shards owning key ranges A - C, D - O, and P - Z)
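The routing rule in the diagram can be sketched in a few lines of JavaScript. This is an illustration of the concept, not MongoDB internals; the shard names and single-letter ranges are hypothetical and mirror the slide's example.

```javascript
// Range-based sharding sketch: each shard owns a contiguous range of
// shard key values, and a value routes to the shard whose range covers it.
const ranges = [
  { shard: "shard0", min: "A", max: "C" }, // A - C
  { shard: "shard1", min: "D", max: "O" }, // D - O
  { shard: "shard2", min: "P", max: "Z" }, // P - Z
];

function routeToShard(key) {
  const c = key[0].toUpperCase(); // route by first letter, as in the slide
  for (const r of ranges) {
    if (c >= r.min && c <= r.max) return r.shard;
  }
  throw new Error("no range covers key: " + key);
}
```

With the slide's values, "Bagpipes" lands on the A - C shard, "Iceberg" on D - O, and "Snow Cone" on P - Z.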

Possible Imbalance? Depending on how you configure sharding, data can become unbalanced on your sharded cluster. Some shards might receive more inserts than others. Some shards might have documents that grow more than those in other shards. This may result in too much load on a single shard: reads and writes, and disk activity. This would defeat the purpose of sharding.

Balancing (diagram: values such as "Dates" and "Dragons" across key ranges A - C, D - O, P - Z)

Balancing Shards. MongoDB divides data into chunks. A chunk is bookkeeping metadata: there is nothing in a document that indicates its chunk, so a document does not need to be updated if its assigned chunk changes. If a chunk grows too large, MongoDB splits it into two chunks. The MongoDB balancer keeps chunks distributed across shards in equal numbers. However, a balanced sharded cluster depends on a good shard key.
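The chunk-as-metadata idea can be sketched as follows. A chunk records only a key range and a size estimate, so splitting it rewrites metadata, not documents. The 64 MB threshold matches the default mentioned later in the deck; the split-at-midpoint rule and single-letter bounds are simplifications for illustration.

```javascript
// Chunk splitting sketch: when a chunk's tracked size exceeds the limit,
// it is split into two chunks covering the same overall key range.
const MAX_CHUNK_BYTES = 64 * 1024 * 1024;

function midpoint(a, b) {
  // crude midpoint of two single-letter bounds, enough for this sketch
  return String.fromCharCode(Math.floor((a.charCodeAt(0) + b.charCodeAt(0)) / 2));
}

function maybeSplit(chunk) {
  // chunk: { min, max, bytes } - bookkeeping metadata only
  if (chunk.bytes <= MAX_CHUNK_BYTES) return [chunk];
  const mid = midpoint(chunk.min, chunk.max);
  return [
    { min: chunk.min, max: mid, bytes: chunk.bytes / 2 },
    { min: mid, max: chunk.max, bytes: chunk.bytes / 2 },
  ];
}
```

Note that no document is touched: the two new chunks simply abut at the split point, and the balancer can later move either one to another shard.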

Balancing (diagram: a background process balances data across the shards owning ranges A - De, Di - O, P - Z)

Mongos A mongos is responsible for accepting requests and returning results to an application driver. In a sharded cluster, nearly all operations go through a mongos. A sharded cluster can have as many mongos routers as required. It is typical for each application server to have one mongos. Always use more than one mongos to avoid a single point of failure.

Routed Query (diagram: the query router sends the query only to the shard owning the matching key range)

Scatter Gather Query (diagram: the query router sends the query to every shard and merges the results)
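The difference between the two diagrams comes down to whether the filter contains the shard key. A minimal sketch of that decision, assuming the hypothetical shard names and ranges from the balancing diagram:

```javascript
// Routed vs. scatter-gather sketch: if the filter includes the shard key,
// only shards owning matching ranges are contacted; otherwise every shard is.
const shards = [
  { name: "s0", min: "A", max: "De" },
  { name: "s1", min: "Di", max: "O" },
  { name: "s2", min: "P", max: "Z" },
];

function targetShards(filter, shardKey) {
  if (!(shardKey in filter)) {
    return shards.map((s) => s.name); // no shard key: scatter-gather to all
  }
  const v = filter[shardKey];
  // "\uffff" extends each upper bound to cover all strings with that prefix
  return shards
    .filter((s) => v >= s.min && v <= s.max + "\uffff")
    .map((s) => s.name);
}
```

A query on the shard key hits one shard; any other query must fan out to all three and have its results merged by the router.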

MongoDB Sharding

System Components: Shard (mongod) manages data; Query Router (mongos) routes queries; Config Server (mongod --configsvr) manages metadata about data located on shards.

Sharded Cluster Redundancy (diagram: redundancy via replica sets)

Config Server Hardware Requirements Quality network interfaces A small amount of disk space (typically a few GB) A small amount of RAM (typically a few GB) The larger the sharded cluster, the greater the config server hardware requirements.

Balancing considerations. The balancer moves 64 MB per round: that is 64 one-megabyte documents, or 640,000 hundred-byte documents. For each round, it manages metadata on the range-to-shard relationship. For each document, it must read from the source shard, write to the destination shard, and delete from the source shard.
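The slide's arithmetic is worth sanity-checking, since it is why small documents make balancing rounds so much more expensive in per-document work. A quick check (using MB = 1,000,000 bytes, as the slide's round numbers imply):

```javascript
// Documents moved per 64 MB balancing round, by document size.
const CHUNK_BYTES = 64 * 1000 * 1000;

const docsPerRound = (docBytes) => Math.floor(CHUNK_BYTES / docBytes);
```

One round moves either 64 large documents or 640,000 small ones, and each document costs a read, a write, and a delete.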

Shard Keys

What is a Shard Key? The shard key is used to partition your collection. The shard key must exist in every document. The shard key is immutable. Shard key values are immutable. The shard key must be indexed. The shard key is used to route requests to shards.

With a Good Shard Key You might easily see that: Reads hit only 1 or 2 shards per query. Writes are distributed across all servers. Your disk usage is evenly distributed across shards. Things stay this way as you scale.

With a Bad Shard Key You might see that: Your reads hit every shard. Your writes are concentrated on one shard. Most of your data is on just a few shards. Adding more shards to the cluster will not help.

Picking a shard key: one that has high cardinality; that is used in the majority of read queries; for which the values used by read and write operations are randomly distributed; and for which the majority of reads are routed to a particular server.

More Specifically. Your shard key should be consistent with your query patterns. If reads usually find only one document, you only need good cardinality. If reads retrieve many documents, your shard key should support locality, so that matching documents reside on the same shard.

Cardinality. A good shard key will have high cardinality: a relatively small number of documents should share the same shard key value. Otherwise, operations become isolated to the same server, because documents with the same shard key reside on the same shard. Adding more servers will not help, and hashing will not help either.

Non-Monotonic. A good shard key will generate new values non-monotonically. Datetimes, counters, and ObjectIds make bad shard keys, because monotonic shard keys cause all inserts to happen on the same shard. Hashing will solve this problem; however, range queries with a hashed shard key will perform a scatter-gather query across the cluster.
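The insert hot spot can be demonstrated with a small simulation. Every monotonically increasing value (a counter, timestamp, or ObjectId) is greater than all previous ones, so every insert routes to the chunk with the open-ended maximum range. The chunk bounds here are hypothetical.

```javascript
// Why monotonic shard keys are bad: all new inserts land on one shard.
const chunks = [
  { shard: "s0", min: 0, max: 1000 },
  { shard: "s1", min: 1000, max: 2000 },
  { shard: "s2", min: 2000, max: Infinity }, // top chunk, open-ended
];

function shardFor(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Simulate a run of monotonically increasing keys (e.g. a counter).
function shardsHitByMonotonicInserts(start, count) {
  const hit = new Set();
  for (let k = start; k < start + count; k++) hit.add(shardFor(k));
  return hit;
}
```

A hundred consecutive counter values all route to the top chunk's shard, while the other shards sit idle; hashing the key breaks this pattern at the cost of scatter-gather range queries.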

Hashed Shard Keys

Hashed Shard Keys (diagram: MD5("Bagpipes") = 51b0af600acd1be72b4583a91cae0024)

Hash-based Sharding (diagram: Bagpipes (C4ca4..), Iceberg (Q46eb..), Snow Cones (J3adc..) are distributed across the ranges A - C, D - O, P - Z by hash value rather than by the original key)

Under the hood: sharding creates a hashed index; it uses the first 64 bits of the MD5 hash of the field; you still need to thoughtfully choose the key; it uses an existing hashed index, or creates a new one on the collection.

Using hashed indexes. Create the index: db.collection.createIndex({ field: "hashed" }). Enable sharding on the collection: sh.shardCollection("test.collection", { field: "hashed" }). Good for: equality queries, which are routed to a specific shard, make use of the hashed index, and are the most efficient query possible.

Using hashed indexes. Not good for: range queries, which will be scatter-gather, cannot utilize a hashed index, and are an inefficient query pattern; a supplemental ordered index may be used at the shard level.

Other Limitations. A hashed key cannot be used in compound or unique indexes. There is no support for multi-key indexes. It is incompatible with tag-aware sharding: tags would be assigned hashed values, not the original key. Hashing will not overcome keys with poor cardinality. Floating point numbers are truncated before hashing.

Summary

Resources:
Tutorial: Sharding (laptop friendly)
Tutorial: Converting a Replica Set to a Replicated Sharded Cluster (laptop friendly)
Manual: Select a Shard Key
Manual: Hash-based Sharding
Manual: Tag Aware Sharding
Manual: Strategies for Bulk Inserts in Sharded Clusters
Manual: Manage Sharded Cluster Balancer