Boltdb - an embedded key value database

awmanoj, 15 slides, Nov 28, 2016

About This Presentation

Presentation from the tech-a-break & goJakarta event at Tokopedia


Slide Content

Manoj Awasthi,
Tech Architect @Tokopedia
Boltdb
an embedded key value database

Structure of this talk..

A bit of history..
[Diagram: the tokopedia image router sits in front of image servers 1..N and uses a mapping from each image to the server that holds it, e.g. 123.jpg : server_03; 246.jpg : server_02; 345.jpg : server_17; ….]
.. as time passed

Gradually, we started storing newer images on s3:// ..
•All images uploaded from that point onwards could be served from a single server.
•No such mapping (mongodb) was required for them.
•Old images were still served in the same way and still needed the mapping.
•But now the database was “read only” and of fixed size.
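The routing described above can be sketched as a lookup in a fixed, read-only mapping for old images, with everything else sent to the single S3-backed server. This is a minimal sketch under assumptions: the function `routeImage`, the `s3Host` parameter and the rule "no entry in the mapping means a newer S3 image" are hypothetical, since the talk does not say how the router tells old and new images apart.

```go
package main

import "fmt"

// legacyMapping is a hypothetical in-memory stand-in for the read-only,
// fixed-size image -> server mapping that used to live in mongodb.
var legacyMapping = map[string]string{
	"123.jpg": "server_03",
	"246.jpg": "server_02",
	"345.jpg": "server_17",
}

// routeImage returns the host that should serve the named image.
// Assumption: any image missing from the legacy mapping was uploaded after
// the move to S3 and is served by the single S3-backed host.
func routeImage(name, s3Host string) string {
	if server, ok := legacyMapping[name]; ok {
		return server
	}
	return s3Host
}

func main() {
	fmt.Println(routeImage("123.jpg", "s3_server")) // server_03
	fmt.Println(routeImage("999.jpg", "s3_server")) // s3_server
}
```

Because the mapping never changes any more, the service only ever reads it, which is what made a read-optimised embedded store an option.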

Also: we suffered frequent memory spikes, and the mongodb process was repeatedly killed by the Linux “out of memory killer”, which led to both latency and downtime.

Search for an alternative..
Requirements boiled down to:

•Fast retrieval - needed across the board
•Scalable - to tens of thousands of queries per second
•Persistent - no need to recompute everything from scratch on each boot (just in case!)

Also, we can do with:
Read only usage - not a constraint, but this could help in making a “trade off”


Why not redis?

Redis! Well, it could work well given our fixed data size and read only usage.
In fact, we did try it and saw scaling problems with redis (high CPU load).
Also $$.

We needed a lightweight embedded database ..
“BoltDB”, an embedded key value database written in golang, looked interesting.

Why boltdb?

Compact, fast.
Based on LMDB [0].
Both use a B+ tree for storage, maintain ACID semantics with fully serializable transactions, and support lock-free MVCC with a single writer and multiple readers.
Simple
While LMDB focuses on raw performance, Boltdb is focused on ease of use.
Fits well with “read heavy” usage (read more, write less).
Written in golang, so it fits well with the rest of the stack at Tokopedia.
[0] https://symas.com/products/lightning-memory-mapped-database/

Why boltdb?
In the traditional sense, boltdb is not really a database but simply a memory-mapped file. But it provides ACID semantics and other properties associated with databases, so calling it a DB is not a misnomer.
No installation required
●It comes as a library
●Installation is as simple as importing it in your go program

Opening the database..
Add a key value
Fetch a value by key

bolt - command line utility
Bolt is a tool for inspecting bolt databases.

Things to use it for:
•Check the integrity of a bolt database
•Run synthetic benchmarks against a bolt database to gauge read and write performance
•Print basic info about the database
•Generate useful statistics on all pages in the database
Available under cmd/bolt in the github repository [1].

Caveat: random writes get slow as the db grows!
Let’s get back to the problem we were solving. 

The raw data exported from mongodb using the mongoexport utility was ~4G.
This translated to a ~13G boltdb database file.

The export tool we wrote to load the mongo output into boltdb became much slower as the database grew. Hence we used sharding: we horizontally partitioned the data from mongo into many small files and built a smaller boltdb file for each of them.
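A minimal sketch of the sharding idea, assuming keys are assigned to shards by hashing (32-bit FNV-1a here) modulo a fixed shard count. The talk does not say which partitioning function or file naming was used, so `shardFor`, `shardFile` and the `images_%03d.db` pattern are hypothetical. The same function has to be used by the exporter and by the lookup service, so that each key is always read from the file it was written to.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of n shards using the 32-bit FNV-1a hash.
// The mapping is deterministic, so the exporter and the lookup service
// agree on which file holds each key.
func shardFor(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error.
	return h.Sum32() % n
}

// shardFile returns the (hypothetical) boltdb file name for a shard.
func shardFile(shard uint32) string {
	return fmt.Sprintf("images_%03d.db", shard)
}

func main() {
	const shards = 16
	for _, key := range []string{"123.jpg", "246.jpg", "345.jpg"} {
		fmt.Println(key, "->", shardFile(shardFor(key, shards)))
	}
}
```

The trade-off of plain modulo hashing is that changing the shard count later reassigns most keys, so all the files would have to be rebuilt.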

The result!
Following is the output of `free -m` on one of the servers we use:


Snippet of `top` output from the same server:

Limitations
Bolt is good for read-intensive workloads. Random writes can be slow.
Bolt uses a B+ tree internally, so there can be a lot of random page access. SSDs provide a significant performance boost over spinning disks.
Bolt can handle databases much larger than the available physical RAM, provided its memory map fits in the process address space. This may be problematic on 32-bit systems.
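A rough worked check of the 32-bit concern, using the size from this talk: a 32-bit process has at most 2^32 bytes = 4 GiB of virtual address space (less is usable in practice), so a single ~13G boltdb file could not be mapped at all, while a 64-bit address space has plenty of room. The sketch treats the whole address space as available, which is a simplifying assumption, and reads the talk's "13G" as 13 GiB; the helper `fitsAddressSpace` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

const gib = uint64(1) << 30

// fitsAddressSpace reports whether a memory map of size bytes could fit in
// an address space of the given width in bits. This is an upper bound: the
// OS, the program and its heap also need part of that space.
func fitsAddressSpace(size uint64, bits uint) bool {
	if bits >= 64 {
		return true // every uint64 size is below 2^64
	}
	return size < uint64(1)<<bits
}

func main() {
	dbSize := 13 * gib // the ~13G boltdb file from the talk
	fmt.Println("32-bit:", fitsAddressSpace(dbSize, 32)) // 32-bit: false
	fmt.Println("64-bit:", fitsAddressSpace(dbSize, 64)) // 64-bit: true
	// strconv.IntSize is the word size of the platform this binary was built for.
	fmt.Println("this build:", fitsAddressSpace(dbSize, strconv.IntSize))
}
```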
The data structures used by bolt are memory mapped and hence endian-specific. This means that you cannot copy a bolt file from a little-endian machine to a big-endian machine and have it work. (Most modern CPUs are little-endian.)
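A small illustration of why a memory-mapped on-disk format is tied to byte order: the same four bytes decode to different integers depending on whether they are read as little- or big-endian. The sketch uses Go's standard `encoding/binary` and only demonstrates the general effect; it does not reproduce bolt's actual page layout, and the helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeLE lays out v the way a little-endian machine stores it in memory,
// which is what lands on disk when a memory-mapped structure is written.
func encodeLE(v uint32) []byte {
	buf := make([]byte, 4)
	binary.LittleEndian.PutUint32(buf, v)
	return buf
}

// readAsBigEndian interprets the same bytes the way a big-endian machine would.
func readAsBigEndian(buf []byte) uint32 {
	return binary.BigEndian.Uint32(buf)
}

func main() {
	buf := encodeLE(1)
	fmt.Printf("bytes on disk: % x\n", buf)                              // bytes on disk: 01 00 00 00
	fmt.Println("little-endian read:", binary.LittleEndian.Uint32(buf)) // little-endian read: 1
	fmt.Println("big-endian read:", readAsBigEndian(buf))               // big-endian read: 16777216
}
```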

Conclusion
Boltdb worked pretty well for our use case.
The service handles many thousands of queries per second, is not limited by physical RAM, and is doing well! :D
Do give it a try if it fits your use case.
References:
[1] https://github.com/boltdb/bolt

[2] http://tech.tokopedia.com/blog/using-boltdb-as-a-fast-persistent-kv-store/

[3] https://symas.com/products/lightning-memory-mapped-database/

Connect with me over:
{ "Email": "[email protected]",
  "Twitter": "https://twitter.com/awmanoj",
  "Linkedin": "https://www.linkedin.com/in/manojawasthi",
  "Github": "https://github.com/awmanoj/",
  "Blog": [ "http://awmanoj.github.io/", "http://www.manojawasthi.com" ]
}
Thank you!