Building a Micro MongoDB Driver
Learning how clients communicate with MongoDB [GitHub repository]
§1 Overview
In this article we'll write a minimal driver from the ground up in C using the newest pre-release wire protocol. Along the way we'll learn the structure of the underlying messages drivers send to MongoDB servers.
§2 What is a MongoDB Driver?
As the documentation states:
Drivers for MongoDB are the client libraries that handle the interface between the application and the MongoDB servers and deployments.
If you're reading this article then you've likely used MongoDB, and consequently used a driver to access it. Even the MongoDB shell can be considered a driver. A driver abstracts away all of the nitty-gritty communication between your application and your MongoDB cluster.
Your application can call intuitive functions, like pymongo's db.coll.insert_one({"_id": 1}). Behind the scenes a message representing this insert is generated and sent over a TCP socket connected to a MongoDB server. The driver then receives a response and reports success or an error back to the client. In order to build a driver, we need to learn precisely how to do this "behind the scenes" communication.
§3 Communicating with MongoDB
Before creating a client library, let's write code to insert a single document. It should do the following:
- Connect to a MongoDB server with a TCP socket
- Construct the insert message byte-by-byte
- Send the insert message over the socket
- Receive a reply from the socket
§3.1 Setting Up
If you'd like to follow along, download the latest copy of MongoDB. For this tutorial, we'll need to use the development pre-release MongoDB 3.5 (or later), which supports the protocol we're sending. Once that is downloaded, start up a single mongod instance on the default port 27017.
We'll also be using POSIX sockets, so the code we write should compile on Linux and macOS.
§3.2 Connecting to a MongoDB Server in C
I confess I haven't done much socket programming in C. But referring to Beej's tutorial we can hack together minimal yet functional code to create a TCP socket connecting to our local MongoDB server.
It's a little dense to look at, but really not a lot is going on. All we're doing is defining the address to connect to (127.0.0.1 on port 27017), attempting to create a TCP socket, then connecting this socket to this address.
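As a rough illustration of the steps just described, here is a minimal sketch of the connection logic. The function name connect_tcp is our own invention for this sketch, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (name is ours): open a TCP connection to an IPv4
 * address. Returns a connected socket descriptor, or -1 on failure. */
int connect_tcp(const char *ip, uint16_t port) {
    struct sockaddr_in addr;
    int fd;

    assert(ip != NULL);

    /* Define the address to connect to. */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);            /* port in network byte order */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        return -1;                          /* not a valid IPv4 address */
    }

    /* Attempt to create a TCP socket (SOCK_STREAM selects TCP). */
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    /* Connect the socket to the address; errno describes any failure. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A main function that calls connect_tcp("127.0.0.1", 27017) and prints a message when the result is not -1 would turn this into a small standalone program.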
After compiling this with gcc -o connect connect.c and running ./connect we should hopefully see the connected message. This isn't much to see, but we can check the mongod log to see if our connection is being established.
Woohoo! We're connecting to our mongod with only 30 lines of C. You may not have been as lucky and have seen an error message with an errno instead. In the shell, use perror <errno> to get an error message from the number.
§3.3 Constructing our Message
Now that we're connecting, the next step is to send a message. But what kind of message are we supposed to send to insert a document?
The MongoDB wire protocol defines the byte-by-byte messages that a driver and a server can send to each other. Taking a quick glance, OP_INSERT looks promising. But the warning at the top of that document says otherwise:
Starting with MongoDB 2.6 and maxWireVersion 3, MongoDB drivers use the database commands insert, update, and delete instead of OP_INSERT, OP_UPDATE, and OP_DELETE for acknowledged writes. Most drivers continue to use opcodes for unacknowledged writes.
In fact, since MongoDB 3.2 all messages (including find, getMore, killCursors) have been implemented in terms of database commands instead of using separate op codes. Ok, so what is a database command and how do we send one in MongoDB 3.5+?
Let's look at the linked database command docs. We can scroll down to the insert command. It shows that an insert command has the following required structure:
{
insert: <collection>,
documents: [ <document>, <document>, <document>, ... ]
}
This isn't quite what we want. This is the document we would send via a driver, not the exact byte-by-byte message a driver sends through its socket. But we're on the right track. We can see this working in the MongoDB shell using the runCommand function.
> db.runCommand({insert: "test_coll", documents: [{_id: 1}]});
{ "n" : 1, "ok" : 1 }
That's nice, but how do we send a database command outside of the shell? What op code in the wire protocol does that fall under? If you skim through the wire protocol, you may think of using OP_COMMAND, but there's another warning:
OP_COMMAND and OP_COMMANDREPLY are cluster internal and should not be implemented by clients or drivers.
That's definitely not it. Fret not: in MongoDB 3.5+, OP_MSG will be the only op code needed. Before this, commands were sent through a bit of a hack (OP_QUERY on a virtual $cmd collection). But now with OP_MSG there is a clean and unified way of communicating with a MongoDB server.
§3.4 The OP_MSG structure
Great, so now that we know to send an OP_MSG, let's refer back to the wire protocol reference. Here's the C-struct like reference of an OP_MSG:
OP_MSG {
int32 messageLength; // total message size, including this
int32 requestID; // identifier for this message
int32 responseTo; // used in responses
int32 opCode = 2013; // request type. 2013 for OP_MSG
uint32 flagBits; // message flags
Section* sections; // data sections
optional<int32> checksum; // (currently unused) checksum
}
/* Type 0 Section */
Section {
uint8 type = 0;
document document; // a bson document
}
/* Type 1 Section */
Section {
uint8 type = 1;
int32 size;
cstring identifier;
document* documents; // a sequence of bson documents
}
Take note that all ints are little-endian. The server uses responseTo to identify which requestID it's responding to. The sections contain the actual payload we want to send. In our case, the payload is the document {insert: "test_coll", documents: [{_id: 1}]}. The document will have a slight modification: OP_MSG does not specify a database name, so instead it expects the document to have an extra "$db" key with the database name. The final document we'll be sending is {insert: "test_coll", $db: "db", documents: [{_id: 1}]}.
There are two types of data sections. Type 0 is a single document, and type 1 is a sequence of documents. Type 1 sections are an optimization we can ignore for now. All we need to do is construct our document, jam it into an OP_MSG, and send it off.
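To make the layout concrete, here is a small sketch of writing the fixed part of an OP_MSG (the four header ints plus flagBits) in little-endian order. The helper names are hypothetical, and we assume responseTo is simply set to 0 in a request and that no flag bits are set (so no checksum follows).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_MSG_OPCODE = 2013, OP_MSG_PREFIX_LEN = 20 };

/* Store a 32-bit value least-significant byte first, whatever the host's
 * native byte order is. */
static void put_le32(uint8_t *out, uint32_t v) {
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)((v >> 8) & 0xFF);
    out[2] = (uint8_t)((v >> 16) & 0xFF);
    out[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Hypothetical helper: write messageLength, requestID, responseTo, opCode
 * and flagBits into out (which must hold 20 bytes). sections_len is the
 * number of section bytes that will follow. Returns the messageLength. */
uint32_t write_op_msg_prefix(uint8_t *out, uint32_t request_id,
                             uint32_t sections_len) {
    uint32_t total = OP_MSG_PREFIX_LEN + sections_len;

    assert(out != NULL);
    put_le32(out + 0, total);           /* messageLength, including itself */
    put_le32(out + 4, request_id);      /* requestID */
    put_le32(out + 8, 0);               /* responseTo: assumed 0 in a request */
    put_le32(out + 12, OP_MSG_OPCODE);  /* opCode 2013 = 0x07DD */
    put_le32(out + 16, 0);              /* flagBits: none set */
    return total;
}
```

A type 0 section then adds one type byte followed by the BSON document, so sections_len is 1 plus the document's length.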
§3.5 BSON
Before jumping back to code, let's note that the document field is not JSON, but BSON, which stands for Binary JSON. This is a binary JSON serialization created by MongoDB. First, let's construct our BSON document.
We want to construct the BSON representing the JSON {insert: "test_coll", $db: "db", documents: [{_id: 1}]}. Fortunately, the BSON spec nicely specifies the grammar we can follow to construct our desired document.
Let's first start by making the document {"_id": 1} by following the BSON spec grammar rules.
document
=> int32 e_list 0x00
Add the key/int32 value "_id": 1
int32 e_list 0x00 =>
int32 element e_list 0x00
int32 element e_list 0x00
=>
int32 0x10 e_name int32 e_list 0x00
int32 0x10 e_name int32 e_list 0x00
=>
int32 0x10 "_id\0" int32 e_list 0x00
Note that our int32 value is 1 in little-endian, so its four bytes are 0x01 0x00 0x00 0x00, written here as 0x01000000
int32 0x10 "_id\0" int32 e_list 0x00
=>
int32 0x10 "_id\0" 0x01000000 e_list 0x00
Remove the last e_list
int32 0x10 "_id\0" 0x01000000 e_list 0x00
=>
int32 0x10 "_id\0" 0x01000000 0x00
The first int32 is the size of the entire document including itself: 4 (size) + 1 (type) + 4 ("_id\0") + 4 (value) + 1 (terminator) = 14, or 0x0E000000
int32 0x10 "_id\0" 0x01000000 0x00
=>
0x0E000000 0x10 "_id\0" 0x01000000 0x00
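The same derivation can be sketched in C. The function below is a hypothetical helper of our own (not part of any BSON library) that emits exactly the bytes derived above for a document with a single int32 _id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value least-significant byte first (little-endian). */
static void put_le32(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper: write the BSON for {"_id": id} into out, which must
 * have room for 14 bytes. Returns the number of bytes written. */
size_t bson_write_id_doc(uint8_t *out, size_t cap, int32_t id) {
    /* size prefix + type byte + "_id\0" + int32 value + terminator */
    const size_t len = 4 + 1 + 4 + 4 + 1;
    size_t off = 0;

    assert(out != NULL && cap >= len);

    put_le32(out + off, (uint32_t)len);   /* int32: total document size */
    off += 4;
    out[off++] = 0x10;                    /* element type: int32 */
    memcpy(out + off, "_id", 4);          /* e_name, copies the NUL too */
    off += 4;
    put_le32(out + off, (uint32_t)id);    /* the int32 value */
    off += 4;
    out[off++] = 0x00;                    /* document terminator */
    return off;
}
```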
Great! Now we know exactly what bytes in BSON represent the document {_id: 1}. Following suit, we can construct the entire command document. Tedious details aside, the document {insert: "test_coll", $db: "db", documents: [{_id: 1}]} looks like the following in BSON:
0x48000000 0x02 "insert\0" 0x0A000000 "test_coll\0" 0x02 "$db\0" 0x03000000 "db\0" 0x04 "documents\0" 0x16000000 0x03 "0\0" 0x0E000000 0x10 "_id\0" 0x01000000 0x00 0x00 0x00
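One way to carry these bytes into C is a static array, annotated piece by piece. The array name and the small sanity-check function are our own additions for illustration; the check only confirms that the length prefix matches the buffer size and that the document ends with a 0x00 terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BSON for {insert: "test_coll", $db: "db", documents: [{_id: 1}]} */
static const uint8_t INSERT_CMD[] = {
    0x48, 0x00, 0x00, 0x00,                          /* document size: 72 */
    0x02, 'i', 'n', 's', 'e', 'r', 't', 0x00,        /* string "insert" */
    0x0A, 0x00, 0x00, 0x00,                          /* 10 bytes incl. NUL */
    't', 'e', 's', 't', '_', 'c', 'o', 'l', 'l', 0x00,
    0x02, '$', 'd', 'b', 0x00,                       /* string "$db" */
    0x03, 0x00, 0x00, 0x00,                          /* 3 bytes incl. NUL */
    'd', 'b', 0x00,
    0x04, 'd', 'o', 'c', 'u', 'm', 'e', 'n', 't', 's', 0x00, /* array */
    0x16, 0x00, 0x00, 0x00,                          /* array size: 22 */
    0x03, '0', 0x00,                                 /* document at index "0" */
    0x0E, 0x00, 0x00, 0x00,                          /* {_id: 1} size: 14 */
    0x10, '_', 'i', 'd', 0x00,                       /* int32 "_id" */
    0x01, 0x00, 0x00, 0x00,                          /* value 1 */
    0x00,                                            /* end of {_id: 1} */
    0x00,                                            /* end of array */
    0x00                                             /* end of command */
};

/* Hypothetical sanity check: returns 1 when the little-endian size prefix
 * equals len and the last byte is the 0x00 terminator, otherwise 0. */
int bson_looks_well_formed(const uint8_t *doc, size_t len) {
    uint32_t declared;

    assert(doc != NULL);
    if (len < 5) {
        return 0;   /* the smallest document is 4 size bytes + terminator */
    }
    declared = (uint32_t)doc[0] | ((uint32_t)doc[1] << 8) |
               ((uint32_t)doc[2] << 16) | ((uint32_t)doc[3] << 24);
    return declared == len && doc[len - 1] == 0x00;
}
```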
Now that we know how to represent the command document in BSON, let's jump back to the code.
§3.6 Putting it All Together
To send our insert command, we need to wrap the BSON representation in an OP_MSG message, and then send it! Here's the commented C code, which shows the message constructed byte-by-byte, and then receives a response from the server.
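What follows is a sketch of that flow rather than the original listing. It assumes an already connected socket, takes the BSON bytes as a parameter, and uses names of our own choosing (op_msg_roundtrip and its helpers). The requestID of 1 and the 512-byte buffer are arbitrary choices for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Write a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Keep calling send() until every byte has been written. */
static int send_all(int fd, const uint8_t *buf, size_t len) {
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n <= 0) return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Keep calling recv() until exactly len bytes have arrived. */
static int recv_exact(int fd, uint8_t *buf, size_t len) {
    while (len > 0) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n <= 0) return -1;          /* error, or the peer closed */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: wrap one BSON document in an OP_MSG, send it on fd,
 * and read the whole reply message into reply. Returns the reply length in
 * bytes, or -1 on failure. */
long op_msg_roundtrip(int fd, const uint8_t *bson, size_t bson_len,
                      uint8_t *reply, size_t reply_cap) {
    uint8_t msg[512];
    /* 16-byte header + 4-byte flagBits + 1 section type byte + document */
    const size_t total = 16 + 4 + 1 + bson_len;
    uint32_t reply_len;

    assert(bson != NULL && reply != NULL);
    if (total > sizeof msg || reply_cap < 16) return -1;

    put_le32(msg + 0, (uint32_t)total);  /* messageLength */
    put_le32(msg + 4, 1);                /* requestID: arbitrary here */
    put_le32(msg + 8, 0);                /* responseTo: assumed 0 in a request */
    put_le32(msg + 12, 2013);            /* opCode: OP_MSG */
    put_le32(msg + 16, 0);               /* flagBits: none set */
    msg[20] = 0;                         /* section type 0: one document */
    memcpy(msg + 21, bson, bson_len);
    if (send_all(fd, msg, total) != 0) return -1;

    /* The reply begins with its own messageLength, so read that first. */
    if (recv_exact(fd, reply, 4) != 0) return -1;
    reply_len = (uint32_t)reply[0] | ((uint32_t)reply[1] << 8) |
                ((uint32_t)reply[2] << 16) | ((uint32_t)reply[3] << 24);
    if (reply_len < 16 || reply_len > reply_cap) return -1;
    if (recv_exact(fd, reply + 4, reply_len - 4) != 0) return -1;
    return (long)reply_len;
}
```

Reading the length first and then looping until the rest has arrived matters because a single recv call on a TCP socket may return only part of a message.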
After compiling and running, you should see something similar to:
This looks promising! The response message seems garbled, but that's because we're only printing the printable ASCII bytes and ignoring everything else.
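For reference, a filter of that kind might look like the sketch below. The function name is hypothetical, and this is only one plausible way to do it: every byte that isprint accepts is copied, and everything else (length prefixes, type bytes, NUL terminators) is dropped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy only the printable ASCII bytes of buf into
 * out and NUL-terminate the result. Returns the number of characters
 * copied, which is at most out_cap - 1. */
size_t copy_printable(const uint8_t *buf, size_t len,
                      char *out, size_t out_cap) {
    size_t n = 0;

    assert(buf != NULL && out != NULL && out_cap > 0);
    for (size_t i = 0; i < len && n + 1 < out_cap; i++) {
        if (isprint(buf[i])) {          /* skips control and binary bytes */
            out[n++] = (char)buf[i];
        }
    }
    out[n] = '\0';
    return n;
}
```

Because the binary framing is skipped, key names and string values from the reply run together, which is why the output looks garbled.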
But now the true test: can we retrieve the document we inserted?
Woohoo! We have successfully communicated just the right bytes to insert the document {_id: 1} into the collection db.test_coll!
If you run this again, you'll see a message like the following:
Although garbled, we can see the "duplicate key error". This is because _id must be unique within a collection, but we're trying to insert a document with the same _id. Normally, drivers generate a unique _id when clients omit one. Let's ignore this for now, since we're making an absolute minimal driver. We still have a little work to do to create our micro driver.
§4 Making a Library
We don't quite have a minimal driver yet. Our PoC code shows we can communicate, but this isn't providing a client library. Let's refactor this into a minimal library. Our driver library interface is as follows.
Our implementation is more of the same. Except the user provides the IP/port to connect to and the BSON command to send. Less work for us! Here's the definition of the command function.
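The sketch below shows one shape such a command function could take. All names (micro_client, micro_command) are hypothetical stand-ins rather than the actual library code, the handle is assumed to wrap a socket that is already connected, and the reply size cap is an arbitrary sanity limit chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical client handle: a connected socket plus a request counter. */
typedef struct micro_client {
    int fd;
    uint32_t next_request_id;
} micro_client;

enum { MICRO_MAX_REPLY = 48000000 };   /* arbitrary cap for this sketch */

static void put_le32(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++) out[i] = (uint8_t)(v >> (8 * i));
}

static int send_all(int fd, const uint8_t *buf, size_t len) {
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n <= 0) return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

static int recv_exact(int fd, uint8_t *buf, size_t len) {
    while (len > 0) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n <= 0) return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send the caller's BSON command as an OP_MSG and read the reply. On
 * success returns 0 and stores a malloc'd copy of the whole reply message
 * in *reply (the caller frees it) and its size in *reply_len. */
int micro_command(micro_client *c, const uint8_t *bson, size_t bson_len,
                  uint8_t **reply, size_t *reply_len) {
    uint8_t prefix[21];     /* header (16) + flagBits (4) + section type (1) */
    uint8_t len_bytes[4];
    uint32_t total;
    uint8_t *buf;

    assert(c != NULL && bson != NULL && reply != NULL && reply_len != NULL);
    put_le32(prefix + 0, (uint32_t)(sizeof prefix + bson_len));
    put_le32(prefix + 4, c->next_request_id++);  /* fresh id per request */
    put_le32(prefix + 8, 0);                     /* responseTo */
    put_le32(prefix + 12, 2013);                 /* opCode: OP_MSG */
    put_le32(prefix + 16, 0);                    /* flagBits */
    prefix[20] = 0;                              /* type 0 section */

    /* The BSON is sent straight from the caller's buffer, no copy needed. */
    if (send_all(c->fd, prefix, sizeof prefix) != 0) return -1;
    if (send_all(c->fd, bson, bson_len) != 0) return -1;

    if (recv_exact(c->fd, len_bytes, 4) != 0) return -1;
    total = (uint32_t)len_bytes[0] | ((uint32_t)len_bytes[1] << 8) |
            ((uint32_t)len_bytes[2] << 16) | ((uint32_t)len_bytes[3] << 24);
    if (total < 16 || total > MICRO_MAX_REPLY) return -1;

    buf = malloc(total);
    if (buf == NULL) return -1;
    for (int i = 0; i < 4; i++) buf[i] = len_bytes[i];
    if (recv_exact(c->fd, buf + 4, total - 4) != 0) {
        free(buf);
        return -1;
    }
    *reply = buf;
    *reply_len = total;
    return 0;
}
```

Returning the raw reply message leaves BSON parsing to the caller, which keeps the driver small at the cost of convenience.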
Now a user can happily use the micro driver to its fullest extent like so:
It might not be much, but our micro driver is functional! This could potentially be expanded into a fully featured driver. This is left as an exercise for the reader.
§5 Further Reading
We've learned how to communicate with a single MongoDB server using OP_MSG. But we've only just barely scratched the surface. If you'd like to learn more, check out the OP_MSG specification and this article on upcoming features in MongoDB 3.6 which also gives a history of the wire protocol. If you'd like to see the code included in this article, see the GitHub repository.