Building a Micro MongoDB Driver

§1 Overview

In this article we'll write a minimal driver from the ground up in C using the newest pre-release wire protocol. Along the way we'll learn the structure of the underlying messages drivers send to MongoDB servers.

§2 What is a MongoDB Driver?

As the documentation states:

Drivers for MongoDB are the client libraries that handle the interface between the application and the MongoDB servers and deployments.

If you're reading this article then you've likely used MongoDB, and consequently used a driver to access it. Even the MongoDB shell can be considered a driver. A driver abstracts away all of the nitty-gritty communication between your application and your MongoDB cluster.

Your application can call intuitive functions, like pymongo's db.coll.insert_one({"_id": 1}). Behind the scenes a message representing this insert is generated and sent over a TCP socket connected to a MongoDB server. The driver then receives a response and reports success or an error back to the application. In order to build a driver, we need to learn precisely how to do this "behind the scenes" communication.

§3 Communicating with MongoDB

Before creating a client library, let's write code to insert a single document. It should do the following:

Connect to a MongoDB server with a TCP socket
Construct the insert message byte-by-byte
Send the insert message over the socket
Receive a reply from the socket

§3.1 Setting Up

If you'd like to follow along, download the latest copy of MongoDB. For this tutorial, we'll need to use the development pre-release MongoDB 3.5 (or later), which supports the protocol we're sending. Once that is downloaded, start up a single mongod instance on the default port 27017.

We'll also be using POSIX sockets, so the code we write should compile on Linux and macOS.

§3.2 Connecting to a MongoDB Server in C

I confess I haven't done much socket programming in C. But referring to Beej's tutorial we can hack together minimal yet functional code to create a TCP socket connecting to our local MongoDB server.

// compile with gcc -o connect connect.c
#include <arpa/inet.h>  // inet_addr, htons
#include <errno.h>      // errno
#include <netinet/in.h> // struct sockaddr_in
#include <stdio.h>      // printf
#include <sys/socket.h> // socket, connect
#include <unistd.h>     // close

int main () {
  // describe the address of a mongod instance on localhost port 27017
  struct sockaddr_in addr = {0};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = inet_addr("127.0.0.1");
  addr.sin_port = htons(27017);

  // create a TCP socket
  int socket_fd = socket(AF_INET, SOCK_STREAM, 0);
  if (socket_fd == -1) {
    printf("Could not create socket, errno=%d\n", errno);
    return 1;
  }

  // connect the socket to the remote address
  int ret = connect (
    socket_fd, (struct sockaddr*)&addr, sizeof (addr));

  if (ret == -1) {
    printf("Could not connect, errno=%d\n", errno);
    close (socket_fd);
    return 1;
  }

  printf ("Connected to mongod at 127.0.0.1 port 27017\n");
  close (socket_fd);
  return 0;
}

connect.c - connecting to a single mongod

It's a little dense to look at, but really not a lot is going on. All we're doing is defining the address to connect to (127.0.0.1 on port 27017), attempting to create a TCP socket, then connecting this socket to this address.

After compiling this with gcc -o connect connect.c and running ./connect we should hopefully see the connected message. This isn't much to see, but we can check the mongod log to see if our connection is being established.

Woohoo! We're connecting to our mongod with roughly 30 lines of C. You may not have been as lucky and instead seen an error message with an errno. To turn that number into a readable message, call perror() or strerror() from the C standard library at the point of failure.
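
The errno number can also be translated inside the C program itself. Here is a minimal sketch, assuming only the standard strerror from <string.h>; the helper name format_errno is hypothetical and not part of the article's code.

```c
#include <errno.h>   // errno, ECONNREFUSED
#include <stdio.h>   // snprintf
#include <string.h>  // strerror

// Hypothetical helper: writes "<what>: <message> (errno=N)" into buf.
// Returns what snprintf returns: the length the full text needs.
int format_errno(char *buf, size_t buf_size, const char *what, int err) {
  return snprintf(buf, buf_size, "%s: %s (errno=%d)",
                  what, strerror(err), err);
}
```

In connect.c this could replace a bare printf of the number, for example format_errno(msg, sizeof msg, "Could not connect", errno) followed by puts(msg). The exact message text depends on the platform's C library.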

§3.3 Constructing our Message

Now that we're connecting, the next step is to send a message. But what kind of message are we supposed to send to insert a document?

The MongoDB wire protocol defines the byte-by-byte messages that a driver and a server can send to each other. Taking a quick glance, OP_INSERT looks promising. But the warning at the top of that document says otherwise:

Starting with MongoDB 2.6 and maxWireVersion 3, MongoDB drivers use the database commands insert, update, and delete instead of OP_INSERT, OP_UPDATE, and OP_DELETE for acknowledged writes. Most drivers continue to use opcodes for unacknowledged writes.

In fact, since MongoDB 3.2 all messages (including find, getMore, killCursors) have been implemented in terms of database commands instead of using separate op codes. Ok, so what is a database command and how do we send one in MongoDB 3.5+?

Let's look at the linked database command docs. We can scroll down to the insert command. It shows that an insert command has the following required structure:

{
 insert: <collection>,
 documents: [ <document>, <document>, <document>, ... ]
}

This isn't quite what we want. This is the document we would send via a driver, not the exact byte-by-byte message a driver sends through its socket. But we're on the right track. We can see this working in the MongoDB shell using the runCommand function.

> db.runCommand({insert: "test_coll", documents: [{_id: 1}]});
{ "n" : 1, "ok" : 1 }

That's nice, but how do we send a database command outside of the shell? What op code in the wire protocol does that fall under? If you skim through the wire protocol, you may think of using OP_COMMAND, but there's another warning:

OP_COMMAND and OP_COMMANDREPLY are cluster internal and should not be implemented by clients or drivers.

That's definitely not it. Fret not, in MongoDB 3.5+, OP_MSG will be the only op code needed. Before this, commands were sent through a bit of a hack (OP_QUERY on a virtual $cmd collection). But now with OP_MSG there is a clean and unified way of communicating with a MongoDB server.

§3.4 The OP_MSG structure

Great, so now that we know to send an OP_MSG, let's refer back to the wire protocol reference. Here's the C-struct-like reference of an OP_MSG:

OP_MSG {
    int32    messageLength;   // total message size, including this
    int32    requestID;       // identifier for this message
    int32    responseTo;      // used in responses
    int32    opCode = 2013;   // request type. 2013 for OP_MSG
    uint32   flagBits;        // message flags
    Section* sections;        // data sections
    optional<int32> checksum; // (currently unused) checksum
}

/* Type 0 Section */
Section {
    uint8    type = 0;
    document document;        // a bson document
}

/* Type 1 Section */
{
    uint8     type = 1;
    int32     size;
    cstring   identifier;
    document* documents;      // a sequence of bson documents
}

Take note that all ints are little-endian. In a reply, the server sets responseTo to the requestID of the message it is answering. The sections contain the actual payload we want to send. In our case, the payload is the document {insert: "test_coll", documents: [{_id: 1}]}, with one slight modification. OP_MSG has no field for the database name, so instead it expects the document to carry an extra "$db" key with the database name. The final document we'll be sending is {insert: "test_coll", $db: "db", documents: [{_id: 1}]}.
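
Since every int32 on the wire is little-endian, it helps to see that encoding spelled out. The following is a small sketch, not taken from the article's code; the helper names put_le32 and get_le32 are hypothetical. It writes and reads the four bytes explicitly, so it behaves the same on any host byte order.

```c
#include <stdint.h>  // int32_t, uint32_t, uint8_t

// Write a 32-bit value into buf as 4 little-endian bytes,
// least significant byte first.
void put_le32(uint8_t *buf, int32_t value) {
  uint32_t v = (uint32_t)value;
  buf[0] = (uint8_t)(v & 0xFF);
  buf[1] = (uint8_t)((v >> 8) & 0xFF);
  buf[2] = (uint8_t)((v >> 16) & 0xFF);
  buf[3] = (uint8_t)((v >> 24) & 0xFF);
}

// Read 4 little-endian bytes back into a 32-bit value.
int32_t get_le32(const uint8_t *buf) {
  uint32_t v = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
               ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
  return (int32_t)v;
}
```

For example, put_le32(buf, 2013) stores the OP_MSG opCode as the bytes DD 07 00 00, because 2013 is 0x7DD.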

There are two types of data sections. Type 0 is a single document, and type 1 is a sequence of documents. Type 1 sections are an optimization we can ignore for now. All we need to do is construct our document, jam it into an OP_MSG, and send it off.
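
With a single type 0 section and no checksum, the messageLength field follows directly from the struct layout above. The sketch below works through that arithmetic; the constant and function names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>  // int32_t

enum {
  MSG_HEADER_SIZE   = 16,  // messageLength, requestID, responseTo, opCode
  FLAG_BITS_SIZE    = 4,   // uint32 flagBits
  SECTION_TYPE_SIZE = 1    // uint8 type of the single section
};

// Total messageLength of an OP_MSG that carries exactly one
// type 0 section holding a BSON document of bson_length bytes.
int32_t op_msg_length(int32_t bson_length) {
  return MSG_HEADER_SIZE + FLAG_BITS_SIZE + SECTION_TYPE_SIZE + bson_length;
}
```

A 72-byte BSON command document therefore gives a 93-byte message, which is 0x5D.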

§3.5 BSON

Before jumping back to code, let's note that the document field is not JSON but BSON, short for Binary JSON, a binary serialization of JSON-like documents created by MongoDB. First, let's construct our BSON document.

We want to construct the BSON representing the JSON {insert: "test_coll", $db: "db", documents: [{_id: 1}]}. Fortunately, the BSON spec nicely specifies the grammar we can follow to construct our desired document.

Let's first start by making the document {"_id": 1} by following the BSON spec grammar rules.

document => int32 e_list 0x00

Add the key/int32 value "_id": 1

int32 e_list 0x00 =>
 int32 element e_list 0x00

The element is an int32 element: the type byte 0x10, then the key name, then the value

int32 element e_list 0x00 =>
 int32 0x10 e_name int32 e_list 0x00

The key name is the null-terminated string "_id"

int32 0x10 e_name int32 e_list 0x00 =>
 int32 0x10 "_id\0" int32 e_list 0x00

Note that our int32 value is 1 in little-endian, i.e. the bytes 01 00 00 00, written here as 0x01000000

int32 0x10 "_id\0" int32 e_list 0x00 =>
 int32 0x10 "_id\0" 0x01000000 e_list 0x00

Remove the last e_list

int32 0x10 "_id\0" 0x01000000 e_list 0x00 =>
 int32 0x10 "_id\0" 0x01000000 0x00

The first int32 is the size of the entire document, 14 bytes (4 + 1 + 4 + 4 + 1), or 0x0E000000

int32 0x10 "_id\0" 0x01000000 0x00 =>
 0x0E000000 0x10 "_id\0" 0x01000000 0x00



Great! Now we know exactly what bytes in BSON represent the document {_id: 1}. Following suit, we can construct the entire command document. Tedious details aside, the document {insert: "test_coll", $db: "db", documents: [{_id: 1}]} looks like the following in BSON:

0x48000000 0x02 "insert\0" 0x0A000000 "test_coll\0" 0x02 "$db\0" 0x03000000 "db\0" 0x04 "documents\0" 0x16000000 0x03 "0\0" 0x0E000000 0x10 "_id\0" 0x01000000 0x00 0x00 0x00
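
The tedious details can be delegated to a few helper functions. What follows is a rough sketch of a BSON builder that supports only the three element types used here (string 0x02, embedded document 0x03, array 0x04, int32 0x10 as the value type). All names in it (bson_buf_t, doc_begin, doc_end, build_insert_command and so on) are hypothetical and are not part of the article's code or of any MongoDB library.

```c
#include <stdint.h>  // uint8_t, int32_t, uint32_t
#include <string.h>  // memcpy, strlen

// Fixed-size scratch buffer; overflow is set if anything did not fit.
typedef struct {
  uint8_t bytes[256];
  size_t len;
  int overflow;
} bson_buf_t;

static void encode_le32(uint8_t *dst, int32_t value) {
  uint32_t v = (uint32_t)value;
  dst[0] = (uint8_t)v;
  dst[1] = (uint8_t)(v >> 8);
  dst[2] = (uint8_t)(v >> 16);
  dst[3] = (uint8_t)(v >> 24);
}

static void put_bytes(bson_buf_t *b, const void *src, size_t n) {
  if (b->overflow || n > sizeof(b->bytes) - b->len) {
    b->overflow = 1;
    return;
  }
  memcpy(b->bytes + b->len, src, n);
  b->len += n;
}

static void put_int32(bson_buf_t *b, int32_t value) {
  uint8_t le[4];
  encode_le32(le, value);
  put_bytes(b, le, sizeof(le));
}

// Start a document: reserve its int32 length and remember where it is.
static size_t doc_begin(bson_buf_t *b) {
  size_t start = b->len;
  put_int32(b, 0);
  return start;
}

// Finish a document: append the 0x00 terminator, then fill in the length.
static void doc_end(bson_buf_t *b, size_t start) {
  uint8_t zero = 0;
  put_bytes(b, &zero, 1);
  if (!b->overflow) encode_le32(b->bytes + start, (int32_t)(b->len - start));
}

// Element prefix: one type byte followed by the key as a cstring.
static void put_key(bson_buf_t *b, uint8_t type, const char *key) {
  put_bytes(b, &type, 1);
  put_bytes(b, key, strlen(key) + 1);
}

// String element: the length counts the trailing '\0'.
static void put_string(bson_buf_t *b, const char *key, const char *val) {
  put_key(b, 0x02, key);
  put_int32(b, (int32_t)strlen(val) + 1);
  put_bytes(b, val, strlen(val) + 1);
}

// Builds {insert: "test_coll", $db: "db", documents: [{_id: 1}]}.
// Returns the document size in bytes, or 0 if the buffer was too small.
size_t build_insert_command(bson_buf_t *b) {
  b->len = 0;
  b->overflow = 0;
  size_t cmd = doc_begin(b);
  put_string(b, "insert", "test_coll");
  put_string(b, "$db", "db");
  put_key(b, 0x04, "documents");  // an array is a document keyed "0", "1", ...
  size_t arr = doc_begin(b);
  put_key(b, 0x03, "0");
  size_t doc = doc_begin(b);
  put_key(b, 0x10, "_id");
  put_int32(b, 1);
  doc_end(b, doc);
  doc_end(b, arr);
  doc_end(b, cmd);
  return b->overflow ? 0 : b->len;
}
```

Calling build_insert_command on a bson_buf_t yields 72 bytes, matching the 0x48 length prefix in the listing above. Back-patching the length in doc_end avoids having to count bytes by hand.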

            
Now that we know how to represent the command document in BSON, let's jump back to the code.

§3.6 Putting it All Together

To send our insert command, we need to wrap the BSON representation in an OP_MSG message, and then send it! Here's the commented C code, which shows the message constructed byte-by-byte, and then receives a response from the server.

// compile with gcc -o poc poc.c
#include <arpa/inet.h>  // inet_addr, htons
#include <ctype.h>      // isprint
#include <errno.h>      // errno
#include <netinet/in.h> // struct sockaddr_in
#include <stdio.h>      // printf
#include <sys/socket.h> // socket, connect, send, recv
#include <unistd.h>     // close

int main () {
  // describe the address of a mongod instance on localhost port 27017
  struct sockaddr_in addr = {0};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = inet_addr("127.0.0.1");
  addr.sin_port = htons(27017);

  // create a TCP socket
  int socket_fd = socket(AF_INET, SOCK_STREAM, 0);
  if (socket_fd == -1) {
    printf("Could not create socket, errno=%d\n", errno);
    return 1;
  }

  // connect the socket to the remote address
  int ret = connect (
    socket_fd, (struct sockaddr*)&addr, sizeof (addr));

  if (ret == -1) {
    printf("Could not connect, errno=%d\n", errno);
    close (socket_fd);
    return 1;
  }

  printf ("Connected to mongod at 127.0.0.1 port 27017\n");

  // unsigned char so that bytes like 0xDD fit without conversion
  unsigned char op_msg[] = {
    0x5D, 0x00, 0x00, 0x00, // total message size, including this
    0x00, 0x00, 0x00, 0x00, // requestID (can be 0)
    0x00, 0x00, 0x00, 0x00, // responseTo (unused for sending)
    0xDD, 0x07, 0x00, 0x00, // opCode = 2013 = 0x7DD for OP_MSG
    0x00, 0x00, 0x00, 0x00, // message flags (not needed)
    0x00,                   // only data section, type 0
    // begin bson command document
    // {insert: "test_coll", $db: "db", documents: [{_id:1}]}
    0x48, 0x00, 0x00, 0x00, // total bson obj length

    // insert: "test_coll" key/value
    0x02, 'i','n','s','e','r','t','\0',
    0x0A, 0x00, 0x00, 0x00, // "test_coll" length
    't','e','s','t','_','c','o','l','l','\0',

    // $db: "db"
    0x02, '$','d','b','\0',
    0x03, 0x00, 0x00, 0x00, // "db" length
    'd','b','\0',

    // documents: [{_id:1}]
    0x04, 'd','o','c','u','m','e','n','t','s','\0',
    0x16, 0x00, 0x00, 0x00, // start of {0: {_id: 1}}
    0x03, '0', '\0',        // key "0"
    0x0E, 0x00, 0x00, 0x00, // start of {_id: 1}
    0x10, '_','i','d','\0', 0x01, 0x00, 0x00, 0x00,
    0x00,                   // end of {_id: 1}
    0x00,                   // end of {0: {_id: 1}}
    0x00                    // end of command document
  };

  ret = send (socket_fd, op_msg, sizeof(op_msg), 0);
  if (ret == -1) {
    printf("Could not send, errno=%d\n", errno);
    close (socket_fd);
    return 1;
  }

  printf("Message sent\n");

  unsigned char reply_buf[512];
  int num_recv = recv(socket_fd, reply_buf, sizeof(reply_buf), 0);
  if (num_recv == -1) {
    printf("Could not receive, errno=%d\n", errno);
    close (socket_fd);
    return 1;
  }

  printf("Message received\n");

  // print printable ASCII bytes as-is and everything else as a dot
  int i;
  for (i = 0; i < num_recv; i++) {
    if (isprint(reply_buf[i])) printf("%c", reply_buf[i]);
    else printf(".");
  }
  printf("\n");

  close (socket_fd);
  return 0;
}
poc.c - a proof-of-concept sending command and receiving reply

After compiling and running, the program should report that it connected, sent the message, and received a reply, followed by the reply bytes themselves.

This looks promising! The response message seems garbled, but that's because we're only printing the printable ASCII bytes and replacing everything else with a dot.
But now the true test: can we retrieve the document we inserted? Querying the test_coll collection from the MongoDB shell should return it.

Woohoo! We have successfully communicated just the right bytes to insert the document {_id: 1} into the collection db.test_coll!
If you run the program again, the reply changes.

Although garbled, we can see the "duplicate key error". This is because in a collection _id must be unique, but we're trying to insert a document with the same _id. Normally, drivers can generate a unique _id when clients omit one. Let's ignore this for now, since we're making an absolute minimal driver. We still have a little work to do to create our micro driver.
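
The garbled reply has the same layout as the message we sent, so its leading fields can be decoded rather than eyeballed. Below is a minimal sketch, assuming the reply is an OP_MSG whose first section is type 0; the struct and function names are hypothetical.

```c
#include <stddef.h>  // size_t
#include <stdint.h>  // int32_t, uint32_t, uint8_t

// The fixed-position fields at the start of an OP_MSG.
typedef struct {
  int32_t message_length;   // total size of the message in bytes
  int32_t request_id;
  int32_t response_to;      // requestID of the message being answered
  int32_t op_code;          // 2013 for OP_MSG
  uint32_t flag_bits;
  uint8_t section_type;     // 0 for a single BSON document
  int32_t document_length;  // leading int32 of that BSON document
} op_msg_prefix_t;

#define OP_MSG_PREFIX_SIZE 25  // 16 header + 4 flags + 1 type + 4 length

static uint32_t read_le32(const unsigned char *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns 0 on success, or -1 if the buffer is too short, the opCode
// is not OP_MSG, or the first section is not type 0.
int parse_op_msg_prefix(const void *buf, size_t n, op_msg_prefix_t *out) {
  const unsigned char *p = buf;
  if (n < OP_MSG_PREFIX_SIZE) return -1;
  out->message_length = (int32_t)read_le32(p);
  out->request_id = (int32_t)read_le32(p + 4);
  out->response_to = (int32_t)read_le32(p + 8);
  out->op_code = (int32_t)read_le32(p + 12);
  if (out->op_code != 2013) return -1;
  out->flag_bits = read_le32(p + 16);
  out->section_type = p[20];
  if (out->section_type != 0) return -1;
  out->document_length = (int32_t)read_le32(p + 21);
  return 0;
}
```

In poc.c this could be called as parse_op_msg_prefix(reply_buf, num_recv, &prefix) right after recv. The reply's BSON document then starts at byte offset 21 of the buffer.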

§4 Making a Library

We don't quite have a minimal driver yet. Our PoC code shows we can communicate, but this isn't providing a client library. Let's refactor this into a minimal library. Our driver library interface is as follows.


#ifndef MICRO_DRIVER_H
#define MICRO_DRIVER_H

// forward declare a struct representing a client handle
typedef struct mongo_client_t_private mongo_client_t;

// connect to a single mongod server. returns null on error.
mongo_client_t* mongo_connect (char* ip, int port);

// clean up the client handle
void mongo_disconnect (mongo_client_t* client);

// send a command and get a reply, returns the number of bytes
// received or -1 on error
int mongo_send_command (mongo_client_t* client,
                        char* command,
                        int command_size,
                        char* reply,
                        int reply_size);

#endif
micro_driver.h - the interface of our micro driver

Our implementation is more of the same, except that the user provides the IP/port to connect to and the BSON command to send. Less work for us! Here's the definition of the command function.


// excerpt from micro_driver.c; it needs <stdlib.h> (malloc, free),
// <string.h> (memcpy), <sys/socket.h> (send, recv) and a definition of
// struct mongo_client_t_private with an int socket_fd member
int mongo_send_command (mongo_client_t* client,
                        char* command,
                        int command_size,
                        char* reply,
                        int reply_size) {

  static const unsigned char op_msg_header[] = {
    0x00, 0x00, 0x00, 0x00, // total message size, filled in below
    0x00, 0x00, 0x00, 0x00, // requestID (can be 0)
    0x00, 0x00, 0x00, 0x00, // responseTo (unused for sending)
    0xDD, 0x07, 0x00, 0x00, // opCode = 2013 = 0x7DD for OP_MSG
    0x00, 0x00, 0x00, 0x00, // message flags (not needed)
    0x00                    // only data section, type 0
  };
  if (command_size < 0) return -1;

  // allocate enough memory for the header and command
  int total_bytes = (int)sizeof(op_msg_header) + command_size;
  unsigned char *op_msg = malloc(total_bytes);
  if (op_msg == NULL) return -1;
  memcpy (op_msg, op_msg_header, sizeof(op_msg_header));
  memcpy (op_msg + sizeof(op_msg_header), command, command_size);

  // set the length of the total op_msg as a little-endian int32,
  // byte by byte so it is correct on any host byte order
  op_msg[0] = total_bytes & 0xFF;
  op_msg[1] = (total_bytes >> 8) & 0xFF;
  op_msg[2] = (total_bytes >> 16) & 0xFF;
  op_msg[3] = (total_bytes >> 24) & 0xFF;

  ssize_t ret = send (client->socket_fd, op_msg, total_bytes, 0);
  free (op_msg);
  if (ret == -1) return -1;

  // a single recv is enough for small replies like ours
  return (int)recv (client->socket_fd, reply, reply_size, 0);
}
micro_driver.c - the implementation of our micro driver
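
The listing above shows only the command function. For completeness, here is one way the rest of micro_driver.c might look. This is a sketch under assumptions: the article confirms the two signatures from micro_driver.h and the socket_fd member, but the bodies below are a guess modeled on connect.c, with inet_pton swapped in for inet_addr so that a malformed address can be detected.

```c
#include <arpa/inet.h>   // inet_pton, htons
#include <errno.h>       // errno, EINVAL
#include <netinet/in.h>  // struct sockaddr_in
#include <stdint.h>      // uint16_t
#include <stdlib.h>      // malloc, free
#include <string.h>      // memset
#include <sys/socket.h>  // socket, connect
#include <unistd.h>      // close

// In the real file this typedef would come from micro_driver.h.
typedef struct mongo_client_t_private mongo_client_t;

// Assumed definition: the handle only needs to remember its socket.
struct mongo_client_t_private {
  int socket_fd;
};

mongo_client_t *mongo_connect(char *ip, int port) {
  struct sockaddr_in addr;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons((uint16_t)port);
  if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
    errno = EINVAL;  // not a valid dotted-quad IPv4 address
    return NULL;
  }

  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd == -1) return NULL;

  if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
    int saved = errno;  // close may overwrite errno
    close(fd);
    errno = saved;
    return NULL;
  }

  mongo_client_t *client = malloc(sizeof(*client));
  if (client == NULL) {
    close(fd);
    return NULL;
  }
  client->socket_fd = fd;
  return client;
}

void mongo_disconnect(mongo_client_t *client) {
  if (client == NULL) return;
  close(client->socket_fd);
  free(client);
}
```

Keeping the struct definition in the .c file means applications only ever see the opaque mongo_client_t pointer, so the handle's contents can change without breaking them.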


Now a user can happily use the micro driver to its fullest extent like so:
            
// compile with gcc -o app app.c micro_driver.c
#include <errno.h> // errno
#include <stdio.h> // printf

#include "micro_driver.h"

int main() {
  mongo_client_t* client = mongo_connect("127.0.0.1", 27017);
  if (client == NULL) {
    printf ("Could not connect, error=%d\n", errno);
    return 1;
  }

  char command[] = {
    // {insert: "test_coll", $db: "db", documents: [{_id:1}]}
    0x48, 0x00, 0x00, 0x00, // total bson obj length

    // insert: "test_coll" key/value
    0x02, 'i','n','s','e','r','t','\0',
    0x0A, 0x00, 0x00, 0x00, // "test_coll" length
    't','e','s','t','_','c','o','l','l','\0',

    // $db: "db"
    0x02, '$','d','b','\0',
    0x03, 0x00, 0x00, 0x00,
    'd','b','\0',

    // documents: [{_id:1}]
    0x04, 'd','o','c','u','m','e','n','t','s','\0',
    0x16, 0x00, 0x00, 0x00, // start of {0: {_id: 1}} 
    0x03, '0', '\0', // key "0"
    0x0E, 0x00, 0x00, 0x00, // start of {_id: 1}
    0x10, '_','i','d','\0', 0x01, 0x00, 0x00, 0x00,
    0x00,                   // end of {_id: 1}
    0x00,                   // end of {0: {_id: 1}}
    0x00                    // end of command document
  };

  char reply_buf[512];
  int num_recv = mongo_send_command (client,
                                     command,
                                     sizeof(command),
                                     reply_buf,
                                     sizeof(reply_buf));
  if (num_recv == -1) {
    printf("Could not send command, errno=%d\n", errno);
    mongo_disconnect (client);
    return 1;
  }

  printf ("Command sent, received %d bytes\n", num_recv);
  mongo_disconnect (client);
  return 0;
}

app.c - a user application using the driver

It might not be much, but our micro driver is functional! This could potentially be expanded into a fully featured driver. This is left as an exercise for the reader.
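
One possible first step in that exercise: on a TCP socket, a single recv may return only part of a reply, so a sturdier driver would keep reading until the whole message has arrived. The sketch below uses the messageLength prefix to know when to stop; the function names recv_exact and recv_message are hypothetical.

```c
#include <errno.h>       // errno, EINTR
#include <stddef.h>      // size_t
#include <stdint.h>      // uint32_t
#include <sys/socket.h>  // recv
#include <sys/types.h>   // ssize_t

// Read exactly n bytes from fd. Returns 0 on success, -1 on error
// or if the peer closed the connection early.
static int recv_exact(int fd, unsigned char *buf, size_t n) {
  size_t got = 0;
  while (got < n) {
    ssize_t r = recv(fd, buf + got, n - got, 0);
    if (r == -1 && errno == EINTR) continue;  // interrupted, try again
    if (r <= 0) return -1;
    got += (size_t)r;
  }
  return 0;
}

// Read one complete wire-protocol message into buf.
// Returns the message length in bytes, or -1 on error or if the
// message would not fit into buf_size bytes.
int recv_message(int fd, unsigned char *buf, size_t buf_size) {
  if (buf_size < 4 || recv_exact(fd, buf, 4) == -1) return -1;

  // messageLength is a little-endian int32 that counts itself
  uint32_t len = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                 ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
  if (len < 16 || len > buf_size) return -1;  // 16 = standard header size

  if (recv_exact(fd, buf + 4, len - 4) == -1) return -1;
  return (int)len;
}
```

Inside mongo_send_command, the final recv call could be swapped for recv_message(client->socket_fd, (unsigned char *)reply, reply_size). Reading by length also leaves any following message untouched in the socket for the next call.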

§5 Further Reading

We've learned how to communicate with a single MongoDB server using OP_MSG. But we've only just barely scratched the surface. If you'd like to learn more, check out the OP_MSG specification and this article on upcoming features in MongoDB 3.6 which also gives a history of the wire protocol. If you'd like to see the code included in this article, see the GitHub repository.