Install ClusterControl on Top of Existing MongoDB Sharded Cluster

In this post, we are going to show you how to install and integrate ClusterControl on top of an existing MongoDB sharded cluster with a three-node replica set.

 

MongoDB Sharded Cluster Setup

 

In a sharded cluster, we need to have three types of servers:

  • config server (configsvr) – holds the metadata of the cluster (minimum 3 servers)
  • shard server (shardsvr) – holds a subset of the data, organized as a replica set (minimum 2 servers)
  • routing server (mongos) – routes operations from applications and clients to the shardsvr instances (minimum 1 server)

 

The following sequence explains query routing in a sharded cluster:

  1. The application sends a write query to one of the mongos instances (port 27017); see the connection example after this list
  2. mongos connects to a configsvr (port 27019) to determine the primary shardsvr
  3. mongos then connects to the primary shardsvr (port 27018) to write the data
  4. Data partitioning (sharding) and replication are handled automatically by the shardsvr instances
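
For example, a client or the mongo shell would connect to any one of the mongos instances (a sketch, using one of the node IPs from the setup described later in this post):

$ mongo --host 192.168.197.41 --port 27017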

 

In our setup, we have 3 servers running CentOS 6.3 64bit. On each server, we have colocated a configsvr, shardsvr and mongos. Each server has 3 MongoDB configuration files:

  • /etc/mongodb.config.conf – configsvr configuration
  • /etc/mongodb.shard.conf – shardsvr and replSet configuration
  • /etc/mongos.conf – mongos configuration

 

Our MongoDB dbpath is located at /var/lib/mongodb, the configdb is located at /var/lib/mongodb/configdb, and all MongoDB logs are generated under the /var/log/mongodb directory.

We started all MongoDB instances using the following commands on each server:

$ mongod -f /etc/mongodb.config.conf
$ mongod -f /etc/mongodb.shard.conf
$ mongos -f /etc/mongos.conf
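
The configuration files themselves are not reproduced in this post. As a rough sketch, assuming MongoDB 2.x INI-style options and the ports and paths used elsewhere in this setup (the replica set name rs0 is hypothetical), they might look like:

# /etc/mongodb.config.conf – configsvr
configsvr = true
port = 27019
dbpath = /var/lib/mongodb/configdb
logpath = /var/log/mongodb/mongodb.config.log
fork = true

# /etc/mongodb.shard.conf – shardsvr and replica set
shardsvr = true
replSet = rs0
port = 27018
dbpath = /var/lib/mongodb
logpath = /var/log/mongodb/mongodb.shard.log
fork = true

# /etc/mongos.conf – mongos
configdb = 192.168.197.41:27019,192.168.197.42:27019,192.168.197.43:27019
port = 27017
logpath = /var/log/mongodb/mongos.log
fork = true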


Install ClusterControl Server

We will need a separate server to run ClusterControl:

1. SSH into the ClusterControl server and make sure that iptables is stopped and SELinux is disabled:

$ service iptables stop
$ setenforce 0
$ sed -i.bak 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config

2. It is highly recommended to enable passwordless SSH with key authentication between ClusterControl and the agent nodes. Generate an RSA key and copy it to all nodes:

$ ssh-keygen -t rsa # just press enter for all prompts
$ ssh-copy-id -i ~/.ssh/id_rsa root@192.168.197.40
$ ssh-copy-id -i ~/.ssh/id_rsa root@192.168.197.41
$ ssh-copy-id -i ~/.ssh/id_rsa root@192.168.197.42
$ ssh-copy-id -i ~/.ssh/id_rsa root@192.168.197.43
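
You can quickly verify that key-based login works before proceeding, for example:

$ ssh root@192.168.197.41 "echo SSH OK"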

3. On the ClusterControl server, install Apache, PHP, MySQL and other required components:

$ yum install httpd php php-mysql php-gd mysql-server mysql cronie sudo mailx -y

4. Download ClusterControl for MongoDB and the required packages from the Severalnines website:

$ wget http://www.severalnines.com/downloads/cmon/cmon-mongodb-controller-1.2.4-1.x86_64.rpm 
$ wget http://www.severalnines.com/downloads/cmon/cmon-mongodb-www-1.2.4-1.noarch.rpm

5. Install ClusterControl web apps and create graph directory:

$ rpm -Uhv cmon-mongodb-www-1.2.4-1.noarch.rpm
$ mkdir /var/www/html/cmon/graph

6. Install the CMON controller:

$ rpm -Uhv cmon-mongodb-controller-1.2.4-1.x86_64.rpm

7. Disable name resolution in MySQL. This allows us to grant database users by IP address only. Add the following line to /etc/my.cnf under the [mysqld] section:

skip-name-resolve

8. Enable MySQL auto-start on boot, start MySQL, create the CMON database and import the CMON schema:

$ chkconfig mysqld on
$ service mysqld start
$ mysql -e "CREATE DATABASE cmon"
$ mysql < /usr/share/cmon/cmon_db.sql
$ mysql < /usr/share/cmon/cmon_data.sql

9. Enter the MySQL console and grant the CMON database users:

> GRANT ALL ON *.* TO 'cmon'@'192.168.197.40' IDENTIFIED BY 'cmonP4ss' WITH GRANT OPTION;
> GRANT ALL ON *.* TO 'cmon'@'127.0.0.1' IDENTIFIED BY 'cmonP4ss' WITH GRANT OPTION;
> GRANT SUPER,INSERT,SELECT,UPDATE,DELETE ON *.* TO 'cmon'@'192.168.197.41' IDENTIFIED BY 'cmonP4ss';
> GRANT SUPER,INSERT,SELECT,UPDATE,DELETE ON *.* TO 'cmon'@'192.168.197.42' IDENTIFIED BY 'cmonP4ss';
> GRANT SUPER,INSERT,SELECT,UPDATE,DELETE ON *.* TO 'cmon'@'192.168.197.43' IDENTIFIED BY 'cmonP4ss';

10. Configure MySQL root password:

$ mysqladmin -u root password 'MyP4ss'
$ mysqladmin -h127.0.0.1 -u root password 'MyP4ss'

11. Configure CMON as the controller by editing /etc/cmon.cnf:

# CMON config file
## id and name of cluster that this cmon agent is monitoring.
## Must be unique for each monitored cluster, like server-id in mysql
cluster_id=1
name=default_repl_1
mode=controller
type=mongodb

# MySQL for CMON
## Port of mysql server holding cmon database
mysql_port=3306
## Hostname/IP of mysql server holding cmon database
mysql_hostname=192.168.197.40
## Password for 'cmon' user on  the 'mysql_hostname'
mysql_password=cmonP4ss
local_mysql_port=3306
local_mysql_password=cmonP4ss
mysql_basedir=/usr/

# CMON service
## Hostname/IP of the server of this cmon instance
hostname=192.168.197.40
## osuser - the user owning the cmon_core_dir above
osuser=root
os=redhat
## logfile is default to syslog
logfile=/var/log/cmon.log
## Location of cmon.pid file. The pidfile is written in /tmp/ by default
pidfile=/var/run/
nodaemon=0

# MongoDB configdb location
monitored_mountpoints=/var/lib/mongodb/configdb
## All mongodb instances with port (comma separated)
mongodb_server_addresses=192.168.197.41:27018,192.168.197.42:27018,192.168.197.43:27018
mongocfg_server_addresses=192.168.197.41:27019,192.168.197.42:27019,192.168.197.43:27019
mongos_server_addresses=192.168.197.41:27017,192.168.197.42:27017,192.168.197.43:27017
mongodb_basedir=/usr/

# CMON stats options
db_stats_collection_interval=10
host_stats_collection_interval=60
ssh_opts=-nq


Install ClusterControl Agents

 

ClusterControl agents must reside on all MongoDB nodes. The agents are responsible for the following:

  • Restarting failed processes
  • Collecting host stats (disk/network/CPU/RAM)
  • Reading and parsing log files

 

1. Log in to mongo1 via SSH, then download and install the CMON MongoDB agent:

$ wget http://www.severalnines.com/downloads/cmon/cmon-mongodb-agent-1.2.4-1.x86_64.rpm
$ rpm -Uhv cmon-mongodb-agent-1.2.4-1.x86_64.rpm

 

2. Configure CMON as an agent by editing /etc/cmon.cnf:

# CMON config file
## id and name of cluster that this cmon agent is monitoring.
## Must be unique for each monitored cluster, like server-id in mysql
cluster_id=1
name=default_repl_1
mode=agent
type=mongodb

# MySQL for CMON
## Port of mysql server holding cmon database
mysql_port=3306
## Hostname/ip of mysql server holding cmon database
mysql_hostname=192.168.197.40
## Password for 'cmon' user on  the 'mysql_hostname'
mysql_password=cmonP4ss
local_mysql_port=3306
local_mysql_password=cmonP4ss
# CMON service
## Hostname/IP of the server of this cmon instance
hostname=192.168.197.41
## osuser - the user owning the cmon_core_dir above
osuser=root
## logfile is default to syslog
logfile=/var/log/cmon.log
## location of cmon.pid file. The pidfile is written in /tmp/ by default
pidfile=/var/run/
nodaemon=0

# MongoDB config database
monitored_mountpoints=/var/lib/mongodb/configdb

 

3. Repeat the above steps for mongo2 and mongo3. Make sure to change the value of “hostname” on the respective nodes.


Start the Cluster

 

1. We will begin by enabling Apache and CMON on boot, then start the Apache and CMON services on the ClusterControl server:

$ chkconfig httpd on
$ chkconfig cmon on
$ service httpd start
$ service cmon start

 

2. Next, log in to mongo1, mongo2 and mongo3 to start the CMON agent service:

$ chkconfig cmon on
$ service cmon start

 

Configure ClusterControl UI

 

1. To install the new ClusterControl UI, SSH into the ClusterControl host, download the ClusterControl installation script, change script permissions and execute it:

$ wget http://www.severalnines.com/downloads/cmon/setup-cc-ui.sh
$ chmod +x setup-cc-ui.sh
$ ./setup-cc-ui.sh

 

2. To finalize the UI installation, open a web browser and go to http://ClusterControl_IP_address/install. You should see the “Install ClusterControl UI and API” page.

 

Please note the ClusterControl API Access Token, the ClusterControl API URL, and your login email and password. We will use these later on the cluster registration page.

 

3. After the installation, click “Click here to automatically register your cluster now!” and you will be redirected to the cmonapi page. Click “Login Now”.


4. After that, log in using the email address and password you specified on the installation page (the default password is “admin”). You should see the “Cluster Registrations” page. Enter the ClusterControl API token and URL.

 

5. You will be redirected to the ClusterControl UI at http://ClusterControl_IP_address/clustercontrol, where your MongoDB cluster is listed. Click on it to view your cluster.


You’re done! You are now able to manage your MongoDB sharded cluster using ClusterControl!

MongoDb Architecture

NOSQL has become a very hot topic for large web-scale deployments, where scalability and semi-structured data have driven database requirements towards NOSQL. Many NOSQL products have evolved over the last couple of years. In my past blogs, I have covered the underlying distributed system theory of NOSQL, as well as some specific products such as CouchDB and Cassandra/HBase.

Last Friday I was very lucky to meet Jared Rosoff from 10gen at a technical conference and discuss the technical architecture of MongoDb. I found the information very useful and want to share it with more people.

One thing that impresses me about MongoDb is that it is extremely easy to use, and the underlying architecture is also very easy to understand.

Here are some simple admin steps to start/stop the MongoDb server:

# Create the default data directory
mkdir -p /data/db

# Start Mongod server
.../bin/mongod # data stored in /data/db

# Start the command shell
.../bin/mongo
> show dbs
> show collections

# Remove collection
> db.person.drop()

# Stop the Mongod server from shell
> use admin
> db.shutdownServer()

Major differences from RDBMS
MongoDb differs from an RDBMS in the following ways:

  • Unlike an RDBMS record, which is “flat” (a fixed number of simple data types), the basic unit of MongoDb is the “document”, which is “nested” and can contain multi-value fields (arrays, hashes).
  • Unlike an RDBMS, where all records stored in a table must conform to the table schema, documents of any structure can be stored in the same collection.
  • There is no “join” operation in the query language. Overall, data is encouraged to be organized in a more denormalized manner, and more of the burden of ensuring data consistency is pushed to the application developers.
  • There is no concept of a “transaction” in MongoDb. “Atomicity” is guaranteed only at the document level (no partial update of a document will occur).
  • There is no concept of “isolation”: any data read by one client may have its value modified by another concurrent client.

By removing some of those features that a classical RDBMS will provide, MongoDb can be more light-weight and be more scalable in processing big data.
Query processing

MongoDb belongs to the family of document-oriented DBs. In this model, data is organized as JSON documents and stored into collections. A collection can be thought of as the equivalent of a table, and a document as the equivalent of a record, in the RDBMS world.

Here are some basic examples.

# create a doc and save into a collection
> p = {firstname:"Dave", lastname:"Ho"}
> db.person.save(p)
> db.person.insert({firstname:"Ricky", lastname:"Ho"})

# Show all docs within a collection
> db.person.find()

# Iterate result using cursor
> var c = db.person.find()
> p1 = c.next()
> p2 = c.next()

To specify the search criteria, an example document containing the fields to match against needs to be provided.

> p3 = db.person.findOne({lastname:"Ho"})

Notice that in the query, the value portion needs to be determined before the query is made (in other words, it cannot be based on other attributes of the document). For example, let’s say we have a collection of “Person”; it is not possible to express a query that returns persons whose weight is larger than 10 times their height.
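
Simple comparison operators can still be embedded in the criteria document, as long as they compare against a predetermined value. For example (a sketch, assuming the person docs carry a numeric age field):

# Find all persons older than 30
> db.person.find({age: {$gt: 30}})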

# Return a subset of fields (ie: projection)
> db.person.find({lastname:"Ho"}, {firstname:true})

# Delete some records
> db.person.remove({firstname:"Ricky"})

To speed up queries, indexes can be used. In MongoDb, an index is stored as a BTree structure (so range queries are automatically supported). Since the document itself is a tree, an index can be specified as a path that drills into deep nesting levels inside the document.

# To build an index for a collection
> db.person.ensureIndex({firstname:1})

# To show all existing indexes
> db.person.getIndexes()

# To remove an index
> db.person.dropIndex({firstname:1})

# An index can be built on a path of the doc.
> db.person.ensureIndex({"address.city":1})

# A composite key can be used to build an index
> db.person.ensureIndex({lastname:1, firstname:1})

An index can also be built on a multi-valued attribute such as an array. In this case, each element in the array will have a separate node in the BTree.
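
For example, assuming each person doc carries a hobbies array (a hypothetical field), building and using such a multi-key index looks like:

# Build an index on a multi-valued attribute
> db.person.ensureIndex({hobbies:1})

# Matches docs where any array element equals "tennis"
> db.person.find({hobbies:"tennis"})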

Building an index can be done in either offline foreground mode or online background mode. Foreground mode proceeds much faster, but the DB cannot be accessed while the index is being built. If the system is running as a replica set (described below), it is recommended to rotate each member DB offline and build the index in the foreground.

When there are multiple selection criteria in a query, MongoDb attempts to use one single best index to select a candidate set and then sequentially iterate through them to evaluate other criteria.

When there are multiple indexes available for a collection and a query is handled for the first time, MongoDb will create multiple execution plans (one for each available index) and let them take turns (within a certain number of ticks) to execute until the fastest plan finishes. The result of the fastest executor will be returned, and the system remembers the index used by that executor. Subsequent queries will use the remembered index until a certain number of updates have happened in the collection, after which the system repeats the process to figure out the best index at that time.
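
You can see which plan and index a query actually used via the shell’s explain() helper (a quick sketch; the exact output fields vary by version):

> db.person.find({lastname:"Ho"}).explain()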

Since only one index will be used, it is important to look at the search and sorting criteria of the query and build additional composite indexes to match the query better. Maintaining an index is not without cost: indexes need to be updated when docs are created, deleted and updated, which adds overhead to the update operations. To maintain an optimal balance, we need to periodically measure the effectiveness of each index (e.g. the read/write ratio) and delete less efficient ones.

Storage Model
Written in C++, MongoDB uses memory-mapped files that directly map an on-disk data file to an in-memory byte array, where data access logic is implemented using pointer arithmetic. Each document collection is stored in one namespace file (which contains metadata information) as well as multiple extent data files (whose sizes grow exponentially, doubling each time).


The data structure uses doubly-linked lists extensively. Each collection of data is organized in a linked list of extents, each of which represents a contiguous disk space. Each extent points to the head/tail of another linked list of docs. Each doc contains links to neighboring documents as well as the actual data encoded in BSON format.

Data modification happens in place. In case the modification increases the size of the record beyond its originally allocated space, the whole record will be moved to a bigger region with some extra padding bytes. The padding is used as a growth buffer so that future expansion doesn’t necessarily require moving the data again. The amount of padding is dynamically adjusted per collection based on its modification statistics. The space occupied by the original doc is then freed up and tracked in free lists of different sizes.
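
The per-collection padding can be observed from the shell; in the 2.x era, the stats() output included a paddingFactor field (a quick sketch):

> db.person.stats()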

As we can imagine, holes will be created over time as objects are created, deleted or modified. This fragmentation will hurt performance, as less data is read/written per disk I/O. Therefore, we need to run the “compact” command periodically, which copies the data to a contiguous space.

How To Create a Sharded Cluster in MongoDB Using Ubuntu 12.04

Introduction


MongoDB is a NoSQL document database system that scales well horizontally and implements data storage through a key-value system. A popular choice for web applications and websites, MongoDB is easy to implement and access programmatically.

MongoDB achieves scaling through a technique known as “sharding”. Sharding is the process of writing data across different servers to distribute the read and write load and data storage requirements.

In a previous tutorial, we covered how to install MongoDB on Ubuntu 12.04. We will use this as a jumping-off point to talk about how to implement sharding across a number of different nodes.

MongoDB Sharding Topology


Sharding is implemented through three separate components. Each part performs a specific function:

  • Config Server: Each production sharding implementation must contain exactly three configuration servers. This is to ensure redundancy and high availability.

    Config servers are used to store the metadata that links requested data with the shard that contains it. It organizes the data so that information can be retrieved reliably and consistently.

  • Query Routers: The query routers are the machines that your application actually connects to. These machines are responsible for communicating to the config servers to figure out where the requested data is stored. It then accesses and returns the data from the appropriate shard(s).

    Each query router runs the “mongos” command.

  • Shard Servers: Shards are responsible for the actual data storage operations. In production environments, a single shard is usually composed of a replica set instead of a single machine. This is to ensure that data will still be accessible in the event that a primary shard server goes offline.

Initial Set Up


If you were paying attention above, you probably noticed that this configuration requires quite a few machines. In this tutorial, we will configure an example sharding cluster that contains:

  • 3 Config Servers (Required in production environments)
  • 2 Query Routers (Minimum of 1 necessary)
  • 3 Shard Servers (Minimum of 2 necessary)

We will go above this minimum in order to demonstrate adding multiple components of each type. We will also treat all of these components as discrete machines for clarity and simplicity.

We colocate a Config Server and a Shard Server on each of three servers, and run the Query Routers on two separate servers. In total, we use 5 servers for this demonstration, and MongoDB has been installed successfully on each of them.

 

Configure DNS Subdomain Entries for Each Component (Optional)


The MongoDB documentation recommends that you refer to all of your components by a DNS resolvable name instead of by a specific IP address. This is important because it allows you to change servers or redeploy certain components without having to restart every server that is associated with it.

For ease of use, I recommend that you give each server its own subdomain on the domain that you wish to use. For the purposes of this tutorial, we will refer to the components as being accessible at these subdomains:

  • Config Servers
    • config0.agile.vn
    • config1.agile.vn
    • config2.agile.vn
  • Query Routers
    • query0.agile.vn
    • query1.agile.vn
  • Shard Servers
    • shard0.agile.vn
    • shard1.agile.vn
    • shard2.agile.vn

If you do not set up subdomains, you can still follow along, but your configuration will not be as robust. If you wish to go this route, simply substitute the subdomain specifications with your droplet’s IP address.

Initialize the Config Servers


The first components that must be set up are the configuration servers. These must be online and operational before the query routers or shards can be configured.

Log into your first configuration server as root.

The first thing we need to do is create a data directory, which is where the configuration server will store the metadata that associates location and content:

mkdir /mongo-metadata

Now, we simply have to start up the configuration server with the appropriate parameters. The service that provides the configuration server is called mongod. The default port number for this component is 27019.

We can start the configuration server with the following command:

mongod --fork --logpath /var/log/mongod.log --configsvr --dbpath /mongo-metadata --port 27019

The server will start listening for connections from other components and run in the background as a daemon.

Repeat this process exactly on the other two config servers. The port number should be the same across all three servers, 27019.
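
To confirm that each config server is up and listening, you can connect to it directly with the mongo shell from any machine (a quick check, assuming the DNS names above):

mongo --host config0.agile.vn --port 27019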

Configure Query Router Instances


At this point, you should have all three of your configuration servers running and listening for connections. They must be operational before continuing.

Log into your first query router as root.

The first thing we need to do is stop the mongodb process on this instance if it is already running. The query routers use data locks that conflict with the main MongoDB process:

service mongodb stop

Next, we need to start the query router service with a specific configuration string. The configuration string must be exactly the same for every query router you configure (including the order of arguments). It is composed of the address of each configuration server and the port number it is operating on, separated by a comma.

The query router service is called mongos. The default port number for this process is 27017 (but the port numbers in the configuration string refer to the configuration server port, which is 27019 by default).

The end result is that the query router service is started with a string like this:

mongos --configdb config0.agile.vn:27019,config1.agile.vn:27019,config2.agile.vn:27019

Your first query router should begin to connect to the three configuration servers. Repeat these steps on the other query router. Remember that the mongodb service must be stopped prior to typing in the command.

Also, keep in mind that the exact same command must be used to start each query router. Failure to do so will result in an error.

Add Shards to the Cluster


Now that we have our configuration servers and query routers configured, we can begin adding the actual shard servers to our cluster. These shards will each hold a portion of the total data.

Log into one of your shard servers as root.

To actually add the shards to the cluster, we will need to go through the query routers, which are now configured to act as our interface with the cluster. We can do this by connecting to any of the query routers like this:

mongo --host query0.agile.vn --port 27017

This will connect to the appropriate query router and open a mongo prompt. We will add all of our shard servers from this prompt.

To add our first shard, type:

sh.addShard( "shard0.agile.vn:27017" )

You can then add your remaining shard droplets in this same interface. You do not need to log into each shard server individually.

sh.addShard( "shard1.agile.vn:27017" )
sh.addShard( "shard2.agile.vn:27017" )

How to Enable Sharding for a Database Collection


MongoDB organizes information into databases. Inside each database, data is further compartmentalized through “collections”. A collection is akin to a table in traditional relational database models.

In this section, we will be operating through the query routers again. If you are not still connected to the query router, you can access it again using the same mongo command you used in the last section:

mongo --host query0.agile.vn --port 27017

Enable Sharding on the Database Level


We will enable sharding first on the database level. To do this, we will create a test database called (appropriately) test_db.

To create this database, we simply need to change to it. It will be marked as our current database and created dynamically when we first enter data into it:

use test_db

We can check that we are currently using the database we just created by typing:

db
test_db

We can see all of the available databases by typing:

show dbs

You may notice that the database that we just created does not show up. This is because it holds no data, so it is not quite real yet.

We can enable sharding on this database by issuing this command:

sh.enableSharding("test_db")

Again, if we enter the show dbs command, we will not see our new database. However, if we switch to the config database, which is generated automatically, and issue a find() command, our new database will be returned:

use config
db.databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test_db", "partitioned" : true, "primary" : "shard0003" }

Your database will show up with the show dbs command when MongoDB has added some data to the new database.

Enable Sharding on the Collections Level


Now that our database is marked as being available for sharding, we can enable sharding on a specific collection.

At this point, we need to decide on a sharding strategy. Sharding works by organizing data into different categories based on a specific field designated as the shard key in the documents it is storing. It puts all of the documents that have a matching shard key on the same shard.

For instance, if your database is storing employees at a company and your shard key is based on favorite color, MongoDB will put all of the employees with blue in the favorite color field on a single shard. This can lead to disproportional storage if everybody likes a few colors.

A better choice for a shard key would be something that’s guaranteed to be more evenly distributed. For instance, in a large company, a birthday (month and day) field would probably be fairly evenly distributed.

In cases where you’re unsure about how things will be distributed, or there is no appropriate field, you can create a “hashed” shard key based on an existing field. This is what we will be doing for our data.

We can create a collection called test_collection and hash its “_id” field. Make sure we’re using our test_db database and then issue the command:

use test_db
db.test_collection.ensureIndex( { _id : "hashed" } )

We can then shard the collection by issuing this command:

sh.shardCollection("test_db.test_collection", { "_id": "hashed" } )

This will shard the collection across all of the available shards.

Insert Test Data into the Collection


We can see our sharding in action by using a loop to create some objects. This loop comes directly from the MongoDB website for generating test data.

We can insert data into the collection using a simple loop like this:

use test_db
for (var i = 1; i <= 500; i++) db.test_collection.insert( { x : i } )

This will create 500 simple documents (only an ID field and an “x” field containing a number) and distribute them among the different shards. You can see the results by typing:

db.test_collection.find()
{ "_id" : ObjectId("529d082c488a806798cc30d3"), "x" : 6 }
{ "_id" : ObjectId("529d082c488a806798cc30d0"), "x" : 3 }
{ "_id" : ObjectId("529d082c488a806798cc30d2"), "x" : 5 }
{ "_id" : ObjectId("529d082c488a806798cc30ce"), "x" : 1 }
{ "_id" : ObjectId("529d082c488a806798cc30d6"), "x" : 9 }
{ "_id" : ObjectId("529d082c488a806798cc30d1"), "x" : 4 }
{ "_id" : ObjectId("529d082c488a806798cc30d8"), "x" : 11 }
. . .

To get more values, type:

it
{ "_id" : ObjectId("529d082c488a806798cc30cf"), "x" : 2 }
{ "_id" : ObjectId("529d082c488a806798cc30dd"), "x" : 16 }
{ "_id" : ObjectId("529d082c488a806798cc30d4"), "x" : 7 }
{ "_id" : ObjectId("529d082c488a806798cc30da"), "x" : 13 }
{ "_id" : ObjectId("529d082c488a806798cc30d5"), "x" : 8 }
{ "_id" : ObjectId("529d082c488a806798cc30de"), "x" : 17 }
{ "_id" : ObjectId("529d082c488a806798cc30db"), "x" : 14 }
{ "_id" : ObjectId("529d082c488a806798cc30e1"), "x" : 20 }
. . .

To get information about the specific shards, you can type:

sh.status()
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "version" : 3,
    "minCompatibleVersion" : 3,
    "currentVersion" : 4,
    "clusterId" : ObjectId("529cae0691365bef9308cd75")
}
  shards:
    {  "_id" : "shard0000",  "host" : "162.243.243.156:27017" }
    {  "_id" : "shard0001",  "host" : "162.243.243.155:27017" }
. . .

This will provide information about the chunks that MongoDB distributed between the shards.

Conclusion


By the end of this guide, you should be able to implement your own MongoDB sharding configuration. The specific configuration of your servers and the shard key that you choose for each collection will have a big impact on the performance of your cluster.

Choose the field or fields that have the best distribution properties and most closely represent the logical groupings that will be reflected in your database queries. If MongoDB only has to go to a single shard to retrieve your data, it will return faster.

MongoDB vs Couchbase

Document databases may be the most popular NoSQL database variant of them all. Their great flexibility — schemas can be grown or changed with remarkable ease — makes them suitable for a wide range of applications, and their object nature fits in well with current programming practices. In turn, Couchbase Server and MongoDB have become two of the more popular representatives of open source document databases, though Couchbase Server is a recent arrival among the ranks of document databases.

In this context, the word “document” does not mean a word processing file or a PDF. Rather, a document is a data structure defined as a collection of named fields. JSON (JavaScript Object Notation) is currently the most widely used notation for defining documents within document-oriented databases. JSON’s advantage as an object notation is that, once you comprehend its syntax — and JSON is remarkably easy to grasp — then you have all you need to define what amounts to the schema of a document database. That’s because, in a document database, each document carries its own schema — unlike an RDBMS, in which every row in a given table must have the same columns.

The latest versions of Couchbase Server and MongoDB are both newly arrived. In December 2012, Couchbase released Couchbase Server 2.0, a version that makes Couchbase Server a full-fledged document database. Prior to that release, users could store JSON data into Couchbase, but the database wrote JSON data as a blob. Couchbase was, effectively, a key/value database.

10gen released MongoDB 2.4 just this week. MongoDB has been a document database from the get-go. This latest release incorporates numerous performance and usability enhancements.

Both databases are designed to run on commodity hardware, as well as for horizontal scaling via sharding (in Couchbase, the rough equivalent to a shard is called a partition). Both databases employ JSON as the document definition notation, though in MongoDB, the notation is BSON (Binary JSON), a binary-encoded superset of JSON that defines useful data types not found in JSON. While both databases employ JavaScript as the primary data manipulation language, both provide APIs for all the most popular programming languages to allow applications direct access to database operations.

Key differences
Of course there are differences. First, MongoDB’s handling of documents is better developed. This becomes most obvious in the mongo shell, which serves the dual purpose of providing a management and development window into a MongoDB database. Database, collections, and documents are first-class entities in the shell. Collections are actually properties on database objects.

This is not to say that Couchbase is hobbled. You can easily manage your Couchbase cluster — adding, deleting, and fetching documents — from the Couchbase Management GUI, for which MongoDB has no counterpart. Indeed, if you prefer management via GUI consoles, score one for Couchbase Server. If, however, you favor life at the command line, you will be tipped in MongoDB’s direction.

The cloud-based MongoDB Monitoring Service (MMS), which gathers statistics, is not meant to be a full-blown database management interface. But MongoDB’s environment provides a near seamless connection between the data objects abstracted in the mongo shell and the database entities they model. This is particularly apparent when you discover that MongoDB allows you to create indexes on a specific document field using a single function call, whereas indexes in Couchbase must be created by more complex mapreduce operations.
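
To illustrate the difference, creating a single-field index in the mongo shell is one function call, while the comparable Couchbase 2.0 view is defined by a map function in a design document (a sketch; the collection name, field, and view are hypothetical):

// MongoDB: one call in the mongo shell
db.users.ensureIndex( { email: 1 } )

// Couchbase 2.0: the map function of a view, saved in a design document
function (doc, meta) {
  if (doc.email) {
    emit(doc.email, null);  // the emitted keys form the index
  }
}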

Test Center Scorecard

                        25%  20%  20%  15%  10%  10%  Overall
Couchbase Server 2.0     9    7    9    9    8    9    8.5  (Very Good)
MongoDB 2.4              9    9    9    9    8    9    8.9  (Very Good)

MongoDB Management Service – Install the Monitoring Agent on Debian and Ubuntu

Overview

Installing the MMS monitoring agent on Debian and Ubuntu requires a number of Python packages and extensions, including C extensions.

Prerequisites

Before you install the monitoring agent, these software packages must be available or installed on the target system:

  • Python 2.6+
  • setuptools to install Python packages
  • python-dev to install Python C extensions
  • pip to install and uninstall PyMongo
  • pymongo, the Python driver used by the monitoring agent
  • agent.py, the MongoDB monitoring agent itself

Procedure

1. Install Python Packages and Extensions

Install python-setuptools, which you will use to install the remaining Python dependencies.

sudo apt-get install python-setuptools

2. Install Python C Extensions

While the C extensions are not required for MMS Monitoring, they significantly improve performance. You must have a C compiler (e.g. gcc) and Python header files installed on your system. Type this command to install Python headers:

sudo apt-get install build-essential python-dev

3. Install and Upgrade PyMongo

If you have not installed pymongo, type this command to install the latest version:

sudo easy_install pymongo

To upgrade to the latest version of the driver, type this command:

sudo easy_install -U pymongo

For more information about PyMongo installation, see the Additional Information section below. If PyMongo was previously installed without C extensions, install PyMongo C extensions. If you are installing PyMongo and the Monitoring agent on systems that do not have C compilers, build PyMongo packages with PyMongo C extensions.
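
You can verify whether the C extensions are active with a one-liner (a quick check; pymongo.has_c() returns True when the C extensions were built):

python -c "import pymongo; print(pymongo.has_c())"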

4. Install the MongoDB Monitoring Agent

Download the latest MMS monitoring agent from the MongoDB Management Service, located on the Settings page under the Monitoring Agent tab. With the Python software requirements installed, start the MongoDB monitoring agent with these commands:

cd mms-agent
nohup python agent.py > LOG-DIRECTORY/agent.log 2>&1 &

Replace LOG-DIRECTORY with the path to your MongoDB logs.
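
You can then confirm that the agent process is alive and watch its output (a quick check, using the same LOG-DIRECTORY placeholder):

ps aux | grep agent.py
tail -f LOG-DIRECTORY/agent.log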

How To Install MongoDB on Ubuntu 12.04

MongoDB is a document database used commonly in modern web applications. This tutorial should help you set up a virtual private server to use as a dedicated MongoDB server for a production application environment.

 

Step 1 — Create the Install Script

The MongoDB install process is simple enough to be completed with a Bash script. Copy the following into a new file named `mongo_install.bash` in your home directory:

apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" | tee -a /etc/apt/sources.list.d/10gen.list

apt-get -y update

apt-get -y install mongodb-10gen

 

Here’s an explanation of each line in the script:

  • The `apt-key` call registers the public key of the custom 10gen MongoDB aptitude repository
  • A custom 10gen repository list file is created containing the location of the MongoDB binaries
  • Aptitude is updated so that new packages can be registered locally on the Droplet
  • Aptitude is told to install MongoDB

TIP: At any time, to change to your home directory, simply execute `cd`

 

Step 2 — Run the Install Script

Execute the following from your home directory:

$ sudo bash ./mongo_install.bash

If everything is successful, you should see the output contain a PID of the newly started MongoDB process:

mongodb start/running, process 2368

 

Step 3 — Check It Out

By default with this install method, MongoDB should start automatically when your Droplet is booted. This means that if you need to reboot your Droplet, MongoDB will start right back up.

To start learning about the running `mongod` process, run the following command:

$ ps aux | grep mongo

 

One line of the output should look like the following:

mongodb 569 0.4 6.4 627676 15936 ? Ssl 22:54 0:02 /usr/bin/mongod --config /etc/mongodb.conf

 

We can see the…

User: `mongodb`

PID: `569`

Command: `/usr/bin/mongod --config /etc/mongodb.conf`

Config File: `/etc/mongodb.conf`
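
As a final check, you can connect with the mongo shell and ask the server for its version (a quick sketch):

$ mongo
> db.version()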