Docker and DevOps: Why it Matters

Unless you have been living under a rock for the last year, you have probably heard about Docker. Docker describes itself as an open platform for distributed applications for developers and sysadmins. That sounds great, but why does it matter?

Wait, virtualization isn’t new!?

Virtualization technology has existed for more than a decade and in the early days revolutionized how the world managed server environments. The virtualization layer later became the basis for the modern cloud, with virtual servers being created and scaled on-demand. Traditionally, virtualization software was expensive and came with a lot of overhead. Linux cgroups have existed for a while, but more recently Linux containers came along and added namespace support to provide isolated environments for applications. Vagrant + LXC + Chef/Puppet/Ansible have been a powerful combination for a while, so what does Docker bring to the table?

Virtualization isn’t new and neither are containers, so let’s discuss what makes Docker special.

The cloud made it easy to host complex and distributed applications, and therein lies the problem. Ten years ago applications looked straightforward and had few complex dependencies.


The reality is that application complexity has evolved significantly in the last five years, and even simple services are now extremely complex.


It has become a best practice to build large distributed applications using independent microservices. The model has changed from monolithic to distributed to now containerized microservices. Every microservice has its dependencies and unique deployment scenarios which makes managing operations even more difficult. The default is not a single stack being deployed to a single server, but rather loosely coupled components deployed to many servers.

Docker makes it easy to deploy any application on any platform.

The need for Docker

It is not just that applications are more complex; more importantly, the development model and culture have evolved. When I started engineering, developers had dedicated servers with their own builds if they were lucky. More often than not, your team shared a development server, as it was too expensive and cumbersome for every developer to have their own environment. Times have changed significantly: the cultural norm nowadays is for every developer to be able to run complex applications off a virtual machine on their laptop (or a dev server in the cloud). With the cheap on-demand resources provided by cloud environments, it is common to have many application environments: dev, QA, production. Docker containers are isolated but share the same kernel and core operating system files, which makes them lightweight and extremely fast. Using Docker to manage containers makes it easier to build distributed systems by allowing applications to run on a single machine or across many virtual machines with ease.
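To make that concrete, here is a minimal sketch (assuming Docker is already installed; the image name is just an example). Starting an isolated environment is a single command, and it comes up in seconds because no guest OS has to boot:

# Run a command inside a disposable, isolated container
$ docker run --rm ubuntu:14.04 echo "hello from a container"

# Or start an interactive shell in a clean environment
$ docker run -i -t --rm ubuntu:14.04 /bin/bash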

Docker is both a great software project (Docker Engine) and a vibrant community (Docker Hub). Docker combines a portable, lightweight application runtime and packaging tool with a cloud service for sharing applications and automating workflows.

Docker makes it easy for developers and operations to collaborate

DevOps professionals appreciate Docker because it makes it extremely easy to manage the deployment of complex distributed applications. Docker also manages to unify the DevOps community, whether you are a Chef fan, Puppet enthusiast, or Ansible aficionado. Docker is also supported by the major cloud platforms, including Amazon Web Services and Microsoft Azure, which means it’s easy to deploy to any platform. Ultimately, Docker provides flexibility and portability so applications can run on-premise on bare metal or in a public or private cloud.

DockerHub provides official language stacks and repos


The Docker community is built on a mature open source mentality with the corporate backing required to offer a polished experience. There is a vibrant and growing ecosystem brought together on Docker Hub. Official language stacks for the common app platforms give the community officially supported, high-quality Docker repos, which means wider and better support.

Since Docker is so well supported, you see many companies offering support for Docker as a platform with official repos on Docker Hub.


What is Docker?

Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud.

Why do developers like it?

With Docker, developers can build any app in any language using any toolchain. “Dockerized” apps are completely portable and can run anywhere – colleagues’ OS X and Windows laptops, QA servers running Ubuntu in the cloud, and production data center VMs running Red Hat.

Developers can get going quickly by starting with one of the 13,000+ apps available on Docker Hub. Docker manages and tracks changes and dependencies, making it easier for sysadmins to understand how the apps that developers build work. And with Docker Hub, developers can automate their build pipeline and share artifacts with collaborators through public or private repositories.
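As an illustration, a hypothetical build-and-share workflow might look like this (the image and account names are made up; the Dockerfile is a minimal sketch):

# Describe the app's environment in a Dockerfile
$ cat > Dockerfile <<'EOF'
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y python
ADD app.py /app.py
CMD ["python", "/app.py"]
EOF

# Build the image and share it through a Docker Hub repository
$ docker build -t myuser/myapp .
$ docker push myuser/myapp

# Collaborators (and production) pull and run the exact same artifact
$ docker pull myuser/myapp
$ docker run -d myuser/myapp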

Docker helps developers build and ship higher-quality applications, faster.

Why do sysadmins like it?

Sysadmins use Docker to provide standardized environments for their development, QA, and production teams, reducing “works on my machine” finger-pointing. By “Dockerizing” the app platform and its dependencies, sysadmins abstract away differences in OS distributions and underlying infrastructure.

In addition, standardizing on the Docker Engine as the unit of deployment gives sysadmins flexibility in where workloads run. Whether on-premise bare metal or data center VMs or public clouds, workload deployment is less constrained by infrastructure technology and is instead driven by business priorities and policies. Furthermore, the Docker Engine’s lightweight runtime enables rapid scale-up and scale-down in response to changes in demand.

Docker helps sysadmins deploy and run any app on any infrastructure, quickly and reliably.

How is this different from Virtual Machines?

Virtual Machines

Each virtualized application includes not only the application – which may be only 10s of MB – and the necessary binaries and libraries, but also an entire guest operating system – which may weigh 10s of GB.

Docker

The Docker Engine container comprises just the application and its dependencies. It runs as an isolated process in userspace on the host operating system, sharing the kernel with other containers. Thus, it enjoys the resource isolation and allocation benefits of VMs but is much more portable and efficient.
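One quick way to see this sharing in action (a small sketch, assuming a Linux host with Docker installed):

# The kernel version reported inside a container matches the host,
# because there is no guest OS in between
$ uname -r
$ docker run --rm ubuntu:14.04 uname -r

# A running container is visible as ordinary processes on the host
$ docker run -d --name web nginx
$ docker top web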

Snappy Ubuntu challenges CoreOS and Project Atomic on lightweight cloud servers

The war over which operating system will be the leading cloud operating system is taking some new twists. Today, the name of the game is who can ship the smallest operating system image that supports containers.

“Snappy” Core Ubuntu is meant to be a small, but powerful, answer for datacenter and cloud servers.

First, CoreOS combined the idea of containers with a lightweight Linux server. Then, Red Hat started working on Project Atomic, which combined Red Hat Enterprise Linux (RHEL) with Docker. Now, Canonical with Ubuntu Core is entering the field.

In each case, the idea is basically the same: Provide developers and system administrators with a minimal Linux server that supports containers, such as those from Docker. The result is meant to be a very flexible, very secure Linux application platform.

The Canonical Ubuntu take on this idea: “Ubuntu Core is the smallest, leanest Ubuntu ever, perfect for ultra-dense computing in cloud container farms, Docker app deployments or Platform as a Service (PaaS) environments. Core is designed for efficiency and has the smallest runtime footprint with the best security profile in the industry: it’s an engine, chassis and wheels, no luxuries, just what you need for massively parallel systems.”

In an interview, Dustin Kirkland, Canonical’s Ubuntu cloud solutions product manager, explained, “Snappy Ubuntu is for building massive, new systems from micro-services supplied by Docker containers.”

This isn’t a new idea. Kirkland pointed to Netflix as pioneering this style of building out major service architecture. “Instead of using massive IBM zSeries mainframes at the back-end, Netflix use hundreds of servers that are each running some of the service.”

This is indeed exactly what Netflix does 24 hours a day, seven days a week. Behind all of Netflix’s tens of millions of nightly video streams there is no datacenter. Instead, there are a few thousand cloud servers coming up and down to meet demand.

Snappy Ubuntu, according to Kirkland, is meant to support such a flexible, high-speed, high-demand environment. It’s small, efficient, runs only the bare minimum of services, and has a read-only operating system.

As such, Ubuntu Core, its applications, and their Docker containers are designed to be upgraded atomically and rolled back if needed. With such “transactional” or “image-based” systems, when you upgrade a server or app you don’t patch it, you replace it.
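In the alpha tooling, that workflow looks roughly like this (a sketch based on the announcement; the exact command names may change as snappy matures):

$ sudo snappy info                  # show the installed system image and apps
$ sudo snappy update                # apply an atomic, image-based update
$ sudo snappy rollback ubuntu-core  # revert to the previous image if needed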

“Ubuntu Core builds on the world’s favorite container platform and provides transactional updates with rigorous application isolation,” said Mark Shuttleworth, founder of Ubuntu and Canonical in a statement. “This is the smallest, safest platform for Docker deployment ever.”

Curiously enough, Ubuntu Core didn’t start as a cloud project. Its roots, said Kirkland, actually started with Ubuntu Touch, Canonical’s mobile version of Ubuntu.

There are also other changes in Ubuntu Core that started on Ubuntu Touch. The most significant of these is its new packaging system. Instead of coming with multiple packages, a snappy Ubuntu Core app comes with all the underlying dependent libraries bundled with it.

In a blog posting, Mark Shuttleworth, Canonical and Ubuntu’s founder, explained, “Developers of snappy apps get much more freedom to bundle the exact versions of libraries that they want to use with their apps. It’s much easier to make a snappy package than a traditional Ubuntu package – just bundle up everything you want in one place, and ship it. We use strong application isolation to keep data confidential between apps. If you install a bad app, it only has access to the data you create with that app, not to data from other applications.”

Snappy applications are also confined, according to Canonical, by Canonical’s AppArmor kernel security system. In snappy Ubuntu versions, applications are completely isolated from one another.

Want to try it? Ubuntu Core alpha is available today, on, believe it or not, the Microsoft Azure cloud.

“Microsoft loves Linux, and we’re excited to be the first cloud provider to offer a new rendition of one of the most popular Linux platforms in the rapidly growing Azure cloud,” said Bob Kelly, Corporate Vice President at Microsoft, in a statement. “By delivering the new cloud-optimized Ubuntu Core image on Azure, we’re extending our first-class support for Linux and enabling freedom of choice so developers everywhere can innovate even faster.”

I know, I know, I find this announcement rather jaw-dropping as well.

But Canonical and Microsoft have started working together. Canonical brought its DevOps tool Juju to Windows Server 2012 and they’re helping Microsoft bring OpenStack to Windows. And, to think that Ubuntu once had bug number one as “Microsoft having a majority market share”!

Things have changed. Kirkland said, “Microsoft has been extremely friendly for the last six months and we have a positive relationship with them now. Microsoft customers are interested in Ubuntu on the cloud.”

Of course, Linux users can also try the snappy Ubuntu Core locally with KVM; a simple download and launch spins up Ubuntu Core as a virtual machine on any contemporary PC. Ubuntu Core is also coming to other hypervisors, OpenStack and other cloud platforms soon.
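For the curious, the local KVM route is roughly this (the image filename and URL are illustrative; check Canonical's site for the current alpha):

$ wget http://cdimage.ubuntu.com/ubuntu-core/preview/ubuntu-core-alpha-01.img
$ kvm -m 512 -redir tcp:8022::22 ubuntu-core-alpha-01.img   # boot the image as a local VM
$ ssh -p 8022 ubuntu@localhost                              # log in to the running instance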

I think Canonical is on to something very promising here. Mind you, I think Red Hat is as well with Project Atomic. And even though CoreOS is now fighting with Docker, I think they still have a realistic shot at making an impression on the white-hot business of bringing lightweight servers and containerized applications to the datacenter and the cloud.

At this point, however, it’s way too early to predict who will dominate this rapidly fermenting IT market. The one thing I do know for certain is that these styles of atomic, easy to install, update, and secure servers will be playing an important role in the cloud for the next decade.

(Referenced from http://www.zdnet.com/article/snappy-ubuntu-challenges-coreos-and-project-atomic-on-lightweight-cloud-servers/)

7 alternatives to TestFlight

Over the years, TestFlight became a tool mobile app developers relied on. But parent company Burstly’s recent acquisition (by Apple) and the termination of Android support have developers looking for alternatives.

Getting your beta to testers with ease is important. But some platforms, particularly iOS, make the process a hassle.

There are many predictions about Apple’s future plans for TestFlight. But for now we’ll focus on TestFlight alternatives for iOS, Android, and Windows developers.

HockeyApp | [paid]

HockeyApp not only helps you distribute betas, but is a complete beta testing suite.

With HockeyApp, you don’t need extra tools. It’s an all-in-one suite complete with crash reports, analytics, and feedback built in.

What’s better is that HockeyApp is available for the iOS, Android, and Windows platforms. This means easy beta testing for everyone!

HockeyKit | [free]

One thing developers loved about TestFlight was its price. If you’re looking for a free tool and you like what HockeyApp has to offer, look no further than HockeyKit.

HockeyKit is the self-hosted, open source version of HockeyApp. If you have your own server, why not save some cash and go the DIY route?

Don’t know anything about servers? Then choose HockeyApp. The hard work’s done for you.

As you’d expect, HockeyKit has fewer features and is only available for iOS and Android apps. But if you’re looking for something free and simple, this is your tool.

Apphance | [free]

Apphance describes itself as a tool that closes the feedback loop. It works over-the-air like TestFlight but has great extras like in-app feedback and bug reporting.

If you want one tool for beta users to report problems and access to detailed reports on how to fix them, Apphance will help.

Apphance is cross-platform and even works once your app is live. You can finally distribute your iOS, Android, Windows, and Unity betas with one tool.

AppBlade | [paid]

AppBlade is like the other tools but with an added focus on security. You can secure your app beta, encrypt app data, and even completely wipe a tester’s device afterwards.

AppBlade is available for iOS, Android, and BlackBerry.

Appaloosa | [paid]

Even though Appaloosa has a funny name, it means business. They make it easy to share betas internally or with a private group. Appaloosa targets enterprise app developers, but there’s no reason a small team couldn’t use it.

Appaloosa is also multi-platform and available for iOS, Android, and Windows devices.

Beta Builder | [freemium]

Beta Builder is bare bones compared to TestFlight. As with TestFlight, you can distribute apps over the air but without a fancy interface. Unlike TestFlight, enterprise ad-hoc distribution is free. You can even limit beta testing in-house to prevent security issues.

The bad news is that Beta Builder is available only for iOS apps. It also has a Mac app available for an optional donation of $10.00; otherwise it’s free.

Google Play Native App Beta Testing | [free]

Android developers shouldn’t forget about the tool they already have.

If you’re developing an Android-only app, it may be easier to stick with the Google Play Console. The slew of built-in features and access to a pool of testers via G+ communities make beta testing simple.


With all the alternatives available, choose something you’ll love for a while. Maybe Apple will improve TestFlight, bring it back to Android (doubt it), or drop it altogether. Only time will tell.

Since we can’t predict the future, consider making one of the above options your favorite app beta tool.

10 Online IDEs for Active Web Developers

With the growing popularity of cloud-based apps, it’s no wonder that developers are turning their attention to this technology. For that reason, cloud IDEs, or online IDEs, are gaining more and more followers and users. Some web developers love the comfort of their installed coding editor, while others are always on the go and don’t want to carry a laptop with them. This is where online IDEs come onto the stage: all you need is a smartphone/tablet and an Internet connection. That’s the beauty of these apps: wherever you are, you can code in Ruby, Python, C, C#, HTML, PHP, CSS and more, without installing anything.

The number of online IDEs is growing, taking programming and coding to the next level, but here you will find only the 10 best online IDEs available right now. Take your time to browse all the features they have, and we are positive that you’ll find the one you need.

1. Codebox

Powerful, Collaborative Online/Offline web IDE

Codebox is an open source cloud IDE that can run not only in the cloud, but on your desktop or your server as well. Codebox.io lets you host and manage Codebox instances online as a service using a dashboard or an API.

2. Compilr

Online Editor & Sandbox

Compilr is an online editor & sandbox that lets you write your code from the comfort of your browser. Compilr has been tested across all modern desktop and mobile web browsers, including Firefox, Chrome, Safari, and Internet Explorer. With support for touchscreen interaction, you can write code from your mobile or tablet device.

3. Cloud9 IDE

Online development environment

The Cloud9 environment is full of features aimed at the fat part of the dynamic Web app development world: Ruby, PHP, and Node.js stacks. The Cloud9 IDE offers a classic file browser and editor for your projects that can be debugged on their server and deployed anywhere you like. The editor is quite powerful, offering code highlighting and error detection. Syntax errors are flagged immediately in your browser before they’re even saved to the server, speeding up the debugging cycle. This won’t catch runtime errors, though. You’ll need to insert console logging methods.

4. Codeanywhere

Code editor in a browser with an integrated FTP client

Codeanywhere is a code editor in a browser with an integrated FTP client, and all popular web formats are supported (HTML, PHP, JavaScript, CSS, and XML). Plus, SFTP support is provided, just to cover all bases. As a platform, Codeanywhere is really lightweight. It’s got an interface that (if anything) will make you think of a desktop app, and it works in all browsers. Besides, you can always download the provided native app for your iPhone or Android.

5. Codenvy

Code, build, test, deploy and share projects

The Codenvy IDE is another incredibly rich code development environment with deep connections to a number of hosting platforms like Amazon’s Elastic Beanstalk and AppFog. There are at least a half-dozen options, and they light up depending upon which type of application you want to create. The IDE offers three types of Java projects (library, WAR, and Spring) and the classic dynamic languages such as Ruby, Python, and PHP.

6. Koding

Social development in your browser

The Koding platform is easy to use. There are a few editors to choose from, and more are on the way. The Kodepad editor is colour coded in a way that makes it easy to recognize what is what. The ability to preview your work with just the click of the mouse is nice. You can do everything right there. No need to upload everything and then realize you need to make changes that require more downloading and uploading of your files. You store everything in your own subdomain on their development server. You even set the permissions for your files.

7. Codio

Build and deploy great HTML5, CSS & JavaScript applications

Codio is the world’s first web-based front-end IDE that addresses all aspects of the HTML5 development lifecycle from prototype to deployment. Codio is a development platform with the features and power of a desktop IDE but with the simplicity and usability that people have come to expect from modern browser based applications.

8. Shiftedit

Develop from the comfort of your browser

ShiftEdit is an online PHP, Ruby, Java, HTML, CSS and JavaScript editor with built-in (S)FTP and Dropbox. Ideal for web development.

9. SourceKit

Lightweight programmer's text editor right inside of Chrome

SourceKit is a TextMate-like lightweight programmer’s text editor right inside of Chrome. It saves files directly to Dropbox, so if you have the Dropbox sync software installed, the changes will appear locally as if you had made them with a desktop text editor. Since changes are stored remotely, the extension will naturally pull up the same copy of the file everywhere.

10. Nitrous

Create a Python development environment in 60 seconds.

Nitrous gets your project up and running lightning fast. The online IDE includes not just a file browser and text editor, but a fully-functioning shell – you can even code in vim or emacs if you prefer – all from within the browser.

Conclusion

Whether you like it or not, online IDEs are more powerful than they used to be, because they remove the barriers to coding. Now you can code almost anything in your browser, and web developers are taking advantage of it.

What is PaaS?

Platform as a Service, often simply referred to as PaaS, is a category of cloud computing that provides a platform and environment to allow developers to build applications and services over the internet. PaaS services are hosted in the cloud and accessed by users simply via their web browser.

Platform as a Service allows users to create software applications using tools supplied by the provider. PaaS services can consist of preconfigured features that customers can subscribe to; they can choose to include the features that meet their requirements while discarding those that do not. Consequently, packages can vary from offering simple point-and-click frameworks where no client side hosting expertise is required to supplying the infrastructure options for advanced development.

The infrastructure and applications are managed for customers, and support is available. Services are constantly updated, with existing features upgraded and additional features added. PaaS providers can assist developers from the conception of their original ideas to the creation of applications, and through to testing and deployment, all within a managed environment.

As with most cloud offerings, PaaS services are generally paid for on a subscription basis with clients ultimately paying just for what they use. Clients also benefit from the economies of scale that arise from the sharing of the underlying physical infrastructure between users, and that results in lower costs.

Below are some of the features that can be included with a PaaS offering:

  • Operating system
  • Server-side scripting environment
  • Database management system
  • Server software
  • Support
  • Storage
  • Network access
  • Tools for design and development
  • Hosting

Software developers, web developers and businesses can benefit from PaaS. Whether building an application which they are planning to offer over the internet or software to be sold out of the box, software developers may take advantage of a PaaS solution. For example, web developers can use individual PaaS environments at every stage of the process to develop, test and ultimately host their websites. However, businesses that are developing their own internal software can also utilise Platform as a Service, particularly to create distinct ring-fenced development and testing environments.
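As a concrete illustration (using Heroku-style commands purely as an example; every provider has its own tooling), deploying a web app to a PaaS can be as simple as:

$ heroku create myapp       # provision a new application environment
$ git push heroku master    # push the code; the platform builds and hosts it
$ heroku ps:scale web=2     # scale out without touching any infrastructure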

Below are some of the benefits of PaaS to application developers:

  • They don’t have to invest in physical infrastructure; being able to ‘rent’ virtual infrastructure has both cost benefits and practical benefits. They don’t need to purchase hardware themselves or employ the expertise to manage it. This leaves them free to focus on the development of applications. What’s more, clients will only need to rent the resources they need rather than invest in fixed, unused and therefore wasted capacity.
  • Makes development possible for ‘non-experts’; with some PaaS offerings anyone can develop an application. They can simply do this through their web browser utilising one-click functionality. Salient examples of this are one-click blog software installs such as WordPress.
  • Flexibility; customers can have control over the tools that are installed within their platforms and can create a platform that suits their specific requirements. They can ‘pick and choose’ the features they feel are necessary.
  • Adaptability; features can be changed if circumstances dictate that they should.
  • Teams in various locations can work together; as an internet connection and web browser are all that is required, developers spread across several locations can work together on the same application build.
  • Security; security is provided, including data security and backup and recovery.

In summary, a PaaS offering supplies an operating environment for developing applications. In other words, it provides the architecture as well as the overall infrastructure to support application development. This includes networking, storage, software support and management services. It is therefore ideal for the development of new applications that are intended for the web as well as mobile devices and PCs.

What is SaaS?

SaaS, or Software as a Service, describes any cloud service where consumers are able to access software applications over the internet. The applications are hosted in “the cloud” and can be used for a wide range of tasks for both individuals and organisations. Google, Twitter, Facebook and Flickr are all examples of SaaS, with users able to access the services via any internet enabled device. Enterprise users are able to use applications for a range of needs, including accounting and invoicing, tracking sales, planning, performance monitoring and communications (including webmail and instant messaging).

SaaS is often referred to as software-on-demand and utilising it is akin to renting software rather than buying it. With traditional software applications you would purchase the software upfront as a package and then install it onto your computer. The software’s licence may also limit the number of users and/or devices where the software can be deployed. Software as a Service users, however, subscribe to the software rather than purchase it, usually on a monthly basis. Applications are purchased and used online with files saved in the cloud rather than on individual computers.

There are a number of reasons why SaaS is beneficial to organisations and personal users alike:

  • No additional hardware costs; the processing power required to run the applications is supplied by the cloud provider.
  • No initial setup costs; applications are ready to use once the user subscribes.
  • Pay for what you use; if a piece of software is only needed for a limited period then it is only paid for over that period and subscriptions can usually be halted at any time.
  • Usage is scalable; if a user decides they need more storage or additional services, for example, then they can access these on demand without needing to install new software or hardware.
  • Updates are automated; whenever there is an update it is available online to existing customers, often free of charge. No new software will be required as it often is with other types of applications and the updates will usually be deployed automatically by the cloud provider.
  • Cross device compatibility; SaaS applications can be accessed via any internet enabled device, which makes it ideal for those who use a number of different devices, such as internet enabled phones and tablets, and those who don’t always use the same computer.
  • Accessible from any location; rather than being restricted to installations on individual computers, an application can be accessed from anywhere with an internet enabled device.
  • Applications can be customised and whitelabelled; with some software, customisation is available meaning it can be altered to suit the needs and branding of a particular customer.

Office software is the best example of businesses utilising SaaS. Tasks related to accounting, invoicing, sales and planning can all be performed through Software as a Service. Businesses may wish to use one piece of software that performs all of these tasks or several that each perform different tasks. The required software can be subscribed to via the internet and then accessed online via any computer in the office using a username and password. If needs change they can easily switch to software that better meets their requirements. Everyone who needs access to a particular piece of software can be set up as a user, whether it is one or two people or every employee in a corporation that employs hundreds.

Summary

  • There are no setup costs with SaaS, as there often are with other applications
  • SaaS is scalable with upgrades available on demand
  • Access to Software as a Service is compatible across all internet enabled devices
  • As long as there is an internet connection, applications are accessible from any location

Install ClusterControl on Top of Existing MongoDB Sharded Cluster

In this post, we are going to show you how to install and integrate ClusterControl on top of an existing MongoDB sharded cluster with a replica set of 3 nodes.

 

MongoDB Sharded Cluster Setup

 

In a sharded cluster, we need to have three types of server:

  • config server (configsvr) – holds metadata for the cluster (minimum 3 servers)
  • shard server (shardsvr) – holds a subset of the data, as part of a replica set (minimum 2 servers)
  • routing server (mongos) – routes operations from applications and clients to the shardsvr instances (minimum 1 server)

 

The following sequence explains query routing in a sharded cluster:

  1. The application sends a write query to one of the mongos instances (port 27017)
  2. mongos connects to a configsvr (port 27019) to determine which shard holds the data
  3. mongos then connects to the primary of the target shardsvr replica set (port 27018) to write the data
  4. Data partitioning (sharding) and replication are handled automatically by the shardsvr instances
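A quick way to verify this wiring (assuming the default ports above) is to ask any mongos for the cluster status:

$ mongo --port 27017 --eval "sh.status()"   # lists shards and chunk distribution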

 

In our setup, we have 3 servers running CentOS 6.3 64-bit. On each server, we have colocated a configsvr, a shardsvr and a mongos. Each server has 3 MongoDB configuration files:

  • /etc/mongodb.config.conf – configsvr configuration
  • /etc/mongodb.shard.conf – shardsvr and replSet configuration
  • /etc/mongos.conf – mongos configuration

 

Our MongoDB dbpath is located at /var/lib/mongodb, the configdb is located at /var/lib/mongodb/configdb, and all MongoDB logs are generated under the /var/log/mongodb directory.
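For reference, the three configuration files might look something like this (a minimal sketch using the 2.x INI-style options; the replica set name rs0 is just an example):

# /etc/mongodb.config.conf – configsvr
configsvr = true
port = 27019
dbpath = /var/lib/mongodb/configdb
logpath = /var/log/mongodb/configsvr.log
fork = true

# /etc/mongodb.shard.conf – shardsvr and replica set member
shardsvr = true
replSet = rs0
port = 27018
dbpath = /var/lib/mongodb
logpath = /var/log/mongodb/shardsvr.log
fork = true

# /etc/mongos.conf – mongos router
configdb = 192.168.197.41:27019,192.168.197.42:27019,192.168.197.43:27019
port = 27017
logpath = /var/log/mongodb/mongos.log
fork = true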

We started all MongoDB instances using the following commands on each server:

$ mongod -f /etc/mongodb.config.conf
$ mongod -f /etc/mongodb.shard.conf
$ mongos -f /etc/mongos.conf
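This post assumes the sharded cluster is already initialized, but for completeness the initial wiring would have looked roughly like this (a sketch; the replica set name rs0 and database name mydb are illustrative):

# Initiate the replica set across the shardsvr instances
$ mongo --port 27018 --eval 'rs.initiate({_id: "rs0", members: [
    {_id: 0, host: "192.168.197.41:27018"},
    {_id: 1, host: "192.168.197.42:27018"},
    {_id: 2, host: "192.168.197.43:27018"}]})'

# Register the replica set as a shard and enable sharding on a database
$ mongo --port 27017 --eval 'sh.addShard("rs0/192.168.197.41:27018")'
$ mongo --port 27017 --eval 'sh.enableSharding("mydb")'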

 

 

Install ClusterControl Server

We will need a separate server to run ClusterControl:

1. SSH into the ClusterControl server and make sure that you have iptables and SELinux turned off:

$ service iptables stop
$ setenforce 0
$ sed -i.bak 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config

2. It is highly recommended to enable passwordless SSH with key authentication between ClusterControl and the agents. Generate an RSA key and copy it to all nodes:

$ ssh-keygen -t rsa # just press enter for all prompts
$ ssh-copy-id -i ~/.ssh/id_rsa root@192.168.197.40
$ ssh-copy-id -i ~/.ssh/id_rsa root@192.168.197.41
$ ssh-copy-id -i ~/.ssh/id_rsa root@192.168.197.42
$ ssh-copy-id -i ~/.ssh/id_rsa root@192.168.197.43

3. On the ClusterControl server, install Apache, PHP, MySQL and other required components:

$ yum install httpd php php-mysql php-gd mysql-server mysql cronie sudo mailx -y

4. Download ClusterControl for MongoDB and required packages from Severalnines website:

$ wget http://www.severalnines.com/downloads/cmon/cmon-mongodb-controller-1.2.4-1.x86_64.rpm 
$ wget http://www.severalnines.com/downloads/cmon/cmon-mongodb-www-1.2.4-1.noarch.rpm

5. Install ClusterControl web apps and create graph directory:

$ rpm -Uhv cmon-mongodb-www-1.2.4-1.noarch.rpm
$ mkdir /var/www/html/cmon/graph

6. Install the CMON controller:

$ rpm -Uhv cmon-mongodb-controller-1.2.4-1.x86_64.rpm

7. Disable name resolving in MySQL. This allows us to grant database users by IP address only. Add the following line to /etc/my.cnf under the [mysqld] section:

skip-name-resolve

8. Enable MySQL on boot, start MySQL, create the CMON database, and import the CMON schema:

$ chkconfig mysqld on
$ service mysqld start
$ mysql -e "CREATE DATABASE cmon"
$ mysql < /usr/share/cmon/cmon_db.sql
$ mysql < /usr/share/cmon/cmon_data.sql

9. Enter the MySQL console and grant the CMON database users:

> GRANT ALL ON *.* TO 'cmon'@'192.168.197.40' IDENTIFIED BY 'cmonP4ss' WITH GRANT OPTION;
> GRANT ALL ON *.* TO 'cmon'@'127.0.0.1' IDENTIFIED BY 'cmonP4ss' WITH GRANT OPTION;
> GRANT SUPER,INSERT,SELECT,UPDATE,DELETE ON *.* TO 'cmon'@'192.168.197.41' IDENTIFIED BY 'cmonP4ss';
> GRANT SUPER,INSERT,SELECT,UPDATE,DELETE ON *.* TO 'cmon'@'192.168.197.42' IDENTIFIED BY 'cmonP4ss';
> GRANT SUPER,INSERT,SELECT,UPDATE,DELETE ON *.* TO 'cmon'@'192.168.197.43' IDENTIFIED BY 'cmonP4ss';

10. Configure MySQL root password:

$ mysqladmin -u root password 'MyP4ss'
$ mysqladmin -h127.0.0.1 -u root password 'MyP4ss'

11. Configure CMON as controller by editing /etc/cmon.cnf:

# CMON config file
## id and name of cluster that this cmon agent is monitoring.
## Must be unique for each monitored cluster, like server-id in mysql
cluster_id=1
name=default_repl_1
mode=controller
type=mongodb

# MySQL for CMON
## Port of mysql server holding cmon database
mysql_port=3306
## Hostname/IP of mysql server holding cmon database
mysql_hostname=192.168.197.40
## Password for 'cmon' user on  the 'mysql_hostname'
mysql_password=cmonP4ss
local_mysql_port=3306
local_mysql_password=cmonP4ss
mysql_basedir=/usr/

# CMON service
## Hostname/IP of the server of this cmon instance
hostname=192.168.197.40
## osuser - the user owning the cmon_core_dir above
osuser=root
os=redhat
## logfile is default to syslog
logfile=/var/log/cmon.log
## Location of cmon.pid file. The pidfile is written in /tmp/ by default
pidfile=/var/run/
nodaemon=0

# MongoDB configdb location
monitored_mountpoints=/var/lib/mongodb/configdb
## All mongodb instances with port (comma separated)
mongodb_server_addresses=192.168.197.41:27018,192.168.197.42:27018,192.168.197.43:27018
mongocfg_server_addresses=192.168.197.41:27019,192.168.197.42:27019,192.168.197.43:27019
mongos_server_addresses=192.168.197.41:27017,192.168.197.42:27017,192.168.197.43:27017
mongodb_basedir=/usr/

# CMON stats options
db_stats_collection_interval=10
host_stats_collection_interval=60
ssh_opts=-nq

 

 

Install ClusterControl Agents

 

ClusterControl agents must reside on all MongoDB nodes. The agents are responsible for the following:

  • Restarting failed processes
  • Collecting host stats (disk/network/CPU/RAM)
  • Reading and parsing log files

 

1. Log in to mongo1 via SSH, then download and install the CMON MongoDB agent:

$ wget http://www.severalnines.com/downloads/cmon/cmon-mongodb-agent-1.2.4-1.x86_64.rpm
$ rpm -Uhv cmon-mongodb-agent-1.2.4-1.x86_64.rpm

 

2. Configure CMON as agent by editing /etc/cmon.cnf:

# CMON config file
## id and name of cluster that this cmon agent is monitoring.
## Must be unique for each monitored cluster, like server-id in mysql
cluster_id=1
name=default_repl_1
mode=agent
type=mongodb

# MySQL for CMON
## Port of mysql server holding cmon database
mysql_port=3306
## Hostname/ip of mysql server holding cmon database
mysql_hostname=192.168.197.40
## Password for 'cmon' user on  the 'mysql_hostname'
mysql_password=cmonP4ss
local_mysql_port=3306
local_mysql_password=cmonP4ss
# CMON service
## Hostname/IP of the server of this cmon instance
hostname=192.168.197.41
## osuser - the user owning the cmon_core_dir above
osuser=root
## logfile is default to syslog
logfile=/var/log/cmon.log
## location of cmon.pid file. The pidfile is written in /tmp/ by default
pidfile=/var/run/
nodaemon=0

# MongoDB config database
monitored_mountpoints=/var/lib/mongodb/configdb

 

3. Repeat the above steps for mongo2 and mongo3. Make sure to change the value of “hostname” on the respective nodes.

 

 

Start the Cluster

 

1. We will begin by enabling Apache and CMON on boot, followed by starting the Apache and CMON services on the ClusterControl server:

$ chkconfig httpd on
$ chkconfig cmon on
$ service httpd start
$ service cmon start

 

2. Next, log in to mongo1, mongo2 and mongo3 to start the CMON agent service:

$ chkconfig cmon on
$ service cmon start

 

Configure ClusterControl UI

 

1. To install the new ClusterControl UI, SSH into the ClusterControl host, download the ClusterControl installation script, change script permissions and execute it:

$ wget http://www.severalnines.com/downloads/cmon/setup-cc-ui.sh
$ chmod +x setup-cc-ui.sh
$ ./setup-cc-ui.sh

 

2. To finalize the UI installation, open a web browser, go to http://ClusterControl_IP_address/install, and you should see the “Install ClusterControl UI and API” page.

 

Please note the ClusterControl API Access Token, the ClusterControl API URL, your login email and your login password. We will use these later on the cluster registration page.

 

3. After the installation, click “Click here to automatically register your cluster now!” and you will be redirected to the cmonapi page. Click “Login Now”.

 

 

4. After that, log in using the email address and password you specified on the installation page (the default password is “admin”). You should see the “Cluster Registrations” page. Enter the ClusterControl API token and URL:

 

5. You will be redirected to the ClusterControl UI located at http://ClusterControl_IP_address/clustercontrol, where your MongoDB cluster is listed. Click on it to view your cluster:

 

 

You’re done! You are now able to manage your MongoDB sharded cluster using ClusterControl!

MongoDb Architecture

NoSQL has become a very hot topic for large web-scale deployments, where scalability and semi-structured data have driven the DB requirements towards NoSQL. Many NoSQL products have evolved over the last couple of years. In my past blogs, I have covered the underlying distributed system theory of NoSQL, as well as some specific products such as CouchDB and Cassandra/HBase.

Last Friday I was very lucky to meet with Jared Rosoff from 10gen at a technical conference and discuss the technical architecture of MongoDb. I found the information very useful and want to share it with more people.

One thing that impresses me about MongoDb is that it is extremely easy to use, and the underlying architecture is also very easy to understand.

Here are some simple admin steps to start/stop MongoDb server

# Create the default data directory (mongod stores data in /data/db)
mkdir -p /data/db

# Start Mongod server
.../bin/mongod # data stored in /data/db

# Start the command shell
.../bin/mongo
> show dbs
> show collections

# Remove collection
> db.person.drop()

# Stop the Mongod server from shell
> use admin
> db.shutdownServer()

Major differences from an RDBMS
MongoDb differs from an RDBMS in the following ways:

  • Unlike an RDBMS record, which is “flat” (a fixed number of simple data types), the basic unit of MongoDb is the “document”, which is “nested” and can contain multi-value fields (arrays, hashes).
  • Unlike an RDBMS, where all records stored in a table must conform to the table schema, documents of any structure can be stored in the same collection.
  • There is no “join” operation in queries. Overall, data is encouraged to be organized in a more denormalized manner, and more of the burden of ensuring data consistency is pushed to the application developers.
  • There is no concept of a “transaction” in MongoDb. “Atomicity” is guaranteed only at the document level (no partial update of a document will occur).
  • There is no concept of “isolation”: any data read by one client may have its value modified by another concurrent client.

By removing some of the features that a classical RDBMS provides, MongoDb can be more lightweight and more scalable in processing big data.

Query processing
MongoDb is a document-oriented DB. In this model, data is organized as JSON documents and stored in collections. A collection can be thought of as the equivalent of a table, and a document as the equivalent of a record, in the RDBMS world.

Here are some basic examples.

# create a doc and save into a collection
> p = {firstname:"Dave", lastname:"Ho"}
> db.person.save(p)
> db.person.insert({firstname:"Ricky", lastname:"Ho"})

# Show all docs within a collection
> db.person.find()

# Iterate result using cursor
> var c = db.person.find()
> p1 = c.next()
> p2 = c.next()

To specify search criteria, you provide an example document containing the fields to match against.

> p3 = db.person.findOne({lastname:"Ho"})

Notice that in the query, the value portion needs to be determined before the query is made (in other words, it cannot be based on other attributes of the document). For example, given a collection of “Person” documents, it is not possible to express a query that returns people whose weight is larger than 10 times their height.

# Return a subset of fields (ie: projection)
> db.person.find({lastname:"Ho"}, {firstname:true})

# Delete some records
> db.person.remove({firstname:"Ricky"})

To speed up queries, indexes can be used. In MongoDb, an index is stored as a BTree structure (so range queries are automatically supported). Since the document itself is a tree, an index can be specified as a path that drills into deep nesting levels inside the document.

# To build an index for a collection
> db.person.ensureIndex({firstname:1})

# To show all existing indexes
> db.person.getIndexes()

# To remove an index
> db.person.dropIndex({firstname:1})

# An index can be built on a path into the doc.
> db.person.ensureIndex({"address.city":1})

# A composite key can be used to build an index
> db.person.ensureIndex({lastname:1, firstname:1})

An index can also be built on a multi-valued attribute such as an array. In this case, each element in the array will have a separate node in the BTree.

Building an index can be done in either offline foreground mode or online background mode. Foreground mode proceeds much faster, but the DB cannot be accessed while the index is being built. If the system is running as a replica set (described below), it is recommended to rotate each member DB offline and build the index in the foreground.
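For example, the build mode is chosen per ensureIndex call (background defaults to false):

# Online background build – slower, but the DB stays available
> db.person.ensureIndex({firstname:1}, {background:true})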

When there are multiple selection criteria in a query, MongoDb attempts to use one single best index to select a candidate set and then sequentially iterates through it to evaluate the other criteria.

When there are multiple indexes available for a collection and a query is handled for the first time, MongoDb creates multiple execution plans (one for each available index) and lets them take turns (within a certain number of ticks) executing until the fastest plan finishes. The result of the fastest executor is returned, and the system remembers the corresponding index used by that executor. Subsequent queries will use the remembered index until a certain number of updates have happened in the collection; then the system repeats the process to figure out the best index at that time.

Since only one index will be used, it is important to look at the search and sorting criteria of the query and build additional composite indexes to match the query better. Maintaining an index is not without cost, as indexes need to be updated when docs are created, deleted or updated, which adds overhead to update operations. To maintain an optimal balance, we need to periodically measure the effectiveness of each index (e.g. the read/write ratio) and delete the less efficient ones.
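The explain() and hint() helpers are useful when doing this tuning (a quick sketch):

# Show which index the optimizer picked and how many docs were scanned
> db.person.find({lastname:"Ho"}).explain()

# Force a specific index to compare plans
> db.person.find({lastname:"Ho"}).hint({lastname:1, firstname:1}).explain()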

Storage Model
Written in C++, MongoDB uses memory-mapped files that directly map an on-disk data file to an in-memory byte array, where data access logic is implemented using pointer arithmetic. Each document collection is stored in one namespace file (which contains metadata information) as well as multiple extent data files (of exponentially doubling size).


The data structures use doubly-linked lists extensively. Each collection of data is organized as a linked list of extents, each of which represents a contiguous disk space. Each extent points to the head/tail of another linked list of docs. Each doc contains a linked list to other documents, as well as the actual data encoded in BSON format.

Data modification happens in place. In case a modification increases the size of a record beyond its originally allocated space, the whole record is moved to a bigger region with some extra padding bytes. The padding bytes act as a growth buffer so that future expansion doesn’t necessarily require moving the data again. The amount of padding is dynamically adjusted per collection based on its modification statistics. The space occupied by the original doc is then freed up and tracked in free lists of different sizes.

As we can imagine, holes are created over time as objects are created, deleted or modified. This fragmentation hurts performance, as less data is read/written per disk I/O. Therefore, we need to run the “compact” command periodically, which copies the data into contiguous space.
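The command is issued per collection from the shell; note that compact blocks operations on the database while it runs, so it is typically executed on secondaries or during a maintenance window:

# Defragment the person collection in place
> db.runCommand({compact: "person"})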

MongoDB vs CouchDB: Open Source NoSQL and Document Databases Comparison

MongoDB and CouchDB are both document-oriented databases, and both are prime examples of open source NoSQL databases. Aside from storing documents, though, it turns out that they don’t share much in common. There are a lot of differences between MongoDB and CouchDB in terms of their data models, interfaces, object storage, replication methods, etc.

As we said, MongoDB and CouchDB are both document-oriented, free and open source databases; the word “document” does not mean a word processing file or a PDF. Rather, a document is a data structure defined as a collection of named fields. JSON (JavaScript Object Notation) is currently the most widely used notation for defining documents within document-oriented databases. JSON’s advantage as an object notation is that, once you comprehend its syntax (and JSON is remarkably easy to grasp), you have all you need to define what amounts to the schema of a document database. That’s because, in a document database, each document carries its own schema, unlike an RDBMS, in which every row in a given table must have the same columns.
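For instance, a single document can nest arrays and sub-objects that an RDBMS would normally spread across several joined tables (the field names here are just an example, shown in the MongoDB shell):

> db.people.insert({
    name: {first: "Dave", last: "Ho"},
    languages: ["C", "Java", "Python"],
    address: {city: "Austin", zip: "78701"}
  })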

In this article, “MongoDB vs CouchDB”, I have tried to compare MongoDB and CouchDB and have identified the following differences between these two free, open source, NoSQL document-oriented databases:

Data Model: MongoDB and CouchDB are both document-oriented databases. MongoDB stores documents in BSON (a binary representation of JSON), while CouchDB stores them as JSON.

Interface: MongoDB uses custom protocol over TCP/IP while CouchDB uses HTTP/REST protocol.

Object Storage: MongoDB database contains collections and those collections contain documents while CouchDB directly contains all the documents.

Implementation Language: MongoDB is written in C++ while CouchDB is written in Erlang.

Server operating systems: MongoDB can operate on Linux, OS X, Solaris and Windows platforms while CouchDB can operate on Android, BSD, Linux, OS X, Solaris and Windows.

Supported programming languages: MongoDB supports a lot of programming languages: ActionScript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala and Smalltalk. CouchDB supports fewer programming languages than MongoDB: C, C#, ColdFusion, Erlang, Haskell, Java, JavaScript, Lisp, Lua, Objective-C, OCaml, Perl, PHP, PL/SQL, Python, Ruby and Smalltalk.

Replication: MongoDB supports only Master-Slave replication. On the other hand, CouchDB supports Master-Master Replication as well as Master-Slave Replication.

Triggers: MongoDB does not support triggers while CouchDB does.

Developer: MongoDB is developed by MongoDB, Inc. and was initially released in 2009. CouchDB is developed by the Apache Software Foundation and was initially released in 2005.