Starting out with xDB and getting to know MongoDB

Recently I’ve had to get a few developers up to speed on the Sitecore 7.5/8 Experience Platform, specifically on the analytics platform introduced in 7.5. As most people familiar with Sitecore 7.5/8 already know, 7.5 introduced MongoDB into the Sitecore technology stack to harness its high-throughput write capabilities. This is a good match for what Sitecore uses it for: tracking user analytics and similar data, all of which were previously tracked in the SQL Server analytics database.

MongoDB may be a different beast from what most programmers in the .NET/Microsoft technology stack are used to, but the two work well together, and in order to work with Sitecore 7.5+ we need to get familiar with it. I’ve had the same intro conversation with several developers on my team, so I thought this would be a good time to write a simple primer. We also need to architect an enterprise-level MongoDB cluster soon, so this primer is overdue. I’ll be covering that in a later post.

MongoDB, as you may already know, is an open-source document database and the leading NoSQL database around. NoSQL databases aim to address the shortcomings of traditional RDBMSs (relational database management systems), especially for high-volume OLTP (online transaction processing) workloads. Because MongoDB is a document-based database, it offers fast reads and writes and suits applications that do a lot of both.

Getting Started

In order to develop with Sitecore 7.5+, you need MongoDB installed locally in the development environment. The obvious first step is to read the docs over at http://docs.mongodb.org/manual/, which give a good idea of what MongoDB entails.

Production Infrastructure for xDB

This topic originally came up because we recently had to design a production MongoDB environment for a client who wants to upgrade to Sitecore 8. Hosting in a public cloud was not an option for legal reasons, so we have to set up a private cloud. We are also looking into whether MMS (MongoDB Management Service) can help manage the private cloud; a blog post on that will come later. In order to design an infrastructure for MongoDB, we need to understand the requirements. Per John West, the MongoDB requirements for Sitecore xDB are:

  • Quad-core processor(s)
  • Minimum 16GB RAM
  • 500GB of storage spread over at least two solid-state drives
  • Only use SSDs
  • As much RAM as the environment can use for maximally effective indexing
  • If the active working set reaches 60% of RAM, add shards and/or RAM
  • Never use remote file shares
  • Use 64-bit
  • Avoid NUMA disk allocation strategies
  • If you use Mongo for session, use a separate Mongo instance from the one used for xDB
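
The 60% rule in the list above is easy to turn into a quick check. The following is only a minimal sketch: the function name `needs_scaling` and its parameters are my own, and how you measure the working set is left to your monitoring tooling.

```python
def needs_scaling(working_set_gb: float, ram_gb: float, threshold: float = 0.60) -> bool:
    """Return True when the active working set has reached the given share of RAM.

    The 0.60 default mirrors the 60% guideline quoted above; the function
    itself is a hypothetical helper, not part of Sitecore or MongoDB.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return working_set_gb / ram_gb >= threshold


if __name__ == "__main__":
    # A 10 GB working set on a 16 GB server is 62.5% of RAM.
    print(needs_scaling(10, 16))
```

A result of True would be the cue to add shards and/or RAM, as the guideline suggests.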

For a production environment, we also need redundancy and scalability. We would need to add shards for better concurrency and to handle large data sets. MongoDB’s recommendation for a production cluster is as follows:

[Figure: Production Cluster Architecture]

You can read about the role of each component here: http://docs.mongodb.org/manual/core/sharded-cluster-config-servers/#sharding-config-server

This obviously requires significant hardware, and if you are able to host in the cloud, that could be more cost effective. A few thoughts on cost implications:

Setting Up MongoDB for xDB for local development

The general idea of how to set up MongoDB is available in the docs, but I thought I would extract the steps and document them separately for setup on Windows, specific to Sitecore. To mimic the production sharded cluster, we want similar but non-redundant components:

[Figure: shard-test, the test cluster with a single router, config server, and shard]

Fellow Sitecore enthusiast Dave Leigh has a nice article on how to set up a MongoDB replica set without a cluster, and I wanted to expand on that to set up a cluster. It’s pretty simple to set up the above test cluster on Windows or Linux, but I’m going to install on Windows, because my local development machine runs Windows. It’s important to note that you don’t need to set up a cluster in the development environment; a direct connection to your mongod instance is good enough. But mimicking the production environment has its advantages, and in case you want to set this up, here is the primer.

You also don’t need three separate machines for this. If you have them available, great, but if not, that’s fine too: you can set this up all on the same machine.

Download and Install MongoDB

Download from here: https://www.mongodb.org/downloads. Get the 64-bit version. Run the installer. MongoDB is self-contained, so whichever directory you install to is where all the files are. For this example, I am going to install to c:\mongodb.

Make all the directories and the host entries

Create the directory for the data: c:\mongodb-data-standalone
Create the directory for the config metadata: c:\mongodb-metadata-standalone

It’s good practice to make DNS entries that point to each component so that any IP changes can be handled via DNS. For my example, I made three host entries that all point locally. If you are using three different machines, set the appropriate IPs, and make the host entries on all the machines:

127.0.0.1      router1.mongodb-test
127.0.0.1      config1.mongodb-test
127.0.0.1      shard1.mongodb-test
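
Before starting any MongoDB process, it can be worth confirming that the three names actually resolve. Here is a minimal sketch using only the Python standard library; the helper name `unresolved_hosts` is my own, and the `resolve` parameter exists only so the lookup can be swapped out.

```python
import socket

# The three host names from the hosts-file entries above.
CLUSTER_HOSTS = [
    "router1.mongodb-test",
    "config1.mongodb-test",
    "shard1.mongodb-test",
]


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the host names that the resolver cannot turn into an IP address."""
    missing = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in unresolved_hosts(CLUSTER_HOSTS):
        print("No host entry found for", name)
```

An empty result means every component name resolves on the machine where the script runs; on a multi-machine setup you would run it on each machine.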

Set up the config server

We first need to start the config server process by passing the --configsvr switch to the mongod process and pointing it to the metadata folder. You can also choose the port you want.

mongod --configsvr --dbpath c:\mongodb-metadata-standalone --port 27019

Set up the router server

Start the mongos process on the routing server, pointing it to the config server. You can choose the port here as well.

mongos --configdb config1.mongodb-test:27019 --port 27011

Set up the data shard

The shard needs another instance of mongod. If you are installing on the same machine, start it on a different port so it doesn’t conflict with the config server and router instances.

mongod --dbpath C:\mongodb-data-standalone --port 27018
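
With the three processes started, a quick way to confirm each one is accepting connections is a plain TCP check. This is only a sketch using the standard library: the helper names `is_listening` and `report` are my own, and the hosts and ports are the ones used in this example.

```python
import socket

# Host/port pairs taken from the commands above.
ENDPOINTS = {
    "config server": ("config1.mongodb-test", 27019),
    "router": ("router1.mongodb-test", 27011),
    "shard": ("shard1.mongodb-test", 27018),
}


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(endpoints=ENDPOINTS, check=is_listening):
    """Map each component name to whether its port accepts connections."""
    return {name: check(host, port) for name, (host, port) in endpoints.items()}


if __name__ == "__main__":
    for name, ok in report().items():
        print(name, "is up" if ok else "is NOT reachable")
```

This only shows that something is listening on the port, not that it is a healthy mongod or mongos, so treat it as a first sanity check.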

Once the processes have started, you can connect to the mongos instance and get to the console:

mongo --host router1.mongodb-test --port 27011

Once you get to the console, you can add the shard:

sh.addShard("shard1.mongodb-test:27018")

This is enough to start collecting data. Change your connection strings in Sitecore to connect to the router:

  <add name="analytics" connectionString="mongodb://router1.mongodb-test:27011/analytics" />
  <add name="tracking.live" connectionString="mongodb://router1.mongodb-test:27011/tracking_live" />
  <add name="tracking.history" connectionString="mongodb://router1.mongodb-test:27011/tracking_history" />
  <add name="tracking.contact" connectionString="mongodb://router1.mongodb-test:27011/tracking_contact" />
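
Since all four connection strings differ only in the database name, they can be generated from the router host and port. The sketch below is just an illustration of that pattern; the helper names `connection_strings` and `as_config_xml` are my own, while the connection names and database names are the ones shown above.

```python
# Connection name -> database name, as used in the entries above.
XDB_DATABASES = {
    "analytics": "analytics",
    "tracking.live": "tracking_live",
    "tracking.history": "tracking_history",
    "tracking.contact": "tracking_contact",
}


def connection_strings(host: str, port: int, databases=XDB_DATABASES) -> dict:
    """Build a mongodb:// connection string per connection name."""
    return {name: f"mongodb://{host}:{port}/{db}" for name, db in databases.items()}


def as_config_xml(strings: dict) -> str:
    """Render the connection strings as <add /> elements, one per line."""
    return "\n".join(
        f'  <add name="{name}" connectionString="{value}" />'
        for name, value in strings.items()
    )


if __name__ == "__main__":
    print(as_config_xml(connection_strings("router1.mongodb-test", 27011)))
```

If the router later moves to another host or port, only the two arguments change.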

Once you start up Sitecore, you should see all the analytics databases added.

At this point, you can continue with your Sitecore development. If you want to set up a replica set and test the failover, follow the article that Dave wrote. When setting up the data shard, make three separate folders for the replica set, and start three different mongod processes, one for each member, on three different ports.
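
To make the "three folders, three processes, three ports" idea concrete, here is a rough sketch that only builds the three mongod command lines. The set name `rs0`, the base folder, and the ports are all assumptions of mine for illustration; `--replSet`, `--dbpath`, and `--port` are standard mongod options. The set still has to be initiated from the mongo shell (rs.initiate()) once the members are running.

```python
def replica_member_commands(set_name: str, base_dir: str, ports):
    """Build one mongod command line (as an argument list) per replica set member.

    Each member gets its own data folder under base_dir and its own port.
    """
    return [
        [
            "mongod",
            "--replSet", set_name,
            "--dbpath", f"{base_dir}\\member{index}",
            "--port", str(port),
        ]
        for index, port in enumerate(ports, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical values: adjust the set name, folder, and ports to your setup.
    for command in replica_member_commands("rs0", "C:\\mongodb-data-rs", [27021, 27022, 27023]):
        print(" ".join(command))
```

The folders must exist before the processes are started, just as with the standalone data folder earlier.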

Resources

MongoDB has been out for a long while, but since I only started to play with it recently for Sitecore, I got a lot of help from the web and other Sitecore/MongoDB folks. Here is a list of links that helped me:
