Mesos, Chronos, and Marathon

For one of our customers, the traditional setup of separate development, staging, and production environments was impractical for a variety of reasons. Building a staging infrastructure is expensive, and most of the time staging sits unused. Amazon provides tools to start and stop servers on demand, but that approach was not practical for us, and we ended up paying for staging servers even when they were not in use.

One solution was to use Mesos. The main idea, from our perspective, was to democratize the release environment and the server usage thereof. By democratize we mean that there would be no distinction between development, staging, and production servers. Every server was utilized. Most importantly, anybody on the team (engineers, QA, or DevOps) could release the product. The only change required for production was a few configuration settings.

As a cautionary note, never leave configuration information in source control. We always take the extra precaution of entering this information manually.
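As a rough illustration of keeping such values out of the repository, here is a minimal Python sketch. It is not our actual tooling: the function name `load_setting` and the setting name `DB_PASSWORD` are hypothetical. It reads a value from the environment and, if it is missing, falls back to asking the operator to type it in.

```python
import os
from getpass import getpass


def load_setting(name, environ=os.environ, prompt=getpass):
    """Return a sensitive setting without ever reading it from the repository.

    The value is taken from the environment if present; otherwise the
    operator is asked to type it in (input is not echoed to the terminal).
    """
    value = environ.get(name)
    if value:
        return value
    return prompt(f"{name}: ")
```

A release script could then call `load_setting("DB_PASSWORD")` at start-up, so the production value only ever lives in the operator's session.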

We could accomplish this because we were able to Dockerize our servers. We allocated a few nodes to Mesos to manage our cluster. For each of the applications, we baselined the memory and CPU requirements. This was sufficient for Mesos to allocate resources to any of our Docker images. Marathon is the ideal choice for servers and other long-running jobs, and it made it easy to restart them in case of failure.
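To make the baselining concrete, the sketch below builds a Marathon app definition for a Docker image in Python. The helper `marathon_app`, the app id, the image name, and the numbers are all placeholders chosen for illustration, not values from our deployment. The field names (`id`, `cpus`, `mem`, `instances`, `container`) follow Marathon's app definition format.

```python
import json


def marathon_app(app_id, image, cpus, mem_mb, instances=1):
    """Build a Marathon app definition for a Docker image.

    cpus and mem_mb are the baselined requirements measured for the app;
    Mesos uses them to find a node with enough free resources.
    """
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem_mb,
        "instances": instances,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
    }


# Hypothetical service: the id, image, and numbers are placeholders.
app = marathon_app("/api-server", "example/api-server:1.0",
                   cpus=0.5, mem_mb=512, instances=2)
print(json.dumps(app, indent=2))
```

The resulting JSON is the kind of document one would submit to Marathon's `/v2/apps` endpoint. Marathon then keeps the requested number of instances running and relaunches any that fail.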

For jobs that need to be scheduled, Chronos is the ideal choice: it is a much improved and more sophisticated crontab. Since we schedule literally thousands of jobs per day, Chronos was the right choice. Since the jobs were of short duration, we made sure that Chronos would schedule them every day or so. When a job started, each instance of a particular type was fed parameters, which change every day. Based on the number of parameters that needed to be served, we could compute the number of instances that needed to run. With just this simple parameterization of jobs, we were able to optimize resource usage.
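The parameterization described above can be sketched roughly as follows. This is an illustration under assumptions, not our production code: the helper names, the `params_per_instance` capacity, the command, and the start date are all hypothetical. The `schedule` string uses the ISO 8601 repeating-interval notation that Chronos expects, and the job fields shown are only a subset of what a real Chronos job may need.

```python
import math


def instances_needed(num_params, params_per_instance):
    """Number of job instances required to cover all of today's parameters."""
    if params_per_instance <= 0:
        raise ValueError("params_per_instance must be positive")
    return math.ceil(num_params / params_per_instance)


def chunk(params, params_per_instance):
    """Split the parameter list into one slice per job instance."""
    return [params[i:i + params_per_instance]
            for i in range(0, len(params), params_per_instance)]


def chronos_jobs(name, command, params, params_per_instance, start, cpus, mem_mb):
    """Build one Chronos job definition per slice of parameters."""
    jobs = []
    for n, batch in enumerate(chunk(params, params_per_instance)):
        jobs.append({
            "name": f"{name}-{n}",
            "command": f"{command} {' '.join(batch)}",
            "schedule": f"R/{start}/P1D",  # repeat indefinitely, once a day
            "cpus": cpus,
            "mem": mem_mb,
        })
    return jobs


# Hypothetical daily input: five parameters, two per instance.
params = ["a", "b", "c", "d", "e"]
jobs = chronos_jobs("crawl", "run.sh", params, 2,
                    start="2015-01-01T00:00:00Z", cpus=0.1, mem_mb=128)
print(instances_needed(len(params), 2), len(jobs))
```

Because the number of instances follows the number of parameters, a quiet day requests fewer resources from Mesos and a busy day requests more, without anyone editing the schedule.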

With the Mesos implementation, we could be fault-tolerant and there was virtually no wastage. Finally, we achieved a single environment for development, staging, and production. All of this was possible because of Mesos and container technology. For environments with a large number of jobs to run, Chronos is the way to go. It is very well designed and has a great UI.

Our blog on Mesos will give the details of the implementation.