All Things Open, October 2016
clouds, containers, infrastructure,
applications and their management
(Stay tuned for updates to presentation and book)
One size does not fit all.
A strict apples-to-apples comparison is inappropriate and not the objective; the aim is to characterize and contrast.
Let's not go here today.
Container orchestrators may be intermixed.
Host Health Monitoring
Orchestrator Updates and Host Maintenance
Networking and Load-Balancing
Application Health Monitoring
Application Performance Monitoring
Current release v0.4
v0.5 expected in a week or so
Nomad Enterprise offering aimed for Q1-Q2 next year.
Supported and governed by HashiCorp
HashiConf US '15 had ~300 attendees
HashiConf EU '16 had ~320 attendees
HashiConf US '16 had ~500 attendees
Nomad is a single binary, both for clients and servers, and requires no external services for coordination or storage.
Gossip protocol - Serf is used
Docker multi-host networking and Swarmkit use Serf, too
Servers advertise full set of Nomad servers to clients via heartbeats every 30 seconds
Creating federated clusters is simple
Nomad integrates with Consul to provide service discovery and monitoring.
enabling all servers to participate in scheduling decisions which increases the total throughput and reduces latency
three scheduler types used when creating jobs:
service, batch and system
nomad plan: point-in-time view of what Nomad will do
Used by Nomad clients to execute a task and provide resource isolation.
Extensible task drivers are important for the flexibility to support a broad set of workloads.
Does not currently support pluggable task drivers;
you have to implement the task driver interface and compile the Nomad binary.
Drain allocations on a running node.
integrates with tools like Packer, Consul, and Terraform to support building artifacts, service discovery, monitoring and capacity management.
Log rotation (stderr and stdout)
no log forwarding support yet
Rolling updates (via the `update` block in the job specification).
currently http, tcp and script
In the future Nomad will add support for more Consul checks.
nomad alloc-status reports actual resource utilization
Dynamic ports are allocated in a range from 20000 to 60000.
Shared IP address with Node
Consul provides DNS-based load-balancing
distributed and highly available, using both leader election and state replication to provide availability in the face of failures.
Built for managing multiple clusters / cluster federation.
Swarmkit or Swarm mode
Swarm is simple and easy to set up.
Responsible for the clustering and scheduling aspects of orchestration.
Originally an imperative system, now declarative
Swarm’s architecture is not as complex as those of Kubernetes and Mesos
Written in Go, Swarm is lightweight, modular and extensible
Docker Swarm 1.11 (Standalone)
Docker Swarm Mode 1.12 (Swarmkit)
Standalone: ~3,000 commits, 12 core maintainers (140 contributors)
Swarmkit: ~2,000 commits, 12 core maintainers (40 contributors)
~250 Docker meetups worldwide
Standalone announced ~12 months ago (Nov 2015)
Swarmkit announced ~3 months ago (July 2016)
used in the formation of clusters by the Manager to discover Nodes (hosts).
Like Nomad, uses HashiCorp's go-memdb for storing cluster state
Pull model - the worker checks in with the Manager
Rate control - the frequency of worker check-ins may be controlled at the Manager, with jitter added
Workers don't need to know which Manager is active; Follower Managers will redirect Workers to Leader
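The idea of adding jitter to check-ins can be sketched as follows. This is only an illustration, not Swarmkit's code: the function name `next_checkin_delay` and the `jitter_fraction` parameter are hypothetical, and the real Manager may compute the period differently.

```python
import random


def next_checkin_delay(base_period, jitter_fraction=0.25, rng=random):
    """Return seconds until the next check-in: the base period plus or minus random jitter.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if base_period <= 0 or not 0 <= jitter_fraction < 1:
        raise ValueError("base_period must be > 0 and jitter_fraction in [0, 1)")
    spread = base_period * jitter_fraction
    # Randomizing each worker's delay keeps workers from checking in at the same instant.
    return base_period + rng.uniform(-spread, spread)
```

With a base period of 5 seconds and a jitter fraction of 0.25, every delay falls between 3.75 and 6.25 seconds, which spreads the load on the Manager over time.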
Embedded DNS and round robin load-balancing
Services are a new concept
Swarm’s scheduler is pluggable
Swarm scheduling is a combination of strategies and filters/constraints:
container constraints (affinity, dependency, port) are defined as environment variables in the specification file
node constraints (health, constraint) must be specified when starting the docker daemon and define which nodes a container may be scheduled on.
Swarm Mode only supports Spread
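The combination of filters and a Spread strategy can be sketched roughly as below. This is a simplified illustration and not Swarm's implementation: the `Node` class, its fields, and `schedule_spread` are hypothetical names, and only health and label constraints are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node model; field names are assumptions for illustration.
    name: str
    healthy: bool = True
    labels: dict = field(default_factory=dict)
    containers: int = 0


def schedule_spread(nodes, constraints):
    """Filter nodes by health and label constraints, then pick the least-loaded one."""
    candidates = [
        n for n in nodes
        if n.healthy and all(n.labels.get(k) == v for k, v in constraints.items())
    ]
    if not candidates:
        raise LookupError("no node satisfies the constraints")
    # Spread: prefer the node running the fewest containers (name breaks ties).
    chosen = min(candidates, key=lambda n: (n.containers, n.name))
    chosen.containers += 1
    return chosen
```

Filters narrow the candidate set first; the strategy only ranks what is left.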
Ability to remove batteries is a strength for Swarm:
Pluggable network driver
Pluggable distributed K/V store
Docker container engine runtime-only
Pluggable authorization (in docker engine)*
Nodes may be Active, Drained and Paused
Manager weights are used to drain or pause Managers
Manual swarm manager and worker updates
Rolling updates now supported
Swarm monitors the availability and resource usage of nodes within the cluster
Swarm and Docker’s multi-host networking are simpatico
provides for user-defined overlay networks that are micro-segmentable
uses a gossip protocol for quick convergence of neighbor table
facilitates container name resolution via embedded DNS server (previously via etc/hosts)
You may bring your own network driver
Load-balancing based on IPVS
expose Service's port externally
L4 load-balancer; cluster-wide port publishing
send a request to any one of the nodes and it will be routed automatically
send a request to any one of the nodes and it will be internally load balanced
tracking toward 1.13
Managers may be deployed in a highly-available configuration
Active/Standby - only one active Leader at a time
Maintain odd number of managers
Rescheduling upon node failure
No rebalancing upon node addition to the cluster
Does not support multiple failure isolation regions or federation
although, with caveats, federation is possible.
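Why an odd number of managers: a small arithmetic sketch, assuming the managers use a majority quorum (as Raft-style consensus does). The function names are hypothetical.

```python
def quorum_size(managers):
    """Smallest majority of a manager set."""
    return managers // 2 + 1


def failures_tolerated(managers):
    """How many managers can fail while a majority is still reachable."""
    return managers - quorum_size(managers)
```

Going from 3 to 4 managers raises the quorum from 2 to 3 but still tolerates only one failure; the fifth manager is what buys tolerance of a second failure.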
Scaling swarm to 1,000 AWS nodes and 50,000 containers
Suitable for orchestrating a combination of infrastructure containers
Has only recently added capabilities falling into the application bucket
Swarm is a young project
advanced features forthcoming
natural expectation of caveats in functionality
No rebalancing, autoscaling or monitoring, yet
Only schedules Docker containers, not containers using other specifications.
Does not schedule VMs or non-containerized processes
Does not provide support for batch jobs
Need separate load-balancer for overlapping ingress ports
While dependency and affinity filters are available, Swarm does not provide the ability to enforce scheduling of two containers onto the same host or not at all.
Filters facilitate sidecar pattern. No “pod” concept.
Swarm works. Swarm is simple and easy to deploy.
1.12 eliminated the need for much third-party software
Facilitates earlier stages of adoption by organizations viewing containers as faster VMs
now with built-in functionality for applications
Swarm is easy to extend: if you already know the Docker APIs, you can customize Swarm
Still modular, but has stepped back here.
Moving very fast; eliminating gaps quickly.
an opinionated framework for building distributed systems
or as its tagline states "an open source system for automating deployment, scaling, and operations of applications."
Written in Go, Kubernetes is lightweight, modular and extensible
considered a third generation container orchestrator led by Google, Red Hat and others.
bakes in load-balancing, scale, volumes, deployments, secret management and cross-cluster federated services among other features.
Declarative and opinionated, with many key features included
Kubernetes is young (about two years old)
Announced as production-ready 15 months ago (July 2015)
Project currently has over 1,000 commits per month (~38,000 total)
made by about 100 (862 total) Kubernauts (Kubernetes enthusiasts)
~5,000 commits made in 1.3 release (1.4 is latest)
Under the governance of the Cloud Native Computing Foundation
Robust set of documentation and ~90 meetups
by default, the node agent (kubelet) is configured to register itself with the master (API server)
automating the joining of new hosts to the cluster
Two primary modes of finding a Service
SkyDNS is deployed as a cluster add-on
environment variables are used as a simple way of providing compatibility with Docker links-style networking
By default, scheduling is handled by kube-scheduler.
Selection criteria used by kube-scheduler to identify the best-fit node is defined by policy:
Predicates (node resources and characteristics):
PodFitsPorts, PodFitsResources, NoDiskConflict, MatchNodeSelector, HostName, ServiceAffinity, LabelsPresence
Priorities (weighted strategies used to identify “best fit” node):
LeastRequestedPriority, BalancedResourceAllocation, ServiceSpreadingPriority, EqualPriority
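The predicate-then-priority flow can be sketched as a two-phase filter and score. This is a heavily simplified illustration, not kube-scheduler code: pods and nodes are plain dicts with hypothetical keys, only two predicates and two priorities are modeled, and the scoring formula and weights are assumptions.

```python
def pod_fits_resources(pod, node):
    # Predicate: the node must have enough free CPU and memory.
    return pod["cpu"] <= node["cpu_free"] and pod["mem"] <= node["mem_free"]


def match_node_selector(pod, node):
    # Predicate: every selector label on the pod must match the node's labels.
    selector = pod.get("node_selector", {})
    return all(node["labels"].get(k) == v for k, v in selector.items())


def least_requested(pod, node):
    # Priority (0-10): more free capacity left after placement scores higher.
    cpu = (node["cpu_free"] - pod["cpu"]) / node["cpu_total"]
    mem = (node["mem_free"] - pod["mem"]) / node["mem_total"]
    return 10 * (cpu + mem) / 2


def equal_priority(pod, node):
    # Priority: every node gets the same score.
    return 1


PREDICATES = [pod_fits_resources, match_node_selector]
PRIORITIES = [(least_requested, 1), (equal_priority, 1)]  # (function, weight)


def pick_node(pod, nodes):
    """Drop nodes failing any predicate, then return the highest weighted score."""
    feasible = [n for n in nodes if all(p(pod, n) for p in PREDICATES)]
    if not feasible:
        return None
    return max(feasible, key=lambda n: sum(w * f(pod, n) for f, w in PRIORITIES))
```

Predicates are hard requirements; priorities only rank the nodes that survive them.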
One of Kubernetes' strengths is its pluggable architecture and extensibility as a platform
database for service discovery or network driver
users may choose to run Docker or rkt (Rocket) containers
optional system components that implement a cluster feature (e.g. DNS, logging, etc.)
shipped with the Kubernetes binaries and are considered an inherent part of the Kubernetes clusters
Deployment objects automate deploying applications and rolling out updates to them.
Support for rolling back deployments
Consistently backwards compatible
Upgrading the Kubernetes components and hosts is done via shell script
Host maintenance - mark the node as unschedulable.
existing pods are vacated from the node
prevents new pods from being scheduled on the node
Failures - actively monitors the health of nodes within the cluster
via Node Controller
Resources - usage monitoring leverages a combination of open source components:
cAdvisor, Heapster, InfluxDB, Grafana
three types of user-defined application health checks, with the Kubelet agent acting as the health check monitor
HTTP Health Checks, Container Exec, TCP Socket
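Of the three check types, the TCP socket check is the simplest to sketch: the check passes if a connection can be opened. The snippet below is an illustration using Python's standard library, not the Kubelet's implementation; the function name `tcp_check` and the default timeout are assumptions.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the target as unhealthy.
        return False
```

An HTTP check follows the same shape but inspects the response status, and an exec check runs a command inside the container and inspects its exit code.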
collect logs which persist beyond the lifetime of the pod’s container images or the lifetime of the pod or even cluster
standard output and standard error output of each container can be ingested using a Fluentd agent running on each node
…enter the Pod
atomic unit of scheduling
flat networking with each pod receiving an IP address
no NAT required, port conflicts localized
intra-pod communication via localhost
Services provide inherent load-balancing via kube-proxy:
runs on each node of a Kubernetes cluster
reflects services as defined in the Kubernetes API
supports simple TCP/UDP forwarding and round-robin and Docker-links-based service IP:PORT mapping.
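Round-robin selection of a backend for a service can be sketched in a few lines. This is a toy illustration of the selection order only, not kube-proxy's code; the class name and endpoint strings are hypothetical.

```python
import itertools


class RoundRobinService:
    """Hand out backend endpoints for one service in rotating order."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("a service needs at least one endpoint")
        self._cycle = itertools.cycle(list(endpoints))

    def next_endpoint(self):
        # Each call returns the next endpoint, wrapping around at the end.
        return next(self._cycle)
```

For example, a service with endpoints `10.0.0.1:80`, `10.0.0.2:80` and `10.0.0.3:80` would serve a fourth request from `10.0.0.1:80` again.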
Secrets are used by containers in a pod either:
mounted as data volumes
exposed as environment variables
None of the pod’s containers will start until all the pod’s volumes are mounted.
Individual secrets are limited to 1MB in size.
Each master component may be deployed in a highly-available configuration.
Federated clusters / multi-region deployments
v1.2 support for 1,000 node clusters
v1.3 supports 2,000 node clusters
Horizontal Pod Autoscaling (via Replication Controllers).
Cluster Autoscaling (if you're running on GCE; AWS support is coming soon).
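The core arithmetic behind horizontal autoscaling can be sketched as scaling the replica count by the ratio of observed to target utilization and rounding up. This is a simplified sketch of that idea, not the autoscaler's code: the function name, parameters, and default bounds are assumptions, and the real controller applies additional tolerances and delays.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10):
    """Scale replicas by observed/target utilization, rounded up and clamped."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    wanted = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, wanted))
```

Four replicas averaging 90% CPU against a 60% target would be scaled to six; the same four replicas at 30% would be scaled down to two.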
Only runs containerized applications
For those familiar with Docker-only, Kubernetes requires understanding of new concepts
Powerful frameworks with more moving pieces beget complicated cluster deployment and management.
Lightweight graphical user interface
Does not provide as sophisticated techniques for resource utilization as Mesos
Kubernetes can schedule Docker or rkt containers
Inherently opinionated w/functionality built-in.
relatively easy to change its opinion
little to no third-party software needed
builds in many application-level concepts and services (petsets, jobsets, daemonsets, application packages / charts, etc.)
advanced storage/volume management
project has most momentum
project is arguably most extensible
thorough project documentation
Multi-master, cross-cluster federation, robust logging & metrics aggregation
Mesos is a distributed systems kernel
stitches together many different machines into a logical computer
Mesos has been around the longest (launched in 2009)
and is arguably the most stable, with highest (proven) scale currently
Mesos is written in C++
with Java, Python and C++ APIs
Marathon as a Framework
Marathon is one of a number of frameworks (Chronos and Aurora are other examples) that may be run on top of Mesos
Frameworks have a scheduler and executor. Schedulers get resource offers. Executors run tasks.
Marathon is written in Scala
MesosCon 2015 in Seattle had 700 attendees
up from 262 attendees in 2014
Mesos had 78 contributors
Marathon had 217 contributors
Under the governance of the Apache Software Foundation
Mesos is used by Twitter, Airbnb, eBay, Apple, Cisco, Yodle
Marathon is used by Verizon and Samsung
Mesos-DNS generates an SRV record for each Mesos task
including Marathon application instances
Marathon will ensure that all dynamically assigned service ports are unique
Mesos-DNS is particularly useful when:
apps are launched through multiple frameworks (not just Marathon)
you are using an IP-per-container solution like Project Calico
you use random host port assignments in Marathon
Two level scheduler
First-level scheduling happens at the Mesos master, based on an allocation policy that decides which framework gets resources.
Second-level scheduling happens at the framework scheduler, which decides what tasks to execute.
Provides reservations, oversubscription and preemption.
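The two levels can be sketched as a master that hands out resource offers and framework schedulers that decide which tasks to launch on them. This is a toy model and not the Mesos API: the class and function names are hypothetical, only CPUs are tracked, and the fixed offer order is a stand-in for the master's real allocation policy.

```python
class FrameworkScheduler:
    """Second level: decides which of its pending tasks to launch on an offer."""

    def __init__(self, name, pending_tasks):
        self.name = name
        self.pending = list(pending_tasks)  # each task is (task_name, cpus)

    def resource_offer(self, offered_cpus):
        launched = []
        remaining = offered_cpus
        for task in list(self.pending):
            task_name, cpus = task
            if cpus <= remaining:
                launched.append(task_name)
                remaining -= cpus
                self.pending.remove(task)
        # Unused resources are handed back (declined) to the master.
        return launched, remaining


def master_allocate(agent_cpus, frameworks):
    """First level: offer free CPUs to each framework in turn (simplified policy)."""
    placements = {}
    free = agent_cpus
    for fw in frameworks:
        launched, free = fw.resource_offer(free)
        placements[fw.name] = launched
    return placements, free
```

The master never inspects a framework's tasks; it only decides who is offered resources, and each framework decides what to run on them.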
may run multiple frameworks
extend inner workings of Mesos by creating and using shared libraries that are loaded on demand
many types of Modules
Replacement, Isolator, Allocator, Authentication, Hook, Anonymous
Mesos has maintenance mode
Mesos backwards compatible from v1.0 forward
Marathon can be instructed to deploy containers based on that component using a blue/green strategy
where old and new versions co-exist for a time.
Master tracks a set of statistics and metrics to monitor resource usage
Counters and Gauges
support for health checks (HTTP and TCP)
an event stream that can be integrated with load-balancers or for analyzing metrics
An IP per Container
No longer share the node's IP
Helps remove port conflicts
Enables 3rd party network drivers
Container Network Interface (CNI) isolator with MesosContainerizer
Marathon offers two TCP/HTTP proxies
A simple shell script and a more complex one called marathon-lb that has more features.
Pluggable (e.g. Traefik for load-balancing)
Only supported by Enterprise DC/OS
A strength of Mesos’s architecture
requires masters to form a quorum using ZooKeeper (point of failure)
only one Active (Leader) master at a time in Mesos and Marathon
Scale is a strong suit for Mesos. Used at Twitter, Airbnb... TBD for Marathon
Great at asynchronous jobs. High availability built-in.
Referred to as the “golden standard” by Solomon Hykes, Docker CTO.
A high-level perspective of the container orchestrator spectrum.