Characterizing and Contrasting Kuhn-tey-ner Awr-kuh-streyt-ors

All Things Open, October 2016

Lee Calcote

clouds, containers, infrastructure, applications and their management

Show of Hands

[kuhn-tey-ner]

[awr-kuh-streyt-or] 

Definition:

Fleet

Nomad

Swarm
Kubernetes
Mesos+Marathon

CaaS

(Stay tuned for updates to presentation and book)

One size does not fit all.

A strict apples-to-apples comparison is inappropriate and not the objective, hence characterizing and contrasting.

Let's not go here today.

Container orchestrators may be intermixed.

Categorically Speaking

  • Genesis & Purpose

  • Support & Momentum

  • Host & Service Discovery

  • Scheduling

  • Modularity & Extensibility

  • Updates & Maintenance

  • Health Monitoring

  • Networking & Load-Balancing

  • High Availability & Scale

Hypervisor Manager Elements

  • Compute

  • Network

  • Storage

Container Orchestrator Elements

  • Host (Node)
  • Container
  • Task
  • Service
  • Volume
  • Applications

Core Capabilities

  • Cluster Management

    • Host Discovery

    • Host Health Monitoring

  • Scheduling

  • Orchestrator Updates and Host Maintenance

  • Service Discovery

  • Networking and Load-Balancing

Additional Key Capabilities

  • Application Health Monitoring

  • Application Deployments

  • Application Performance Monitoring

Nomad

Genesis & Purpose

  • designed for both long-lived services and short-lived batch processing workloads.
     
  • cluster manager with declarative job specifications.
     
  • ensures constraints are satisfied and resource utilization is optimized by efficient task packing.
     
  • supports all major operating systems and virtualized, containerized or standalone workloads.
     
  • written in Go.

 

Support & Momentum

  • Project began in June 2015 and has 113 contributors over 16 months
    • Current release: v0.4

    • v0.5 expected in a week or so

    • Nomad Enterprise offering targeted for Q1-Q2 next year.

  • Supported and governed by HashiCorp

    • HashiConf US '15 had ~300 attendees

    • HashiConf EU '16 had ~320 attendees

    • HashiConf US '16 had ~500 attendees

Nomad Architecture

Nomad is a single binary, both for clients and servers, and requires no external services for coordination or storage.

Host & Service Discovery

Host Discovery

  • Gossip protocol: Serf is used

    • Docker multi-host networking and Swarmkit use Serf, too

  • Servers advertise full set of Nomad servers to clients via heartbeats every 30 seconds

  • Creating federated clusters is simple

Service Discovery

  • Nomad integrates with Consul to provide service discovery and monitoring.


Scheduling

  • two distinct phases: feasibility checking and ranking.

  • optimistically concurrent

    • enables all servers to participate in scheduling decisions, which increases total throughput and reduces latency

  • three scheduler types used when creating jobs:

    • service, batch and system

    • `nomad plan` gives a point-in-time view of what Nomad will do
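The two scheduling phases can be sketched as a filter followed by a scoring pass. This is a minimal sketch assuming a simplified node model (free CPU and memory plus string attributes) and equality-only constraints. The classes `Node` and `Task`, the functions `feasible`, `rank` and `place`, and the averaged-utilization bin-packing score are hypothetical stand-ins, not Nomad's actual data structures or scoring function.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    cpu_total: int  # MHz
    mem_total: int  # MB
    cpu_free: int
    mem_free: int
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Task:
    cpu: int
    mem: int
    constraints: Dict[str, str] = field(default_factory=dict)  # attribute -> required value


def feasible(node: Node, task: Task) -> bool:
    """Phase 1 (feasibility): drop nodes that violate a constraint or lack capacity."""
    if any(node.attrs.get(k) != v for k, v in task.constraints.items()):
        return False
    return node.cpu_free >= task.cpu and node.mem_free >= task.mem


def rank(node: Node, task: Task) -> float:
    """Phase 2 (ranking): bin-packing score, higher when the node ends up more fully used."""
    cpu_used = (node.cpu_total - node.cpu_free + task.cpu) / node.cpu_total
    mem_used = (node.mem_total - node.mem_free + task.mem) / node.mem_total
    return (cpu_used + mem_used) / 2


def place(nodes: List[Node], task: Task) -> Optional[str]:
    """Run both phases and return the best node's name, or None if nothing fits."""
    candidates = [n for n in nodes if feasible(n, task)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: rank(n, task)).name
```

Preferring the most-utilized feasible node packs tasks tightly, which matches the "efficient task packing" goal listed under Genesis & Purpose.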

Modularity & Extensibility

Task drivers

  • Used by Nomad clients to execute a task and provide resource isolation.
     

  • Extensible task drivers are important for the flexibility to support a broad set of workloads.

  • Does not currently support pluggable task drivers:

    • you have to implement the task driver interface and recompile the Nomad binary.

Updates & Maintenance

Nodes

  • Drain allocations on a running node.

  • integrates with tools like Packer, Consul, and Terraform to support building artifacts, service discovery, monitoring and capacity management.

Applications

  • Log rotation (stderr and stdout)

    • no log forwarding support yet

  • Rolling updates (via the `update` block in the job specification).

Health Monitoring

Nodes

  • Node health monitoring is done via heartbeats, so Nomad can detect failed nodes and migrate the allocations to other healthy clients.
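Heartbeat-based failure detection comes down to tracking when each node was last heard from. This is a minimal sketch that assumes a fixed time-to-live (`ttl`) chosen for illustration and a naive round-robin reassignment. `HeartbeatMonitor` and `migrate` are hypothetical names. Nomad's servers place replacement allocations through the scheduler rather than by simple round-robin.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Declares a node failed when no heartbeat has arrived within `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        self.last_seen[node] = self.clock()

    def failed_nodes(self) -> List[str]:
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.ttl)


def migrate(allocations: Dict[str, str], failed: List[str], healthy: List[str]) -> Dict[str, str]:
    """Reassign allocations on failed nodes to healthy nodes, round-robin (illustrative policy)."""
    result: Dict[str, str] = {}
    i = 0
    for alloc, node in allocations.items():
        if node in failed:
            node = healthy[i % len(healthy)]
            i += 1
        result[alloc] = node
    return result
```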

 

Applications

  • currently HTTP, TCP and script checks

  • In the future Nomad will add support for more Consul checks.

  • `nomad alloc-status` reports actual resource utilization

Networking & Load-Balancing

Networking

 

  • Dynamic ports are allocated in a range from 20000 to 60000.

  • Tasks share the node's IP address

Load-Balancing

  • Consul provides DNS-based load-balancing

Secrets Management

  • Nomad agents provide secure integration with Vault
    • for all tasks and containers it spins up

 

  • gives secure access to Vault secrets through a workflow which minimizes risk of secret exposure during bootstrapping.

High Availability & Scale

  • distributed and highly available, using both leader election and state replication to provide availability in the face of failures.
     

  • shared-state optimistic scheduler
    • the only open source implementation of this design.

  • 1,000,000 containers across 5,000 hosts, scheduled in 5 minutes.

 

  • Built for managing multiple clusters / cluster federation.

  • easier to use
  • a single binary for both clients and servers
  • supports different non-containerized tasks
  • arguably the most advanced scheduler design
  • upfront consideration of federation / hybrid cloud
  • broad OS support
  • Outside of the scheduler, comparatively less sophisticated

  • Young project

  • Less relative momentum

  • Less relative adoption

  • Less extensible / pluggable

Docker Swarm

Docker Swarm 1.12, aka Swarmkit or Swarm mode

Genesis & Purpose

  • Swarm is simple and easy to setup.
     

  • Responsible for the clustering and scheduling aspects of orchestration.  
     

  • Originally an imperative system, now declarative
     

  • Swarm’s architecture is not as complex as those of Kubernetes and Mesos
     

  • Written in Go, Swarm is lightweight, modular and extensible

Docker Swarm 1.11 (Standalone)

Docker Swarm Mode 1.12 (Swarmkit)

Support & Momentum

  • Contributions:

    • Standalone: ~3,000 commits, 12 core maintainers (140 contributors)

    • Swarmkit: ~2,000 commits, 12 core maintainers (40 contributors)
       

  • ~250 Docker meetups worldwide
     

  • Production-ready:

    • Standalone announced ~12 months ago (Nov 2015)

    • Swarmkit announced ~3 months ago (July 2016)

Host & Service Discovery

Host Discovery

  • used by the Manager during cluster formation to discover Nodes (hosts).

  • Like Nomad, uses HashiCorp's go-memdb for storing cluster state

  • Pull model: workers check in with the Manager

  • Rate control: the Manager may control the rate of worker check-ins by adding jitter

  • Workers don't need to know which Manager is active; Follower Managers will redirect Workers to the Leader
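Manager-side rate control can be sketched as handing each worker a check-in period with random jitter added. That way, workers that joined at the same moment do not keep checking in simultaneously. The function name, the `period` and `jitter` parameters, and their values are hypothetical and do not reflect SwarmKit's configuration.

```python
import random
from typing import Dict, Iterable, Optional


def assign_checkin_periods(workers: Iterable[str], period: float, jitter: float,
                           seed: Optional[int] = None) -> Dict[str, float]:
    """Give each worker a check-in period of `period` plus up to `jitter * period`
    of random extra delay, so workers that joined together drift apart."""
    rng = random.Random(seed)  # seedable for reproducible tests
    return {w: period + rng.uniform(0, jitter * period) for w in workers}
```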

Service Discovery

  • Embedded DNS and round robin load-balancing

  • Services are a new concept

 

Scheduling

  • Swarm’s scheduler is pluggable

  • Swarm scheduling is a combination of strategies and filters/constraints:

    • Strategies

      • Random, Binpack

      • Spread*

      • Plugin?

    • Filters

      • container constraints (affinity, dependency, port) are defined as environment variables in the specification file

      • node constraints (health, constraint) must be specified when starting the docker daemon and define which nodes a container may be scheduled on.
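The difference between the Spread and Binpack strategies, applied after filters have narrowed the candidate nodes, can be sketched as follows. This sketch counts containers per node instead of ranking by reserved CPU and memory. `choose_node` and its parameters are hypothetical.

```python
import random
from typing import Dict, Optional, Set


def choose_node(containers: Dict[str, int], capacity: Dict[str, int], strategy: str,
                eligible: Optional[Set[str]] = None) -> Optional[str]:
    """Pick a node for one more container.

    containers: containers currently running per node
    capacity:   maximum containers per node (stand-in for CPU/memory limits)
    eligible:   nodes that pass the filters/constraints (None means all nodes)
    """
    candidates = [n for n in containers
                  if containers[n] < capacity[n] and (eligible is None or n in eligible)]
    if not candidates:
        return None
    if strategy == "spread":
        # Spread: the node with the fewest containers wins (ties broken by name).
        return min(candidates, key=lambda n: (containers[n], n))
    if strategy == "binpack":
        # Binpack: the most-loaded node that still has room wins.
        return min(candidates, key=lambda n: (-containers[n], n))
    if strategy == "random":
        return random.choice(sorted(candidates))
    raise ValueError(f"unknown strategy: {strategy}")
```

Spread keeps the blast radius of a node failure small. Binpack leaves whole nodes free for large containers.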

Swarm Mode only supports Spread

Modularity & Extensibility

Ability to remove batteries is a strength for Swarm:

  • Pluggable scheduler

  • Pluggable network driver

  • Pluggable distributed K/V store

  • Docker container engine runtime-only

  • Pluggable authorization (in docker engine)*

Updates & Maintenance

Nodes

  • Nodes may be Active, Drained or Paused

    • Manager weights are used to drain or pause Managers

  • Manual swarm manager and worker updates
     

Applications

  • Rolling updates now supported

    • --update-delay

    • --update-parallelism

    • --update-failure-action
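The three update flags map onto a simple batching loop. This minimal sketch assumes each task update returns success or failure synchronously. `rolling_update` and its parameters are hypothetical, and only the `pause` and `continue` failure actions are modeled.

```python
import time
from typing import Callable, List, Tuple


def rolling_update(tasks: List[str], parallelism: int, delay_s: float,
                   update_fn: Callable[[str], bool], failure_action: str = "pause",
                   sleep: Callable[[float], None] = time.sleep) -> Tuple[List[str], str]:
    """Update tasks in batches of `parallelism` (like --update-parallelism), waiting
    `delay_s` between batches (like --update-delay). On a failed batch, either stop
    ("pause") or keep going ("continue"), like --update-failure-action."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1 in this sketch")
    updated: List[str] = []
    for start in range(0, len(tasks), parallelism):
        if start > 0:
            sleep(delay_s)
        batch = tasks[start:start + parallelism]
        results = [update_fn(t) for t in batch]
        updated.extend(t for t, ok in zip(batch, results) if ok)
        if not all(results) and failure_action == "pause":
            return updated, "paused"
    return updated, "completed"
```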

Health Monitoring

Nodes

  • Swarm monitors the availability and resource usage of nodes within the cluster

 

Applications

  • One health check per container may be run
    • check container health by running a command inside the container
      • --interval=DURATION (default: 30s)
      • --timeout=DURATION (default: 30s)
      • --retries=N (default: 3)
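The `--retries` setting means a container is marked unhealthy only after several consecutive failed probes. This minimal sketch of that state logic assumes the probe results (the check command's exit status within `--timeout`) have already been collected every `--interval`. `health_status` is a hypothetical helper, not Docker's implementation.

```python
from typing import Iterable


def health_status(probe_results: Iterable[bool], retries: int = 3) -> str:
    """Replay probe results in order (True = the check command exited 0 before the
    timeout) and return the resulting status."""
    status = "starting"
    consecutive_failures = 0
    for ok in probe_results:
        if ok:
            # Any success resets the failure streak.
            consecutive_failures = 0
            status = "healthy"
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
    return status
```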

Networking & Load-Balancing

  • Swarm and Docker’s multi-host networking are simpatico

    • provides for user-defined overlay networks that are micro-segmentable

    • uses a gossip protocol for quick convergence of neighbor table

    • facilitates container name resolution via embedded DNS server (previously via /etc/hosts)

  • You may bring your own network driver

  • Load-balancing based on IPVS

    • expose a Service's port externally

    • L4 load-balancer; cluster-wide port publishing

  • Mesh routing

    • send a request to any one of the nodes and it will be routed automatically and internally load-balanced
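Mesh routing can be sketched as every node forwarding a request for a published service to one of that service's tasks in round-robin order, regardless of where the task runs. The sketch assumes each entry node keeps its own round-robin position, loosely mirroring a per-node IPVS table. `RoutingMesh` and the `node/task` endpoint strings are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List, Tuple


class RoutingMesh:
    """Every node accepts traffic for a published service port and forwards it
    to one of the service's tasks, wherever that task runs."""

    def __init__(self, endpoints: Dict[str, List[str]]):
        self.endpoints = endpoints  # service -> ["node/task", ...]
        # Round-robin state kept separately for each (entry node, service) pair.
        self._cycles: Dict[Tuple[str, str], Iterator[str]] = {}

    def route(self, entry_node: str, service: str) -> str:
        key = (entry_node, service)
        if key not in self._cycles:
            self._cycles[key] = itertools.cycle(self.endpoints[service])
        return next(self._cycles[key])
```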

Secrets Management

Not yet; tracking toward 1.13

High Availability & Scale

  • Managers may be deployed in a highly-available configuration

    • Active/Standby: only one active Leader at a time

    • Maintain an odd number of managers

  • Rescheduling upon node failure

    • No rebalancing upon node addition to the cluster
       

  • Does not support multiple failure isolation regions or federation
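The advice to keep an odd number of managers follows from majority-quorum arithmetic, which the Raft consensus used by Swarm mode managers relies on. Here is a small sketch of that arithmetic (the function names are illustrative):

```python
def raft_quorum(managers: int) -> int:
    """Votes needed for a majority among `managers` voting members."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """Managers that can fail while a majority (quorum) remains available."""
    return (managers - 1) // 2


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} managers: quorum {raft_quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 managers raises the quorum from 2 to 3 without tolerating any additional failure. The extra manager adds coordination overhead but no resilience.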

Scaling swarm to 1,000 AWS nodes and 50,000 containers

  • Suitable for orchestrating a combination of infrastructure containers

    • Has only recently added capabilities falling into the application bucket

  • Swarm is a young project

    • advanced features forthcoming

    • natural expectation of caveats in functionality

  • No rebalancing, autoscaling or monitoring, yet

  • Only schedules Docker containers, not containers using other specifications.

    • Does not schedule VMs or non-containerized processes

    • Does not provide support for batch jobs

  • Needs a separate load-balancer for overlapping ingress ports

  • While dependency and affinity filters are available, Swarm does not provide the ability to enforce that two containers are scheduled onto the same host or not at all.

    • Filters facilitate the sidecar pattern. No “pod” concept.

  • Swarm works. Swarm is simple and easy to deploy.

    • 1.12 eliminated the need for much third-party software

    • Facilitates earlier stages of adoption by organizations viewing containers as faster VMs

    • now with built-in functionality for applications

  • Swarm is easy to extend: if you already know the Docker APIs, you can customize Swarm

  • Still modular, but has stepped back here.

  • Moving very fast; eliminating gaps quickly.

Kubernetes

Genesis & Purpose

  • an opinionated framework for building distributed systems

    • or as its tagline states "an open source system for automating deployment, scaling, and operations of applications."

  • Written in Go, Kubernetes is lightweight, modular and extensible

  • considered a third-generation container orchestrator, led by Google, Red Hat and others.

    • bakes in load-balancing, scale, volumes, deployments, secret management and cross-cluster federated services, among other features.

  • Declarative and opinionated, with many key features included

 

Kubernetes Architecture

Support & Momentum

  • Kubernetes is young (about two years old)

    • Announced as production-ready 15 months ago (July 2015)
       

  • Project currently has over 1,000 commits per month (~38,000 total)

    • made by about 100 (862 total) Kubernauts (Kubernetes enthusiasts)

    • ~5,000 commits made in 1.3 release (1.4 is latest)
       

  • Under the governance of the Cloud Native Computing Foundation
     

  • Robust set of documentation and ~90 meetups 

Host & Service Discovery

Host Discovery

  • by default, the node agent (kubelet) is configured to register itself with the master (API server)

    • automating the joining of new hosts to the cluster

Service Discovery

Two primary modes of finding a Service

  • DNS

    • SkyDNS is deployed as a cluster add-on

  • environment variables

    • environment variables are used as a simple way of providing compatibility with Docker links-style networking

Scheduling

  • By default, scheduling is handled by kube-scheduler.

  • Pluggable

  • Selection criteria used by kube-scheduler to identify the best-fit node is defined by policy:

    • Predicates (node resources and characteristics):

      • PodFitsPorts, PodFitsResources, NoDiskConflict, MatchNodeSelector, HostName, ServiceAffinity, LabelsPresence

    • Priorities (weighted strategies used to identify “best fit” node):

      • LeastRequestedPriority, BalancedResourceAllocation, ServiceSpreadingPriority, EqualPriority
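The predicate/priority split is a filter-then-score pipeline. This minimal sketch assumes simplified `Node` and `Pod` models and implements only three predicates and one priority. The scoring follows the LeastRequested idea on a 0-10 scale (more free capacity after placement scores higher) but is illustrative, not kube-scheduler's code. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    cpu_capacity: int  # millicores
    mem_capacity: int  # MiB
    cpu_requested: int = 0
    mem_requested: int = 0
    used_ports: Set[int] = field(default_factory=set)
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class Pod:
    cpu: int
    mem: int
    host_ports: Set[int] = field(default_factory=set)
    node_selector: Dict[str, str] = field(default_factory=dict)


# Predicates: hard filters on node resources and characteristics.
def pod_fits_resources(pod: Pod, node: Node) -> bool:
    return (node.cpu_requested + pod.cpu <= node.cpu_capacity
            and node.mem_requested + pod.mem <= node.mem_capacity)


def pod_fits_ports(pod: Pod, node: Node) -> bool:
    return not (pod.host_ports & node.used_ports)


def match_node_selector(pod: Pod, node: Node) -> bool:
    return all(node.labels.get(k) == v for k, v in pod.node_selector.items())


PREDICATES = [pod_fits_resources, pod_fits_ports, match_node_selector]


# Priorities: soft scores (0-10 here), combined with weights.
def least_requested(pod: Pod, node: Node) -> float:
    cpu = (node.cpu_capacity - node.cpu_requested - pod.cpu) * 10 / node.cpu_capacity
    mem = (node.mem_capacity - node.mem_requested - pod.mem) * 10 / node.mem_capacity
    return (cpu + mem) / 2


PRIORITIES = [(least_requested, 1)]  # (priority function, weight)


def schedule(pod: Pod, nodes: List[Node]) -> Optional[str]:
    feasible = [n for n in nodes if all(pred(pod, n) for pred in PREDICATES)]
    if not feasible:
        return None
    best = max(feasible, key=lambda n: sum(w * f(pod, n) for f, w in PRIORITIES))
    return best.name
```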

Modularity & Extensibility

  • One of Kubernetes' strengths is its pluggable architecture and extensibility

  • Choice of:

    • database for service discovery or network driver

    • container runtime

      • users may choose to run Docker or rkt (Rocket) containers

  • Cluster add-ons

    • optional system components that implement a cluster feature (e.g. DNS, logging, etc.)

    • shipped with the Kubernetes binaries and are considered an inherent part of the Kubernetes clusters

       

Updates & Maintenance

Applications

  • Deployment objects automate deploying and rolling-updating applications.

  • Support for rolling back deployments

Kubernetes Components

  • Consistently backwards compatible

  • Upgrading the Kubernetes components and hosts is done via shell script 

  • Host maintenance: mark the node as unschedulable and drain it.

    • existing pods are evicted from the node

    • new pods are prevented from being scheduled on the node

Health Monitoring

Nodes

  • Failures - actively monitors the health of nodes within the cluster

    • via Node Controller

  • Resources - usage monitoring leverages a combination of open source components:

    • cAdvisor, Heapster, InfluxDB, Grafana

Applications 

  • three types of user-defined application health checks, with the Kubelet agent acting as the health check monitor

    • HTTP Health Checks, Container Exec, TCP Socket

Cluster-level Logging

  • collect logs that persist beyond the lifetime of the pod’s containers, the pod, or even the cluster

    • standard output and standard error output of each container can be ingested using a Fluentd agent running on each node

Networking & Load-Balancing

…enter the Pod

  • atomic unit of scheduling

  • flat networking with each pod receiving an IP address

  • no NAT required, port conflicts localized

  • intra-pod communication via localhost

Load-Balancing

  • Services provide inherent load-balancing via kube-proxy:

    • runs on each node of a Kubernetes cluster

    • reflects services as defined in the Kubernetes API

    • supports simple TCP/UDP forwarding, round-robin load-balancing, and Docker-links-based service IP:PORT mapping.

Secrets Management

  • Secrets are used by containers in a pod in one of two ways:

    1. mounted as data volumes

    2. exposed as environment variables
       

  • None of the pod’s containers will start until all the pod’s volumes are mounted.

  • Individual secrets are limited to 1MB in size.

  • Secrets are created and accessible within a given namespace, not cross-namespace.

High Availability & Scale

  • Each master component may be deployed in a highly-available configuration.

    • Active/Standby configuration

  • Federated clusters / multi-region deployments

Scale

  • v1.2 supports 1,000-node clusters

  • v1.3 supports 2,000-node clusters

  • Horizontal Pod Autoscaling (via Replication Controllers).

  • Cluster Autoscaling (if you're running on GCE; AWS support is coming soon).
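Horizontal Pod Autoscaling scales the replica count in proportion to how far the observed metric is from its target. This minimal sketch implements that proportional rule, desired = ceil(current × currentUtilization / targetUtilization), clamped to hypothetical `min_replicas`/`max_replicas` bounds. It omits the tolerance band and timing safeguards a real autoscaler applies.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale replicas by observed/target utilization, round up, then clamp to bounds."""
    if target_utilization <= 0:
        raise ValueError("target utilization must be positive")
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 replicas running at 90% CPU against a 50% target become ceil(5.4) = 6 replicas.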

  • Only runs containerized applications

  • For those familiar only with Docker, Kubernetes requires understanding new concepts

    • Powerful frameworks with more moving pieces beget complicated cluster deployment and management.

  • Lightweight graphical user interface

  • Does not provide as sophisticated techniques for resource utilization as Mesos

 

 

  • Kubernetes can schedule docker or rkt containers

  • Inherently opinionated, with functionality built in.

    • relatively easy to change its opinion

    • little to no third-party software needed

    • builds in many application-level concepts and services (petsets, jobsets, daemonsets, application packages / charts, etc.)

    • advanced storage/volume management

  • project has most momentum

  • project is arguably most extensible

  • thorough project documentation

  • Supports multi-tenancy

  • Multi-master, cross-cluster federation, robust logging & metrics aggregation

 

Mesos + Marathon

Genesis & Purpose

  • Mesos is a distributed systems kernel

    • stitches together many different machines into a logical computer

  • Mesos has been around the longest (launched in 2009)

    • and is arguably the most stable, with highest (proven) scale currently

  • Mesos is written in C++

    • with Java, Python and C++ APIs

  • Marathon as a Framework

    • Marathon is one of a number of frameworks (Chronos and Aurora other examples) that may be run on top of Mesos

    • Frameworks have a scheduler and executor. Schedulers get resource offers. Executors run tasks.

    • Marathon is written in Scala

Mesos Architecture

Support & Momentum

  • MesosCon 2015 in Seattle had 700 attendees

    • up from 262 attendees in 2014
       

  • Mesos had 78 contributors

  • Marathon had 217 contributors
     

  • Under the governance of the Apache Software Foundation

  • Mesos is used by Twitter, Airbnb, eBay, Apple, Cisco, Yodle

  • Marathon is used by Verizon and Samsung

Host & Service Discovery

  • Mesos-DNS generates an SRV record for each Mesos task

    • including Marathon application instances

  • Marathon will ensure that all dynamically assigned service ports are unique

  • Mesos-DNS is particularly useful when:

    • apps are launched through multiple frameworks (not just Marathon)

    • you are using an IP-per-container solution like Project Calico

    • you use random host port assignments in Marathon
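Mesos-DNS names SRV records in the form `_<task>._<protocol>.<framework>.<domain>`, with `mesos` as the default domain. This minimal sketch builds such a name. The label-sanitizing rule (lowercase, with other characters replaced by `-`) is a simplifying assumption, and `mesos_dns_srv_name` is a hypothetical helper.

```python
import re


def _dns_label(name: str) -> str:
    # Simplifying assumption: lowercase, map anything outside [a-z0-9-] to '-'.
    return re.sub(r"[^a-z0-9-]", "-", name.lower()).strip("-")


def mesos_dns_srv_name(task: str, framework: str, protocol: str = "tcp",
                       domain: str = "mesos") -> str:
    """Build an SRV query name of the form _<task>._<protocol>.<framework>.<domain>."""
    return f"_{_dns_label(task)}._{protocol}.{_dns_label(framework)}.{domain}"
```

An SRV answer carries both a target host and a port. This is why SRV lookups pair well with random host port assignments.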

Scheduling

  • Two-level scheduler

    • First-level scheduling happens at the Mesos master, based on an allocation policy that decides which frameworks get resources.

    • Second-level scheduling happens at the framework scheduler, which decides what tasks to execute.

  • Provides reservations, oversubscription and preemption.
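The two levels can be sketched as a master loop that decides which framework sees each resource offer, plus a framework callback that decides which tasks to launch on it. This is a toy model. `Offer`, `Framework` and `master_allocate` are hypothetical. The "fewest launched tasks first" policy stands in for Mesos's allocator, which is based on Dominant Resource Fairness. Unused offer resources are simply dropped rather than returned to the pool.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Offer:
    agent: str
    cpus: float
    mem: int  # MB


@dataclass
class Framework:
    name: str
    pending: List[Tuple[str, float, int]]  # (task name, cpus, mem)
    launched: List[Tuple[str, str]] = field(default_factory=list)  # (task name, agent)

    def resource_offer(self, offer: Offer) -> List[str]:
        """Second level: the framework's scheduler decides which pending tasks to
        launch on this offer; whatever it does not use is implicitly declined."""
        cpus, mem = offer.cpus, offer.mem
        accepted = []
        for task in list(self.pending):
            name, task_cpus, task_mem = task
            if task_cpus <= cpus and task_mem <= mem:
                cpus -= task_cpus
                mem -= task_mem
                self.pending.remove(task)
                self.launched.append((name, offer.agent))
                accepted.append(name)
        return accepted


def master_allocate(offers: List[Offer], frameworks: List[Framework]) -> None:
    """First level: the master decides which framework is offered each agent's
    resources. Policy here: frameworks with fewer launched tasks go first."""
    for offer in offers:
        for fw in sorted(frameworks, key=lambda f: (len(f.launched), f.name)):
            if fw.resource_offer(offer):
                break  # offer used; otherwise it moves on to the next framework
```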

Modularity & Extensibility

Frameworks

  • multiple available

  • may run multiple frameworks

Modules

  • extend inner workings of Mesos by creating and using shared libraries that are loaded on demand

  • many types of Modules

    • Replacement, Isolator, Allocator, Authentication, Hook, Anonymous

Updates & Maintenance

Nodes

  • Mesos has maintenance mode

  • Mesos backwards compatible from v1.0 forward

  • Marathon ?

 

Applications

  • Marathon can be instructed to deploy containers using a blue/green strategy

    • where old and new versions co-exist for a time.

Health Monitoring

Nodes

  • Master tracks a set of statistics and metrics to monitor resource usage

    • Counters and Gauges

Applications

  • support for health checks (HTTP and TCP)

  • an event stream that can be integrated with load-balancers or used for analyzing metrics

Networking & Load-Balancing

Networking

  • An IP per Container

    • No longer share the node's IP

    • Helps remove port conflicts

    • Enables 3rd party network drivers

  • Container Network Interface (CNI) isolator with the Mesos containerizer

Load-Balancing

  • Marathon offers two TCP/HTTP proxies

    • A simple shell script and a more complex one called marathon-lb that has more features.

    • Pluggable (e.g. Traefik for load-balancing)

Secrets Management

Not yet; only supported by Enterprise DC/OS

 

  • Secrets shorter than eight characters may not be accepted by Marathon.
  • By default, you cannot store a secret larger than 1MB.

High Availability & Scale

  • A strength of Mesos’s architecture

    • requires masters to form a quorum using ZooKeeper (point of failure)

    • only one Active (Leader) master at-a-time in Mesos and Marathon

 

  • Scale is a strong suit for Mesos. Used at Twitter, Airbnb... TBD for Marathon

 

  • Great at asynchronous jobs. High availability built-in.

    • Referred to as the “golden standard” by Solomon Hykes, Docker CTO.

  • Universal Containerizer

    • abstract away from docker, rkt, kurma?, runc, appc

  • Can run multiple frameworks, including Kubernetes and Swarm.

  • Supports multi-tenancy.

  • Good for Big Data shops and job / task-oriented workloads.

    • Good for mixed workloads and data-locality policies

  • Mesos is powerful and scalable, battle-tested

    • Good for running multiple large workloads on a 10,000+ node cluster

  • Marathon UI is young, but promising.

  • Still needs 3rd party tools

  • Marathon interface could be more Docker friendly (hard to get at volumes and registry)

  • May need a dedicated infrastructure IT team

    • an overly complex solution for small deployments

Summary

A high-level perspective of the container orchestrator spectrum.

Thank you. Questions?
