Lee Calcote
March 2nd, 2017
clouds, containers, infrastructure, applications and their management
developer-friendly and application-driven
simple to use and deploy for developers and operators
better than or on par with existing virtual data center networking
Intent-based networking
but no need to actually know these
...is a specification proposed by CoreOS and adopted by projects such as rkt, Kurma, Kubernetes, Cloud Foundry, and Apache Mesos.
Plugins have been created by projects such as Weave, Project Calico, PLUMgrid, Midokura, and Contiv.
[Diagram: the Container Network Model (libnetwork) — the Docker runtime consumes local drivers (bridge, overlay, none) and remote/third-party drivers.]
[Diagram: CNM constructs — the Docker Engine drives network drivers and an IPAM driver; each container holds a network sandbox containing one or more endpoints, and each endpoint attaches to a network.]
Microsegmentation - traffic is not relayed
[Diagram: the Container Network Interface (CNI) — the container runtime invokes CNI plugins: loopback, bridge, PTP, MACvlan, IPvlan, and third-party plugins.]
The container runtime needs to:
allocate a network namespace to the container and assign a container ID
pass along a number of parameters (the CNI config) to the network driver
The network driver needs to:
attach the container to a network
then report the assigned IP address back to the container runtime (via JSON)
{
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
(JSON)
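A rough sketch of how a runtime hands this config to the bridge plugin under the CNI spec (the container ID, netns path, and file locations below are illustrative; in practice the runtime makes this call, not an operator):

  CNI_COMMAND=ADD \
  CNI_CONTAINERID=example-id \
  CNI_NETNS=/var/run/netns/example-ns \
  CNI_IFNAME=eth0 \
  CNI_PATH=/opt/cni/bin \
    /opt/cni/bin/bridge < /etc/cni/net.d/10-mynet.conf
  # the plugin attaches the namespace to the cni0 bridge and prints a JSON result
  # (including the host-local assigned IP) on stdout for the runtime to consume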
Similar in that each...
...allows multiple network drivers to be active and used concurrently
...provides a one-to-one mapping of a network to that network's driver
...allows containers to join one or more networks (as sketched below)
...allows the container runtime to launch the network in its own namespace
...delegates the application/business logic of connecting the container to the network to the network driver
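A minimal Docker CLI illustration of those shared ideas (network and container names are made up):

  docker network create -d bridge frontend        # each network maps to exactly one driver
  docker network create -d bridge backend         # a different driver (overlay, macvlan, third-party) could back this one
  docker run -d --name app --net=frontend nginx   # the container joins its first network
  docker network connect backend app              # ...and a second; multiple networks (and drivers) coexist on the host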
Different in that...
[Diagram: bridge networking — containers attach to a Linux bridge on the host, which in turn connects out through the host's eth0/eth1 interfaces.]
good ol' lo
container receives a network stack, but lacks an external network interface.
it does, however, receive a loopback interface.
[Diagram: linking via the ambassador pattern — on the web host, the PHP container links to a local MySQL ambassador; on the DB host, a PHP ambassador links to the MySQL container; the ambassadors relay traffic between the two hosts.]
Ah, yes, docker0
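A quick look at the default bridge on a stock Docker install (addresses will vary):

  docker run -d --name web nginx    # with no --net flag, the container lands on the default docker0 bridge
  docker network inspect bridge     # shows web with an address such as 172.17.0.2/16
  docker run -d -p 8080:80 nginx    # inbound traffic requires explicit port mapping; outbound is NATed through the host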
one container reuses (maps to) the networking namespace of another container.
may only be invoked when running a docker container (cannot be defined in a Dockerfile):
--net=container:some_container_name_or_id
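For example (the container names here are illustrative):

  docker run -d --name some_container nginx
  docker run --rm --net=container:some_container alpine ip addr   # sees some_container's interfaces and IP address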
the container created shares its network namespace with the host (see the example below)
suffers from port conflicts
secure?
better performance
easy to understand and troubleshoot
default Mesos networking mode
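A minimal example:

  docker run -d --net=host nginx   # binds directly to port 80 on the host: no NAT, no port mapping,
                                   # and a second container binding port 80 will fail (port conflict)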
use network tunnels to deliver communication across hosts
BYOKV? (bring your own key/value store)
Docker -
1.11 requires an external K/V store
built-in as of 1.12 (swarm mode; see the sketch below)
uses the Raft implementation from etcd
uses go-memdb and Serf from HashiCorp
WeaveNet - limited to a single network; does not require a K/V store
WeaveMesh - does not require a separate K/V store
Flannel - requires a K/V store
PLUMgrid - requires a K/V store; built-in and not pluggable
Midokura - requires a K/V store; built-in and not pluggable
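A rough sketch of the two Docker paths (the Consul address and advertise interface are illustrative):

  # Docker 1.11: point the daemon at an external K/V store
  dockerd --cluster-store=consul://10.0.0.5:8500 --cluster-advertise=eth0:2376
  docker network create -d overlay my-overlay

  # Docker 1.12+ swarm mode: cluster state is built in (Raft, Serf/go-memdb); no external store
  docker swarm init
  docker network create -d overlay my-overlay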
expose host interface(s) directly to container(s)
(e.g. the physical network interface at eth0)
not necessarily public cloud friendly
default rkt networking mode
allows creation of multiple virtual network interfaces behind the host's physical interface
each virtual interface has a unique MAC and IP address assigned
with one restriction: the assigned IP address needs to be in the same broadcast domain as the physical interface
eliminates the need for the Linux bridge, NAT and port-mapping, allowing you to connect directly to the physical interface (see the sketch below)
beware switch port security, which can cap the number of MAC addresses learned per port (e.g. switchport port-security mac-address sticky)
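A sketch with Docker's macvlan driver, assuming eth0 is the host's physical interface and the subnet/gateway match the physical LAN:

  docker network create -d macvlan \
    --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
    -o parent=eth0 macnet
  docker run -d --net=macnet --ip=192.168.1.42 nginx   # gets its own MAC and a first-class address on the physical LAN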
allows creation of multiple virtual network interfaces behind a host's physical interface
each virtual interface has a unique IP address assigned
the same MAC address is used for all containers
L2 mode: containers must be on the same network as the host (similar to MACvlan)
L3 mode: containers must be on a different network than the host
network advertisement and redistribution into the network still needs to be done (see the sketch after this list)
while multiple modes of networking are supported on a given host, MACvlan and IPvlan can't be used on the same physical interface concurrently
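A bare-bones L3-mode sketch using iproute2, with a network namespace standing in for a container (interface names and addresses are illustrative):

  ip netns add c1
  ip link add link eth0 name ipvl0 type ipvlan mode l3
  ip link set ipvl0 netns c1
  ip netns exec c1 ip addr add 192.168.50.10/24 dev ipvl0
  ip netns exec c1 ip link set ipvl0 up
  # 192.168.50.0/24 still has to be advertised/redistributed into the physical network for return traffic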
for ARP and broadcast traffic, the L2 modes of these underlay drivers operate just as a server connected to a switch does, flooding and learning per 802.1d
IPvlan L3 mode - no multicast or broadcast traffic is allowed in
In short, if you’re used to running trunks down to hosts, L2 mode is for you.
If scale is a primary concern, L3 has the potential for massive scale.
Benefits of pushing past L2 to L3
resonates with network engineers
leverage existing network infrastructure
use routing protocols for connectivity; easier to interoperate with the existing data center, across VMs and bare-metal servers
Better scaling
More granular control over filtering and isolating network traffic
Easier traffic engineering for quality of service
Easier to diagnose network issues
a way of gaining access to many more IP addresses, expanding from one assigned IP address to 250 more IP addresses
“address expansion” - multiplies the number of available IP addresses on the host, providing an extra 253 usable addresses for each host IP
Fan addresses are assigned as subnets on a virtual bridge on the host, and IP addresses are mathematically mapped between the two networks (see the sketch below)
uses IP-in-IP tunneling; high performance
particularly useful when running containers in a public cloud
where a single IP address is assigned to a host and spinning up additional networks is prohibitive or running another load-balancer instance is costly
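A sketch of the mapping, assuming (purely for illustration) a 250.0.0.0/8 overlay laid over a 172.16.0.0/16 underlay:

  host 172.16.3.4 owns the fan subnet 250.3.4.0/24 (253 usable container addresses)
  a packet from container 250.3.4.7 to 250.9.8.2 is IP-in-IP encapsulated and forwarded to host 172.16.9.8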
IPAM, multicast, broadcast, IPv6, load-balancing, service discovery, policy, quality of service, advanced filtering and performance are all additional considerations to account for when selecting container networking that fits your needs.
IPv6
lack of support for IPv6 in the top public clouds
reinforces the need for other networking types (overlays and fan networking)
some tier 2 public cloud providers offer support for IPv6
IPAM
most container runtime engines default to host-local for assigning addresses to containers as they are connected to networks.
Host-local IPAM involves defining a fixed block of IP addresses to be selected.
DHCP is universally supported across the container networking projects.
CNM and CNI both have IPAM built-in and plugin frameworks for integration with IPAM systems
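The same fixed-block idea, shown with Docker's built-in default IPAM (values illustrative):

  docker network create --subnet=10.10.0.0/16 --ip-range=10.10.4.0/24 mynet
  docker run -d --net=mynet nginx   # receives an address selected from the 10.10.4.0/24 block on this host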
Swarm and multi-host networking are simpatico
user-defined overlay networks that are micro-segmentable
uses HashiCorp's Serf gossip protocol for quick convergence of neighbor tables between hosts
facilitates container name resolution via an embedded DNS server (previously via /etc/hosts)
Load-balancing based on IPVS
exposes a service's port externally;
L4 load-balancer
Mesh routing
send a request to any one of the nodes and it will be routed automatically
send a request to any one of the nodes and it will be internally load balanced
a massive leap forward
with a small step back - a cluster-wide namespace for port publishing
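A short swarm-mode example tying these together (service name and ports are illustrative):

  docker swarm init
  docker service create --name web --replicas 3 -p 8080:80 nginx
  # port 8080 is published cluster-wide (the routing mesh): a request to any node is
  # forwarded to a healthy task and load balanced internally via IPVS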
Docker Overlay - VXLAN overlay
Calico - L3 w/optional encapsulation
Flannel - VXLAN or UDP Overlay
Weave Net - VXLAN or UDP Overlay
Canal - VXLAN or UDP Overlay
Romana - L3
Cilium - L3 w/optional encapsulation
Trireme - L3 w/TLS
Contiv - L2, L3 (BGP), VXLAN Overlay
VMware NSX - VXLAN UDP Overlay
SolarWinds - All
Midokura -
Nuage -
PLUMgrid -
See additional research.
Nearly all are open source
Use a variety of technologies
VXLAN
UDP Overlay
L2
L3
with or w/o TLS
BGP
IP routed
Contact for early access. Learn more -
Cluster visibility -
See container network flows (current bandwidth and direction) across Kubernetes and Docker Swarm nodes.
Bandwidth test -
Test throughput (performance) of each type of container network (compare network drivers).
Choose wisely -
Be aware of the cost of overlay convenience.
Avoid MAC address overload in underlays.
clouds, containers, functions,
applications and their management