The
Enterprise Path to Service Mesh Architectures

Engineering Summit 2020

Lee Calcote

Founder

cloud native and its management

Enabler of Engineers. Enabler of Speed. Enabler of Business.

Research Partners

Technology Partners

Third step in Cloud Native journey

Meshery is interoperable with each abstraction.

Container

Orchestrator

Mesh

5.5 years
(Jun 2014)

4.5 years
(Jul 2015)

3 years
(Apr 2017)

5.5 years ago
(Jun 2014)

7 years ago
(Mar 2013)

4 years ago
(Feb 2016)

v1.0

Announced

Market Opportunity and Adoption

Service meshes are not restricted to the realm of containers, but provide value to legacy, monolithic applications are well.

 

 

According to the 451 Analyst Group, the container market will approach $5 Billion in the next couple of years.

 

Surveys like the CNCF survey show that, late last year, over 40% of enterprises are already running cloud-native technologies in production, with over 80% of those enterprises running Kubernetes.

 

Suffice it to say – this is a massive market that is radically changing how enterprises deploy and operate applications in production.

 

Developer Velocity
(aka speed)

Development Process

Application Architecture

Deployment and Packaging

Application Infrastructure

Agile

Waterfall

DevOps

N-Tier

Monolithic

Microservices

Cloud

Containers

Physical Servers

Virtual Servers

Data Center

Hosted

Evolution to Cloud Native

λ

Functions

Serverless

Events

SRE

(Unikernels)

bare metal

     AND

          virtual machines

               AND

                    containers

                         AND

                              unikernels

                                   AND

                                        functions

the future is AND not OR

We hold these truths to be self-evident...

Stacking up Service Mesh

analogous to the serverless value prop

Same value proposition as serverless, but for long-lived services (much the world's workloads)

IaaS

CaaS

PaaS

SaaS

Service Mesh

  • Short-lived
  • Event-driven
  • Long-lived
  • Process-driven
  • Always hot

FaaS

“Linkerd is shifting the developer mind to focus on the business logic and not on some technicality. It just makes it easier and faster to deploy and develop, and you have a safety net.”

Case Studies

Autotrader UK

Autotrader UK adopted Istio and:

  • Reduced deployment time by almost 97%, from 10 minutes to 20 seconds.
     
  • Avoided a six-month engineering effort for a more secure infrastructure.
     
  • 75% more efficient CPU and RAM allocation through greater oversight of applications.

Apester adopted Linkerd and:

  • Shortened Mean Time To Repair (MTTR) by a factor of 2
     
  • Processing 20 billion requests per month with no timeouts.
     
  • No outages for 6 months and counting

Apester

What is a Service Mesh?

a dedicated layer for managing service-to-service communication

So, a microservices platform?

obviously.

partially.

a services-first network

Reviews v1

Reviews Pod

Reviews v2

Reviews v3

Product Pod

Details Container

Details Pod

Ratings Container

Ratings Pod

Product Container

Reviews Service

Ratings Service

Details Service

Product Service

BookInfo Sample App

BookInfo Sample App on Service Mesh

Reviews v1

Reviews Pod

Reviews v2

Reviews v3

Product Pod

Details Container

Details Pod

Ratings Container

Ratings Pod

Product Container

Envoy sidecar

Envoy sidecar

Envoy sidecar

Envoy sidecar

Envoy sidecar

Reviews Service

Enovy sidecar

Envoy ingress

Product Service

Ratings Service

Details Service

Observability

what gets people hooked on service metrics

Goals

  • Metrics without instrumenting apps

  • Consistent metrics across fleet

  • Trace flow of requests across services

  • Portable across metric back-end providers

You get a metric!  You get a metric!  Everyone gets a metric!

Traffic Control

control over chaos

  • Traffic splitting
    • L7 tag based routing?
  • Traffic steering
    • Look at the contents of a request and route it to a specific set of instances.
  • Ingress and egress routing

Resilency

 

  • Systematic fault injection
  • Timeouts and Retries with timeout budget

  • Control connection pool size and request load

  • Circuit breakers and Health checks

 

content-based traffic steering

Missing: application lifecycle management, but not by much

Missing: distributed debugging; provide nascent visibility (topology)

Why use a Service Mesh?

to avoid...

  • Bloated service (application) code

  • Duplicating work to make services production-ready

    • Load balancing, auto scaling, rate limiting, traffic routing...

  • Inconsistency across services

    • Retry, tls, failover, deadlines, cancellation, etc., for each language, framework

    • Siloed implementations lead to fragmented, non-uniform policy application and difficult debugging

  • Diffusing responsibility of service management

Help with Modernization

 

  • Can modernize your IT inventory without:

    • Rewriting your applications

    • Adopting microservices, regular services are fine

    • Adopting new frameworks

    • Moving to the cloud

address the long-tail of IT services

Get there for free

DEV

OPS

Decoupling at Layer 5

where Dev and Ops meet

Problem: too much infrastructure code in services

Relating to

Service Meshes

Which is why...


 I have client-side libraries.

Enforcing consistency is challenging.

Foo Container

Flow Control

Foo Pod

Go Library
A v1

Network Stack

Service Discovery

Circuit Breaking

Application / Business Logic

Bar Container

Flow Control

Bar Pod

Go Library
A v2

Network Stack

Service Discovery

Circuit Breaking

Application / Business Logic

Baz Container

Flow Control

Baz Pod

Java Library
B v1

Network Stack

Service Discovery

Circuit Breaking

Application / Business Logic

Retry Budgets

Rate Limiting

Which is why...

 I have a container orchestrator.

Core

Capabilities

  • Cluster Management

    • Host Discovery

    • Host Health Monitoring

  • Scheduling

  • Orchestrator Updates and Host Maintenance

  • Service Discovery

  • Networking and Load Balancing

  • Stateful Services

  • Multi-Tenant, Multi-Region

Additional

Key Capabilities

  • Application Health and Performance Monitoring

  • Application Deployments

  • Application Secrets

minimal capabilities required to qualify as a container orchestrator

Service meshes generally rely on these underlying layers.

Which is why...

 I have an API gateway.

Microservices API Gateways

north-south vs. east-west

What do we need?

• Observability

• Logging
• Metrics
• Tracing

• Traffic Control

• Resiliency

• Efficiency
• Security

Policy

...a Service Mesh

Service Mesh Architectures

Service Mesh Architecture

Data Plane

  • Touches every packet/request in the system.
  • Responsible for service discovery, health checking, routing, load balancing, authentication, authorization, and observability.

Ingress Gateway

Egress Gateway

the workhorse

Service Mesh Architecture

No control plane? Not a service mesh.

Control Plane

  • Provides policy, configuration, and platform integration.
  • Takes a set of isolated stateless sidecar proxies and turns them into a service mesh.
  • Does not touch any packets/requests in the data path.

Data Plane

  • Touches every packet/request in the system.
  • Responsible for service discovery, health checking, routing, load balancing, authentication, authorization, and observability.

Ingress Gateway

Egress Gateway

Service Mesh Architecture

Control Plane

Data Plane

  • Touches every packet/request in the system.
  • Responsible for service discovery, health checking, routing, load balancing, authentication, authorization, and observability.
  • Provides policy, configuration, and platform integration.
  • Takes a set of isolated stateless sidecar proxies and turns them into a service mesh.
  • Does not touch any packets/requests in the data path.

You need a management plane.

Ingress Gateway

Management
Plane

  • Provides monitoring, backend system integration, expanded policy and application configuration, federation of services.

Egress Gateway

Pilot

Citadel

Mixer

Control Plane

Data Plane

istio-system namespace

policy check

Foo Pod

Proxy Sidecar

Service Foo

tls certs

discovery & config

Foo Container

Bar Pod

Proxy Sidecar

Service Bar

Bar Container

Out-of-band telemetry propagation

telemetry

 

reports

Control flow

application traffic

Application traffic

application namespace

telemetry reports

Istio Architecture

Galley

Ingress Gateway

Egress Gateway

Control Plane

Data Plane

octa-system namespace

policy check

Foo Pod

Proxy
Sidecar

Service Foo

discovery & config

Foo Container

Bar Pod

Service Bar

Bar Container

Out-of-band telemetry propagation

telemetry

 

reports

Control flow

application traffic

Application traffic

application namespace

telemetry reports

Octarine Architecture

Policy
Engine

Security Engine

Visibility
Engine

+

Proxy
Sidecar

+

Control Plane

Data Plane

linkerd-system namespace

Foo Pod

Proxy Sidecar

Service Foo

Foo Container

Bar Pod

Proxy Sidecar

Service Bar

Bar Container

Out-of-band telemetry propagation

telemetry

 

scarping

Control flow during request processing

application traffic

Application traffic

application namespace

telemetry scraping

Architecture

destination

Prometheus

Grafana

tap

web

CLI

proxy-api

public-api

Linkerd

proxy-injector

Control Plane

Data Plane

Foo Pod

NSM Dataplane

Service Foo

Foo Container

Bar Pod

Proxy Sidecar

Service Bar

Bar Container

Out-of-band telemetry propagation

telemetry

 

scarping

Control flow during request processing

application traffic

Application traffic

application namespace

telemetry scraping

Architecture

destination

Prometheus

Registry

NSMe

domain

client

proxy-api

public-api

Network
Service
Mesh

proxy-injector

layer5.io/landscape

It's meshy out there.

Adopter’s Dilemma

Which service mesh to use?

What's the catch? Nothing's for free.

Lifecycle Management

WHICH SERVICE MESH SHOULD I USE AND HOW DO I GET STARTED?

 

Performance Benchmark

WHAT OVERHEAD DOES BEING ON THE SERVICE MESH INCUR?

 

https://meshery.io

Configuration

Security

Telemetry

Control Plane

Data
Plane

service mesh ns

Foo Pod

Proxy Sidecar

Service Foo

Foo Container

Bar Pod

Proxy Sidecar

Service Bar

Bar Container

Out-of-band telemetry propagation

Control flow

application traffic

http / gRPC

Application traffic

application namespace

Meshery Architecture

Ingress Gateway

Egress Gateway

Management
Plane

meshery

adapter

gRPC

kube-api

kube-system

Service mesh abstractions to the rescue

Meshery is interoperable with each abstraction.

Service Mesh Interface

(SMI)

Multi-Vendor Service Mesh Interoperation (Hamlet)

Service Mesh Performance Specification (SMPS)

A standard interface for service meshes on Kubernetes.

A set of API standards for enabling service mesh federation.

A format for describing and capturing service mesh performance.

layer5.io/meshery

Pod Memory Usage

Application resource consumption

layer5.io/meshery

Pod CPU Usage

Application resource consumption

layer5.io/meshery

Control Plane Memory

Istio

Linkerd

Consul

layer5.io/meshery

Control Plane CPU

Istio

Linkerd

Consul

hello@layer5.io

Service Mesh Deployment Models

Client

Edge Cache

Istio Gateway

(envoy)

Cache Generator

Collection of VMs running APIs

service mesh

Istio VirtualService

Istio VirtualService

Istio ServiceEntry

Situation:

  • existing services running on VMs (that have little to no service-to-service traffic).
  • nearly all traffic flows from client to the service and back to client.

 

Benefits:

  • gain granular traffic control (e.g path rewrites).
  • detailed service monitoring without immediately deploying a thousand sidecars.

Ingress

Out-of-band telemetry propagation

Application traffic

Control flow

Proxy per Node

Service A

Service A

Service A

linkerd

Node (server)

Service A

Service A

Service B

linkerd

Node (server)

Service A

Service A

Service C

linkerd

Node (server)

Advantages:

  • Less (memory) overhead.

  • Simpler distribution of configuration information.

  • primarily physical or virtual server based; good for large monolithic applications.
     

Disadvantages:

  • Coarse support for encryption of service-to-service communication, instead host-to-host encryption and authentication policies.

  • Blast radius of a proxy failure includes all applications on the node, which is essentially equivalent to losing the node itself.

  • Not a transparent entity, services must be aware of its existence.

layer5.io/books

Advantages:

  • Good starting point for building a brand-new microservices architecture or for migrating from a monolith.

Disadvantages:

  • When the number of services increase, it becomes difficult to manage.

Router "Mesh"

Fabric Model

Advantages:

  • Granular encryption of service-to-service communication.

  • Can be gradually added to an existing cluster without central coordination.

Disadvantages:

  • Lack of central coordination. Difficult to scale operationally.

Ingress or Edge Proxy

Advantages:

  • Works with existing services that can be broken down over time.

Disadvantages:

  • Is missing the benefits of service-to-service visibility and control.

Service Mesh Performance Specification (SMPS)

A project and vendor-neutral specification for capturing details of:

  1. Environment / Infrastructure

    • Number and size of nodes, orchestrator

  2. Service mesh and its configuration

  3. Service / application details

Bundled with test results.

 

layer5.io/performance

github.com/layer5io/service-mesh-performance-specification

Development Process

Application Architecture

Deployment and Packaging

Application Infrastructure

Agile

Waterfall

DevOps

N-Tier

Monolithic

Microservices

Cloud

Containers

Physical Servers

Virtual Servers

Data Center

Hosted

Evolution to Cloud Native

λ

Functions

Serverless

Events

SRE

(Unikernels)

Third step in your cloud native journey

Meshery is interoperable with each abstraction.

Container

Orchestrator

Mesh

5.5 years
(June 2014)

4.5 years
(July 2015)

1.5 years
(July 2018)

5.5 years ago
(June 2014)

7 years ago
(March 2013)

2.5 years ago
(May 2017)

v1.0

Announced

Istio

2 years ago
(Dec 2017)

1.5 years ago
(Sept 2018)

Linkerd v2

3 years
(Apr 2017)

4 years ago
(Feb 2016)

Linkerd v1