Istio at Scale

large and small

Lee Calcote

May 2019

No service mesh

solarwinds.io

Service Mesh Architectures

Service Mesh Architecture

Data Plane

  • Touches every packet/request in the system.
  • Responsible for service discovery, health checking, routing, load balancing, authentication, authorization, and observability.

Ingress Gateway

Egress Gateway

Service Mesh Architecture

No control plane? Not a service mesh.

Egress Gateway

Control Plane

Data Plane

  • Touches every packet/request in the system.
  • Responsible for service discovery, health checking, routing, load balancing, authentication, authorization, and observability.
  • Provides policy, configuration, and platform integration.
  • Takes a set of isolated stateless sidecar proxies and turns them into a service mesh.
  • Does not touch any packets/requests in the data path.

Ingress Gateway

Service Mesh Architecture

Control Plane

Data Plane

  • Touches every packet/request in the system.
  • Responsible for service discovery, health checking, routing, load balancing, authentication, authorization, and observability.
  • Provides policy, configuration, and platform integration.
  • Takes a set of isolated stateless sidecar proxies and turns them into a service mesh.
  • Does not touch any packets/requests in the data path.

You need a management plane.

Ingress Gateway

Egress Gateway

Management
Plane

  • Provides monitoring, backend system integration, expanded policy and application configuration.

Pilot

Citadel

Mixer

Control Plane

Data Plane

istio-system namespace

policy check

Foo Pod

Proxy Sidecar

Service Foo

tls certs

discovery & config

Foo Container

Bar Pod

Proxy Sidecar

Service Bar

Bar Container

Out-of-band telemetry propagation

telemetry

 

reports

Control flow during request processing

application traffic

Application traffic

application namespace

telemetry reports

Istio Architecture

Galley

Ingress Gateway

Egress Gateway

Control Plane

Data Plane

linkerd-system namespace

Foo Pod

Proxy Sidecar

Service Foo

Foo Container

Bar Pod

Proxy Sidecar

Service Bar

Bar Container

Out-of-band telemetry propagation

telemetry

 

scarping

Control flow during request processing

application traffic

Application traffic

application namespace

telemetry scraping

Architecture

destination

Prometheus

Grafana

tap

web

CLI

proxy-api

public-api

Linkerd

proxy-injector

Service Mesh Deployment Models

Client

Edge Cache

Istio Gateway

(envoy)

Cache Generator

Collection of VMs running APIs

service mesh

Istio VirtualService

Istio VirtualService

Istio ServiceEntry

Situation:

  • existing services running on VMs (that have little to no service-to-service traffic).
  • nearly all traffic flows from client to the service and back to client.

 

Benefits:

  • gain granular traffic control (e.g path rewrites).
  • detailed service monitoring without immediately deploying a thousand sidecars.

Deployment Model: Ingress

Out-of-band telemetry propagation

Control flow during request processing

Application traffic

Proxy per Node

Service A

Service A

Service A

linkerd

Node (server)

Service A

Service A

Service B

linkerd

Node (server)

Service A

Service A

Service C

linkerd

Node (server)

Advantages:

  • Less (memory) overhead.

  • Simpler distribution of configuration information.

  • primarily physical or virtual server based; good for large monolithic applications.
     

Disadvantages:

  • Coarse support for encryption of service-to-service communication, instead host-to-host encryption and authentication policies.

  • Blast radius of a proxy failure includes all applications on the node, which is essentially equivalent to losing the node itself.

  • Not a transparent entity, services must be aware of its existence.

More deployment models

at

layer5.io/books

Advantages:

  • Good starting point for building a brand-new microservices architecture or for migrating from a monolith.

Disadvantages:

  • When the number of services increase, it becomes difficult to manage.

 

Router "Mesh"

Pilot+Mixer+mTLS+Ingress

solarwinds.io on Istio

Mixer

Mixer

Control Plane

Data Plane

istio-system namespace

Foo Pod

Proxy sidecar

Service Foo

Foo Container

Out-of-band telemetry propagation

Control flow during request processing

application traffic

application traffic

application namespace

telemetry reports

an attribute processing engine

AppOpticsâ„¢

  • Uses pluggable adapters to extend its functionality
    • Adapters run within the Mixer process
  • Adapters are modules that interface to infrastructure backends
    • (logging, metrics, quotas, etc.)
  • Multi-interface adapters are possible
    • (e.g., SolarWinds® adapter sends logs and metrics)

Mixer Adapters

types: logs, metrics, access control, quota

Papertrailâ„¢

Prometheusâ„¢

Stackdriverâ„¢

Open Policy Agent

Grafanaâ„¢

Fluentd

Statsd

®

Pilot

Citadel

Mixer

istio-system namespace

Galley

Control Plane

Meshery

a multi-service mesh management plane

 

https://layer5.io/meshery

Service Mesh Interface (SMI)

Load Generation

a multi-service mesh management plane

Service Mesh Interface (SMI)

Lab 4

@lcalcote

Meshery

layer5.io/books

Adopting a service mesh

layer5.io/landscape

Adopter’s Dilemma

Which service mesh to use?

What's the catch? Nothing's for free.

Playground

WHICH SERVICE MESH SHOULD I USE AND HOW DO I GET STARTED?

 

Learn about the functionality of different service meshes and visually manipulate mesh configuration.

Performance Benchmark

WHAT OVERHEAD DOES BEING ON THE SERVICE MESH INCUR?

 

Benchmark the performance of your application across different service meshes and compare their overhead.

layer5.io/meshery

@lcalcote

Side-by-Side Performance Comparison

Istio

Linkerd

Octarine

NSM

App Mesh

@lcalcote

results coming forthcoming at KubeCon EU...

Consul

Up next...

(service meshes contributing adapters)

Kubernetes
(no mesh)

Demo

Relative

Showdown!
Slowdown

@lcalcote

layer5.io/meshery

Cores Threads Istio (2) Linkerd
8 8 1 1
8 16 1.7 1.8
8 32 3.2 3.4
8 100 9.3 9.6

(2) mTLS on, tracing off

Relative

Showdown! Slowdown

@lcalcote

layer5.io/meshery

Cores Threads Istio (1) Istio (2) Linkerd
8 8 1 1 1
8 16 1.4 1.7 1.8
8 32 18.4 3.2 3.4
8 100 52.2 9.3 9.6

(1) mTLS on, tracing on

(2) mTLS on, tracing off

Mixer

Mixer

Control Plane

Data Plane

istio-system namespace

Foo Pod

Proxy sidecar

Service Foo

Foo Container

Out-of-band telemetry propagation

Control flow during request processing

application traffic

application traffic

application namespace

telemetry reports

an attribute processing engine

Different Response Sizes

@lcalcote

layer5.io/meshery

@lcalcote

layer5.io/meshery

Different Response Sizes

@lcalcote

layer5.io/meshery

Different Response Sizes

Service Mesh Benchmark Specification

A project and vendor-neutral specification for capturing details of:

  1. Environment / Infrastructure

    • Number and size of nodes, orchestrator

  2. Service mesh and its configuration

  3. Service / application details

Bundled with test results.

 

github.com/layer5io/service-mesh-benchmark-spec

@lcalcote

layer5.io/meshery

@lcalcote

layer5.io/meshery

Service Mesh Community

Layer5.io

layer5.io/books

What is a Service Mesh?

a dedicated layer for managing service-to-service communication

So, a microservices platform?

obviously.

Orchestrators don't bring all that you need

and neither do service meshes,

but they do get you closer.

Missing: application lifecycle management, but not by much

partially.

a services-first network

Missing: distributed debugging; provide nascent visibility (topology)

Benefits

First few services are relatively easy

 

 

Democratization of language and technology choice

 

Faster delivery, service teams running independently, rolling updates

Challenges

Next 10 or so may introduce pain

 

 

Language and framework-specific libraries

 

 

Distributed environments, ephemeral infrastructure, out-moded tooling

Microservices

Why use a Service Mesh?

to avoid...

  • Bloated service code

  • Duplicating work to make services production-ready

    • Load balancing, auto scaling, rate limiting, traffic routing...

  • Inconsistency across services

    • Retry, tls, failover, deadlines, cancellation, etc., for each language, framework

    • Siloed implementations lead to fragmented, non-uniform policy application and difficult debugging

  • Diffusing responsibility of service management

What do we need?

• Observability

• Logging
• Metrics
• Tracing

• Traffic Control

• Resiliency

• Efficiency
• Security

• Policy

a Service Mesh

Observability

what gets people hooked on service metrics

Goals

  • Metrics without instrumenting apps

  • Consistent metrics across fleet

  • Trace flow of requests across services

  • Portable across metric back-end providers

You get a metric!  You get a metric!  Everyone gets a metric!

© 2018 SolarWinds Worldwide, LLC. All rights reserved.

Traffic Control

control over chaos

  • Traffic splitting
    • L7 tag based routing?
  • Traffic steering
    • Look at the contents of a request and route it to a specific set of instances.
  • Ingress and egress routing

Resilency

 

  • Systematic fault injection
  • Timeouts and Retries with timeout budget

  • Control connection pool size and request load

  • Circuit breakers and Health checks

 

content-based traffic steering

Timeouts & Retries

Web

Service Foo

Timeout = 600ms
Retries = 3

Timeout = 300ms
Retries = 3

Timeout = 900ms
Retries = 3

Service Bar

Database

Timeout = 500ms
Retries = 3

Timeout = 300ms
Retries = 3

Timeout = 900ms
Retries = 3

Deadlines

Web

Service Foo

Deadline = 600ms

Deadline = 496ms

Service Bar

Database

Deadline = 428ms

Deadline=180ms

Elapsed=104ms

Elapsed=68ms

Elapsed=248ms

DEV

OPS

Layer 5

where Dev and Ops meet

Problem: too much infrastructure code in services