Service Meshes, But at What Cost?

Lee Calcote

April 2019

Envoy, Linkerd, Nginx, Conduit

Swapping Proxies

The subject of my talk from Container World 2018.

Proxy swapping is no longer a consideration in Istio in 2019.

Service Proxy Sidecar

- A C++ based L4/L7 proxy

- Low memory footprint

- In production at Lyft™

Capabilities:

  • API driven config updates → no reloads
  • Zone-aware load balancing w/ failover
  • Traffic routing and splitting
  • Health checks, circuit breakers, timeouts, retry budgets, fault injection…
  • HTTP/2 & gRPC
  • Transparent proxying
  • Designed for observability

 

the included battery

Data Plane

Pod

Proxy sidecar

App Container

Envoy has lived up to its tagline of the universal data plane API.

clouds, containers, functions, applications,  and their management

Freely Available

at

Microservices

The more, the merrier?

Benefits

First few services are relatively easy

 

 

Democratization of language and technology choice

 

Faster delivery, service teams running independently, rolling updates

Challenges

Next 10 or so may introduce pain

 

 

Language and framework-specific libraries

 

 

Distributed environments, ephemeral infrastructure, out-moded tooling

Microservices

Which is why...

 I have a container orchestrator.

Core

Capabilities

  • Cluster Management

    • Host Discovery

    • Host Health Monitoring

  • Scheduling

  • Orchestrator Updates and Host Maintenance

  • Service Discovery

  • Networking and Load Balancing

  • Stateful Services

  • Multi-Tenant, Multi-Region

Additional

Key Capabilities

  • Application Health and Performance Monitoring

  • Application Deployments

  • Application Secrets

minimal capabilities required to qualify as a container orchestrator

Service meshes generally rely on these underlying layers.

Which is why...

 I have an API gateway.

Microservices API Gateways

north-south vs. east-west

What do we need?

• Observability

• Logging
• Metrics
• Tracing

• Traffic Control

• Resiliency

• Efficiency
• Security

Policy

a Service Mesh

What is a Service Mesh?

a dedicated layer for managing service-to-service communication

So, a microservices platform?

obviously.

Orchestrators don't bring all that you need

and neither do service meshes,

but they do get you closer.

Missing: application lifecycle management, but not by much

partially.

a services-first network

Missing: distributed debugging; provide nascent visibility (topology)

BookInfo Sample App

Reviews v1

Reviews Pod

Reviews v2

Reviews v3

Product Pod

Details Container

Details Pod

Ratings Container

Ratings Pod

Product Container

Reviews Service

Ratings Service

Details Service

Product Service

BookInfo Sample App on Service Mesh

Reviews v1

Reviews Pod

Reviews v2

Reviews v3

Product Pod

Details Container

Details Pod

Ratings Container

Ratings Pod

Product Container

Envoy sidecar

Envoy sidecar

Envoy sidecar

Envoy sidecar

Envoy sidecar

Reviews Service

Enovy sidecar

Envoy ingress

Product Service

Ratings Service

Details Service

Service Mesh Architecture

Control Plane

Data Plane

  • Touches every packet/request in the system.
  • Responsible for service discovery, health checking, routing, load balancing, authentication, authorization, and observability.
  • Provides policy and configuration for services in the mesh.
  • Takes a set of isolated stateless sidecar proxies and turns them into a service mesh.
  • Does not touch any packets/requests in the system.

No control plane? Not a service mesh.

Ingress Gateway

Egress Gateway

Backend Systems

Why use a Service Mesh?

to avoid...

  • Bloated service code

  • Duplicating work to make services production-ready

    • Load balancing, auto scaling, rate limiting, traffic routing...

  • Inconsistency across services

    • Retry, tls, failover, deadlines, cancellation, etc., for each language, framework

    • Siloed implementations lead to fragmented, non-uniform policy application and difficult debugging

  • Diffusing responsibility of service management

DEV

OPS

Layer 5

where Dev and Ops meet

Problem: too much infrastructure code in services

Observability

what gets people hooked on service metrics

Goals

  • Metrics without instrumenting apps

  • Consistent metrics across fleet

  • Trace flow of requests across services

  • Portable across metric back-end providers

You get a metric!  You get a metric!  Everyone gets a metric!

© 2018 SolarWinds Worldwide, LLC. All rights reserved.

Traffic Control

control over chaos

  • Traffic splitting
    • L7 tag based routing?
  • Traffic steering
    • Look at the contents of a request and route it to a specific set of instances.
  • Ingress and egress routing

Resilency

 

  • Systematic fault injection
  • Timeouts and Retries with timeout budget

  • Control connection pool size and request load

  • Circuit breakers and Health checks

 

content-based traffic steering

Timeouts & Retries

Web

Service Foo

Timeout = 600ms
Retries = 3

Timeout = 300ms
Retries = 3

Timeout = 900ms
Retries = 3

Service Bar

Database

Timeout = 500ms
Retries = 3

Timeout = 300ms
Retries = 3

Timeout = 900ms
Retries = 3

Deadlines

Web

Service Foo

Deadline = 600ms

Deadline = 496ms

Service Bar

Database

Deadline = 428ms

Deadline=180ms

Elapsed=104ms

Elapsed=68ms

Elapsed=248ms

What is Istio?

an open platform to connect, manage, and secure microservices

  • Observability

  • Resiliency

  • Traffic Control

  • Security

  • Policy Enforcement

@IstioMesh

Pilot

Citadel

Mixer

Control Plane

Data Plane

istio-system namespace

policy check

Foo Pod

Proxy Sidecar

Service Foo

tls certs

discovery & config

Foo Container

Bar Pod

Proxy Sidecar

Service Bar

Bar Container

Out-of-band telemetry propagation

telemetry

 

reports

Control flow during request processing

application traffic

Application traffic

application namespace

telemetry reports

Istio Architecture

Galley

Control Plane

Data Plane

linkerd-system namespace

Foo Pod

Proxy Sidecar

Service Foo

Foo Container

Bar Pod

Proxy Sidecar

Service Bar

Bar Container

Out-of-band telemetry propagation

telemetry

 

scarping

Control flow during request processing

application traffic

Application traffic

application namespace

telemetry scraping

Architecture

destination

Prometheus

Grafana

tap

dashboard

CLI

proxy-api

public-api

layer5.io/landscape

Adopting a service mesh

Adopter’s Dilemma

Which service mesh to use?

What's the catch? Nothing's for free.

Playground

WHICH SERVICE MESH SHOULD I USE AND HOW DO I GET STARTED?

 

Learn about the functionality of different service meshes and visually manipulate mesh configuration.

Performance Benchmark

WHAT OVERHEAD DOES BEING ON THE SERVICE MESH INCUR?

 

Benchmark the performance of your application across different service meshes and compare their overhead.

layer5.io/meshery

@lcalcote

Meshery

a multi-service mesh performance benchmark and playground

Demo

 

Deployment

  • Deployment of Meshery and sample app

Configuration

  • Cluster, adapters and grafana

  • Configuration validation using Istio Vet

Performance tests

  • View individual test result

  • Compare multiple tests (two)

  • Compare multiple tests (many)

  • Benchmark Specification

 

Side-by-Side Performance Comparison

Istio

Linkerd

Consul

Octarine

App Mesh?

Results coming…

     Upcoming presentations:

- DockerCon

- KubeCon EU

@lcalcote

layer5.io/meshery

Service Mesh Benchmark Specification

A project and vendor-neutral specification for capturing details of:

  1. Environment / Infrastructure

    • Number and size of nodes, orchestrator

  2. Service mesh and its configuration

  3. Service / application details

Bundled with test results.

 

github.com/layer5io/service-mesh-benchmark-spec

@lcalcote

layer5.io/meshery

layer5.io/books

Subscribe for Early Release

at

https://layer5.io/subscribe

Lee Calcote

Thank you. Questions?

clouds, containers, functions,

applications and their management