Health

Health

Colors in the graph represent the health of your service mesh. A node colored red or orange might need attention. The color of an edge between components represents the health of the requests between those components. The node shape indicates the type of component such as services, workloads, or apps.

The health of nodes and edges is refreshed automatically based on the user’s preference. The graph can also be paused to examine a particular state, or replayed to re-examine a particular time period.

Visualize the health of your mesh

Health Configuration

Kiali calculates health by combining the individual health of several indicators, such as pods and request traffic. The global health of a resource reflects the most severe health of its indicators.

Health Indicators

The table below lists the current health indicators and whether the indicator supports custom configuration for its health calculation.

Indicator Supports Configuration
Pod Status No
Traffic Health Yes

Icons and colors

Kiali use icons and colors to indicate the health of resources and associated request traffic.

  • No Health Information (NA)
  • Healthy
  • Degraded
  • Failure

Default Values

Request Traffic

By default Kiali uses the traffic rate configuration shown below. Application errors have minimal tolerance while client errors have a higher tolerance reflecting that some level of client errors is often normal (e.g. 404 Not Found):

  • For http protocol 4xx are client errors and 5xx codes are application errors.
  • For grpc protocol all 1-16 are errors (0 is success).

So, for example, if the rate of application errors is >= 0.1% kiali will show Degraded health and if > 10% will show Failure health.

# ...
  health_config:
    rate:
      - namespace: ".*"
        kind: ".*"
        name: ".*"
        tolerance:
          - code: "^5\\d\\d$"
            direction: ".*"
            protocol: "http"
            degraded: 0
            failure: 10
          - code: "^4\\d\\d$"
            direction: ".*"
            protocol: "http"
            degraded: 10
            failure: 20
          - code: "^[1-9]$|^1[0-6]$"
            direction: ".*"
            protocol: "grpc"
            degraded: 0
            failure: 10
# ...

Configuration

Custom health configuration is specified in the Kiali CR. To see the supported configuration syntax for health_config visit Kiali CR.

Kiali applies the first matching rate configuration (namespace, kind, etc) and calculates the status for each tolerance. The reported health will be the status with highest priority (see below).

Rate OptionDefinitionDefault
namespaceMatching Namespaces (regex).* (match all)
kindMatching Resource Types (workload|app|service) (regex).* (match all)
nameMatching Resource Names (regex).* (match all)
ToleranceArray of tolerances to apply.
Tolerance Option Definition Default
code Matching Response Status Codes (regex) [1] required
direction Matching Request Directions (inbound|outbound) (regex) .* (match all)
protocol Matching Request Protocols (http|grpc) (regex) .* (match all)
degraded Degraded Threshold(% matching requests >= value) 0
failure Failure Threshold (% matching requests >= value) 0

[1] The status code typically depends on the request protocol. The special code -, a single dash, is used for requests that don’t receive a response, and therefore no response code.

Kiali reports traffic health with the following top-down status priority :

Priority Rule (value=% matching requests) Status
1 value >= FAILURE threshold FAILURE
2 value >= DEGRADED threshold AND value < FAILURE threshold DEGRADED
3 value > 0 AND value < DEGRADED threshold HEALTHY
4 value = 0 HEALTHY
5 No traffic No Health Information

Examples

These examples use the repo _https://github.com/kiali/demos/tree/master/error-rates_.

In this repo we can see 2 namespaces: alpha and beta (Demo design).

Alpha

Where nodes return the responses (You can configure responses here):

App (alpha/beta) Code Rate
x-server 200 9
x-server 404 1
y-server 200 9
y-server 500 1
z-server 200 8
z-server 201 1
z-server 201 1

The applied traffic rate configuration is:
# ...
health_config:
  rate:
   - namespace: "alpha"
     tolerance:
       - code: "404"
         failure: 10
         protocol: "http"
       - code: "[45]\\d[^\\D4]"
         protocol: "http"
   - namespace: "beta"
     tolerance:
       - code: "[4]\\d\\d"
         degraded: 30
         failure: 40
         protocol: "http"
       - code: "[5]\\d\\d"
         protocol: "http"
# ...

After Kiali adds default configuration we have the following (Debug Info Kiali):

{
  "healthConfig": {
    "rate": [
      {
        "namespace": "/alpha/",
        "kind": "/.*/",
        "name": "/.*/",
        "tolerance": [
          {
            "code": "/404/",
            "degraded": 0,
            "failure": 10,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/[45]\\d[^\\D4]/",
            "degraded": 0,
            "failure": 0,
            "protocol": "/http/",
            "direction": "/.*/"
          }
        ]
      },
      {
        "namespace": "/beta/",
        "kind": "/.*/",
        "name": "/.*/",
        "tolerance": [
          {
            "code": "/[4]\\d\\d/",
            "degraded": 30,
            "failure": 40,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/[5]\\d\\d/",
            "degraded": 0,
            "failure": 0,
            "protocol": "/http/",
            "direction": "/.*/"
          }
        ]
      },
      {
        "namespace": "/.*/",
        "kind": "/.*/",
        "name": "/.*/",
        "tolerance": [
          {
            "code": "/^5\\d\\d$/",
            "degraded": 0,
            "failure": 10,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/^4\\d\\d$/",
            "degraded": 10,
            "failure": 20,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/^[1-9]$|^1[0-6]$/",
            "degraded": 0,
            "failure": 10,
            "protocol": "/grpc/",
            "direction": "/.*/"
          }
        ]
      }
    ]
  }
}

What are we applying?

  • For namespace alpha, all resources

  • Protocol http if % requests with error code 404 are >= 10 then FAILURE, if they are > 0 then DEGRADED

  • Protocol http if % requests with others error codes are> 0 then FAILURE.

  • For namespace beta, all resources

  • Protocol http if % requests with error code 4xx are >= 40 then FAILURE, if they are >= 30 then DEGRADED

  • Protocol http if % requests with error code 5xx are > 0 then FAILURE

  • For other namespaces kiali apply defaults.

  • Protocol http if % requests with error code 5xx are >= 20 then FAILURE, if they are >= 0.1 then DEGRADED

  • Protocol grpc if % requests with error code match /^[1-9]$|^1[0-6]$/ are >= 20 then FAILURE, if they are >= 0.1 then DEGRADED

Alpha Beta
Last modified October 13, 2021 : Cleanup (#427) (5c4806e9)