Rules

container_cpu_usage_is_high

35.755s ago

6.698ms

Rule State Error Last Evaluation Evaluation Time
alert: POD_CPU_IS_HIGH expr: sum by(container, pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) * 100 > 90 for: 1m labels: severity: critical annotations: description: Container {{ $labels.container }} CPU usage inside POD {{ $labels.pod}} is high in {{ $labels.namespace}} summary: POD {{ $labels.pod}} CPU Usage is high in {{ $labels.namespace}} ok 35.757s ago 6.684ms

container_memory_usage_is_high

46.324s ago

22.03ms

Rule State Error Last Evaluation Evaluation Time
alert: POD_MEMORY_USAGE_IS_HIGH expr: (sum by(container, pod, namespace) (container_memory_working_set_bytes{container!=""}) / sum by(container, pod, namespace) (container_spec_memory_limit_bytes > 0) * 100) > 80 for: 1m labels: severity: critical annotations: description: |- Container Memory usage is above 80% VALUE = {{ $value }} LABELS = {{ $labels }} summary: Container {{ $labels.container }} Memory usage inside POD {{ $labels.pod}} is high in {{ $labels.namespace}} ok 46.325s ago 22.02ms

node_cpu_greater_than_80

21.015s ago

1.444ms

Rule State Error Last Evaluation Evaluation Time
alert: NODE_CPU_IS_HIGH expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90 for: 1m labels: severity: critical annotations: description: node {{ $labels.kubernetes_node }} cpu is high summary: node cpu is greater than 80 precent ok 21.015s ago 1.432ms

node_disk_space_too_low

16.137s ago

1.073ms

Rule State Error Last Evaluation Evaluation Time
alert: NODE_DISK_SPACE_IS_LOW expr: (100 * ((node_filesystem_avail_bytes{fstype!="rootfs",mountpoint="/"}) / (node_filesystem_size_bytes{fstype!="rootfs",mountpoint="/"}))) < 10 for: 1m labels: severity: critical annotations: description: node {{ $labels.node }} disk space is only {{ printf "%0.2f" $value }}% free. summary: node disk space remaining is less than 10 percent ok 16.137s ago 1.061ms

node_down

30.573s ago

514.7us

Rule State Error Last Evaluation Evaluation Time
alert: NODE_DOWN expr: up{component="node-exporter"} == 0 for: 3m labels: severity: warning annotations: description: '{{ $labels.job }} job failed to scrape instance {{ $labels.instance }} for more than 3 minutes. Node Seems to be down' summary: Node {{ $labels.kubernetes_node }} is down ok 30.573s ago 503.1us

node_memory_left_lessser_than_10

41.288s ago

1.067ms

Rule State Error Last Evaluation Evaluation Time
alert: NODE_MEMORY_LESS_THAN_10% expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10 for: 1m labels: severity: critical annotations: description: node {{ $labels.kubernetes_node }} memory left is low summary: node memory left is lesser than 10 precent ok 41.289s ago 1.055ms

Front50-cache

59.329s ago

363.5us

Rule State Error Last Evaluation Evaluation Time
alert: front50:storageServiceSupport:cacheAge__value expr: front50:storageServiceSupport:cacheAge__value > 300000 for: 2m labels: severity: warning annotations: description: front50 cacheAge for {{$labels.pod}} in namespace {{$labels.namespace}} has value = {{$value}} summary: front50 cacheAge too high ok 59.329s ago 350.9us

autopilot-component-jvm-errors

33.649s ago

2.758ms

Rule State Error Last Evaluation Evaluation Time
alert: jvm-memory-filling-up-for-oes-audit-client expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="auditclient"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="auditclient"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 80%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}) ok 33.649s ago 904.8us
alert: jvm-memory-filling-up-for-oes-autopilot expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="autopilot"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="autopilot"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 80%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}) ok 33.649s ago 557.4us
alert: jvm-memory-filling-up-for-oes-dashboard expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="dashboard"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="autopilot"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 80%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}) ok 33.648s ago 442.2us
alert: jvm-memory-filling-up-for-oes-platform expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="platform"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="platform"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 80%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}) ok 33.648s ago 356.9us
alert: jvm-memory-filling-up-for-oes-sapor expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="sapor"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="sapor"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 80%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}) ok 33.648s ago 208.3us
alert: jvm-memory-filling-up-for-oes-visibility expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="visibility"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="visibility"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 80%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}) ok 33.648s ago 264.5us

autopilot-component-latency-too-high

40.381s ago

3.169ms

Rule State Error Last Evaluation Evaluation Time
alert: oes-audit-client-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="auditclient"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="auditclient"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 40.381s ago 1.3ms
alert: oes-autopilot-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="autopilot"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="autopilot"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 40.38s ago 472.7us
alert: oes-dashboard-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="dashboard"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="dashboard"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 40.379s ago 223.1us
alert: oes-platform-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="platform"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="platform"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 40.379s ago 505.9us
alert: oes-sapor-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="sapor"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="sapor"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 40.379s ago 399us
alert: oes-visibility-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="visibility"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="visibility"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 40.378s ago 227us

autopilot-scrape-target-is-down

2s ago

1.421ms

Rule State Error Last Evaluation Evaluation Time
alert: oes-audit-client-scrape-target-is-down expr: up{component="auditclient"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-audit-client scrape target is down ok 2s ago 324.4us
alert: oes-autopilot-scrape-target-is-down expr: up{component="autopilot"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-autopilot scrape target is down ok 2s ago 103.4us
alert: oes-dashboard-scrape-target-is-down expr: up{component="dashboard"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-dashboard scrape target is down ok 1.999s ago 349.8us
alert: oes-platform-scrape-target-is-down expr: up{component="platform"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-platform scrape target is down ok 1.999s ago 418.3us
alert: oes-sapor-scrape-target-is-down expr: up{component="sapor"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-sapor scrape target is down ok 1.999s ago 108us
alert: oes-visibility-scrape-target-is-down expr: up{component="visibility"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-visibility scrape target is down ok 1.999s ago 71.82us

igor-needs-attention

3.135s ago

474.1us

Rule State Error Last Evaluation Evaluation Time
alert: igor-needs-attention expr: igor:pollingMonitor:itemsOverThreshold__value > 0 labels: severity: crtical annotations: description: Igor in namespace {{$labels.namespace}} needs human help summary: Igor needs attention ok 3.135s ago 460us

jvm-too-high

5.693s ago

2.012ms

Rule State Error Last Evaluation Evaluation Time
alert: clouddriver-rw-pod-may-be-evicted-soon expr: (sum by(instance, area) (clouddriver_rw:jvm:memory:used__value) / sum by(instance, area) (clouddriver_rw:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: Clouddriver-rw JVM memory too high ok 5.693s ago 543.9us
alert: clouddriver-ro-pod-may-be-evicted-soon expr: (sum by(instance, area) (clouddriver_ro:jvm:memory:used__value) / sum by(instance, area) (clouddriver_ro:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: Clouddriver-ro JVM memory too high ok 5.693s ago 221.6us
alert: clouddriver-caching-pod-may-be-evicted-soon expr: (sum by(instance, area) (clouddriver_caching:jvm:memory:used__value) / sum by(instance, area) (clouddriver_caching:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: Clouddriver-caching JVM memory too high ok 5.693s ago 113.8us
alert: gate-pod-may-be-evicted-soon expr: (sum by(instance, area) (gate:jvm:memory:used__value) / sum by(instance, area) (gate:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: gate JVM memory too high ok 5.693s ago 314us
alert: orca-pod-may-be-evicted-soon expr: (sum by(instance, area) (orca:jvm:gc:liveDataSize__value) / sum by(instance, area) (orca:jvm:gc:maxDataSize__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: orca JVM memory too high ok 5.693s ago 205.9us
alert: igor-pod-may-be-evicted-soon expr: (sum by(instance, area) (igor:jvm:memory:used__value) / sum by(instance, area) (igor:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: igor JVM memory too high ok 5.692s ago 197.1us
alert: echo-scheduler-pod-may-be-evicted-soon expr: (sum by(instance, area) (echo_scheduler:jvm:memory:used__value) / sum by(instance, area) (echo_scheduler:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: echo-scheduler JVM memory too high ok 5.692s ago 141.9us
alert: echo-worker-pod-may-be-evicted-soon expr: (sum by(instance, area) (echo_worker:jvm:memory:used__value) / sum by(instance, area) (echo_worker:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: echo-worker JVM memory too high ok 5.692s ago 116.9us
alert: front50-pod-may-be-evicted-soon expr: (sum by(instance, area) (front50:jvm:memory:used__value) / sum by(instance, area) (front50:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: Front50 JVM memory too high ok 5.692s ago 130us

kube-api-server-is-down

33.112s ago

460.6us

Rule State Error Last Evaluation Evaluation Time
alert: kube-api-server-down expr: up{job="kubernetes-apiservers"} == 0 for: 2m labels: severity: critical annotations: description: Kubernetes API Server service went down LABELS = {{ $labels }} summary: Kube API Server job {{ $labels.job }} is down ok 33.112s ago 448us

kubernetes-api-server-experiencing-high-error-rate

14.011s ago

17.61ms

Rule State Error Last Evaluation Evaluation Time
alert: kube-api-server-errors expr: sum(rate(apiserver_request_total{code=~"^(?:5..)$",job="kubernetes-apiservers"}[2m])) / sum(rate(apiserver_request_total{job="kubernetes-apiservers"}[2m])) * 100 > 3 for: 2m labels: severity: critical annotations: description: |- Kubernetes API server is experiencing high error rate VALUE = {{ $value }} LABELS = {{ $labels }} summary: Kubernetes API server errors (instance {{ $labels.instance }}) ok 14.011s ago 17.6ms

latency-too-high

2.791s ago

3.008ms

Rule State Error Last Evaluation Evaluation Time
alert: clouddriver-ro-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__total{service="spin-clouddriver-ro"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__count_total{service="spin-clouddriver-ro"}[5m])) > 1 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.791s ago 500us
alert: clouddriver-rw-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__total{service="spin-clouddriver-rw"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__count_total{service="spin-clouddriver-rw"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is ({{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.79s ago 158.5us
alert: clouddriver-caching-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__total{service="spin-clouddriver-caching"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__count_total{service="spin-clouddriver-caching"}[5m])) > 5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.79s ago 128.7us
alert: clouddriver_ro_deck-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__total{service="spin-clouddriver-ro-deck"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__total{service="spin-clouddriver-ro-deck"}[5m])) > 5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.79s ago 128.3us
alert: gate-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__total{service="spin-gate"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__count_total{service="spin-gate"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.791s ago 128us
alert: orca-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__total{service="spin-orca"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__count_total{service="spin-orca"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.791s ago 106.1us
alert: igor-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__total{service="spin-igor"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__count_total{service="spin-igor"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.791s ago 485.3us
alert: echo_scheduler-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__total{service="spin-echo-scheduler"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__count_total{service="spin-echo-scheduler"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.79s ago 421.2us
alert: echo_worker-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__total{service="spin-echo-worker"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__count_total{service="spin-echo-worker"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.79s ago 186.3us
alert: front50-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__total{service="spin-front50"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__count_total{service="spin-front50"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.79s ago 123.5us
alert: fiat-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__total{service="spin-fiat"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__count_total{service="spin-fiat"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.79s ago 330.1us
alert: rosco-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__total{service="spin-rosco"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__count_total{service="spin-rosco"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 2.79s ago 278.6us

orca-queue-issue

58.861s ago

856.1us

Rule State Error Last Evaluation Evaluation Time
alert: orca-queue-depth-high expr: (sum by(instance) (orca:queue:ready:depth__value{namespace!=""})) > 10 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: Orca queue depth is high ok 58.861s ago 535.1us
alert: orca-queue-lag-high expr: sum by(instance, service, namespace) (rate(orca:controller:invocations__total[2m])) / sum by(instance, service, namespace) (rate(orca:controller:invocations__count_total[2m])) > 0.5 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} has Lag value of {{$value}} summary: Orca queue lag is high ok 58.861s ago 305.4us

prometheus-job-down

40.103s ago

439.9us

Rule State Error Last Evaluation Evaluation Time
alert: prometheus-job-is-down expr: up{job="prometheus"} == 0 for: 5m labels: severity: warning annotations: description: Default Prometheus Job is Down LABELS = {{ $labels }} summary: The Default Prometheus Job is Down (job {{ $labels.job}}) ok 40.103s ago 420.8us

spinnaker-service-is-down

3.605s ago

1.977ms

Rule State Error Last Evaluation Evaluation Time
alert: clouddriver-rw-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-rw"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Clouddriver-rw Spinnaker service is down ok 3.605s ago 427.4us
alert: clouddriver-ro-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Clouddriver-ro Spinnaker service is down ok 3.604s ago 127us
alert: clouddriver-caching-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-caching"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Clouddriver-caching Spinnaker service is down ok 3.604s ago 102.3us
alert: clouddriver-ro-deck-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro-deck"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Clouddriver-ro-deck Spinnaker service is down ok 3.604s ago 84.25us
alert: gate-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-gate"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Gate Spinnaker services is down ok 3.604s ago 71.42us
alert: orca-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-orca"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Orca Spinnaker service is down ok 3.604s ago 267.2us
alert: igor-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-igor"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Igor Spinnaker service is down ok 3.604s ago 186.8us
alert: echo-scheduler-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-scheduler"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Echo-Scheduler Spinnaker service is down ok 3.604s ago 86.13us
alert: echo-worker-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-worker"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Echo-worker Spinnaker service is down ok 3.604s ago 80.86us
alert: front50-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-front50"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Front50 Spinnaker service is down ok 3.604s ago 56.83us
alert: fiat-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-fiat"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Fiat Spinnaker service is down ok 3.604s ago 212.1us
alert: rosco-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-rosco"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Rosco Spinnaker service is down ok 3.604s ago 241.3us

volume-is-almost-full (< 10% left)

10.642s ago

2.07ms

Rule State Error Last Evaluation Evaluation Time
alert: pvc-storage-full expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10 for: 2m labels: severity: warning annotations: description: |- Volume is almost full (< 10% left) VALUE = {{ $value }} LABELS = {{ $labels }} summary: Kubernetes Volume running out of disk space for (persistentvolumeclaim {{ $labels.persistentvolumeclaim }} in namespace {{$labels.namespace}}) ok 10.642s ago 2.058ms