This tutorial walks through the monitoring workflow: connect an environment, read the dashboard, spot anomalies early, and set the handful of controls that keep signal-to-noise high. It takes about fifteen minutes from a fresh workspace.

Prerequisites

A RubixKube account

Free tier works for this tutorial.

At least one environment connected

Kubernetes, AWS, GCP, or a Linux VM. Any mix of them is fine.

Step 1: Read the Dashboard

The Dashboard is your daily check-in. Four tiles tell you whether the system is worth investigating right now.
| Tile | What it means | When to pay attention |
| --- | --- | --- |
| System Health | Blended score across every connected environment | Below 95% or trending down |
| Active Insights | Count of anomalies RubixKube is watching | Any new insight since your last visit |
| Intelligent Analysis | RCA reports ready to read | New report since last session |
| Agents | Health of the observer and cloud-side agents | Anything other than all green |
Open the Dashboard once a day even when nothing is on fire. Most anomalies show up as a subtle drift, not an outage.
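The "below 95% or trending down" rule for System Health can be sketched in a few lines. This is an illustration only: it assumes you have noted down a series of System Health readings yourself, and the function name, the three-reading window, and the one-point drop threshold are all hypothetical choices, not RubixKube settings.

```python
from statistics import mean


def needs_attention(scores, floor=95.0, window=3, drop=1.0):
    """Flag a series of System Health readings (oldest first, in percent).

    Returns True if the latest reading is below `floor`, or if the average
    of the last `window` readings is more than `drop` points lower than the
    average of the `window` readings before them.
    All parameters are illustrative assumptions, not product defaults.
    """
    if not scores:
        raise ValueError("need at least one reading")
    if scores[-1] < floor:
        return True
    if len(scores) < 2 * window:
        return False  # not enough history to judge a trend
    recent = mean(scores[-window:])
    earlier = mean(scores[-2 * window:-window])
    return earlier - recent > drop


# Still above 95%, but drifting down: worth a look.
print(needs_attention([99.5, 99.4, 99.6, 97.9, 97.5, 97.2]))
```

Comparing two short windows rather than two single readings keeps one noisy sample from flagging a drift that is not there.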

Step 2: Check your topology

Open Infrastructure Topology. You should see every resource the Observer has discovered, grouped by environment.
  • Green edges: dependencies behaving as expected.
  • Yellow edges: degraded signals (higher latency, elevated error rate, resource pressure).
  • Red edges: active incidents.
Clicking any node shows the last hour of signals for that resource plus any insights currently attached to it. This is a good starting point for deep dives.
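One way to decide which nodes to click first is to rank them by the worst edge colour they touch. The sketch below assumes you have written the edges down as (source, target, colour) tuples. That data shape, the service names, and the function name are hypothetical and are not an export format the product is known to provide.

```python
from collections import defaultdict

# Worst-first ranking of the edge colours described above.
SEVERITY = {"red": 2, "yellow": 1, "green": 0}


def nodes_to_inspect(edges):
    """Return nodes that touch a yellow or red edge.

    `edges` is an iterable of (source, target, colour) tuples.
    Nodes are ordered by their worst edge (red before yellow),
    then alphabetically.
    """
    worst = defaultdict(int)
    for source, target, colour in edges:
        rank = SEVERITY[colour]
        for node in (source, target):
            worst[node] = max(worst[node], rank)
    flagged = [(node, rank) for node, rank in worst.items() if rank > 0]
    flagged.sort(key=lambda item: (-item[1], item[0]))
    return [node for node, _ in flagged]


# Hypothetical topology for illustration.
edges = [
    ("checkout", "payments", "red"),
    ("payments", "postgres", "yellow"),
    ("frontend", "checkout", "green"),
]
print(nodes_to_inspect(edges))
```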

Step 3: Tune Insights to your team

Open Magic Insights. Each insight is an anomaly the system thinks a human should know about.
1. Filter to your services

Use the environment and namespace filters to narrow to what your team owns. Bookmark the view.

2. Set the severity threshold

Start at Medium. Too many low-severity cards train people to ignore the list.

3. Subscribe to the ones that matter

Each insight has a Follow action. Followed insights post to your notification channel when their status changes.
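The combined effect of the environment filter, the namespace filter, and a Medium threshold can be sketched as a plain list filter. Everything here is an assumption made for illustration: the `Insight` fields, the function name, and the four-level severity scale (this tutorial only mentions low, Medium, and critical).

```python
from dataclasses import dataclass

# Assumed scale, lowest to highest; "high" is a guess at the missing level.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class Insight:
    """Hypothetical stand-in for one card in the Insights list."""
    title: str
    environment: str
    namespace: str
    severity: str


def visible_insights(insights, environments, namespaces, min_severity="medium"):
    """Keep insights in the given environments and namespaces
    whose severity is at or above `min_severity`."""
    floor = SEVERITY_ORDER.index(min_severity)
    return [
        insight
        for insight in insights
        if insight.environment in environments
        and insight.namespace in namespaces
        and SEVERITY_ORDER.index(insight.severity) >= floor
    ]


# Hypothetical cards for illustration.
cards = [
    Insight("Pod restarts climbing", "prod", "payments", "low"),
    Insight("p99 latency doubled", "prod", "payments", "high"),
    Insight("Disk pressure", "staging", "payments", "critical"),
]
for card in visible_insights(cards, {"prod"}, {"payments"}):
    print(card.title)
```

With the threshold at Medium, the low-severity card is hidden and the staging card is excluded by the environment filter, so only one card remains for the team to act on.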

Step 4: Connect a notification channel

Health monitoring is only useful if the right person sees the signal. Connect a channel you already live in.

Slack

Channel-level routing for insights and RCAs.

Microsoft Teams

Team-channel delivery for insights and RCAs.

PagerDuty

Promote critical insights into on-call pages.

Linear

Turn RCAs into tickets with one click.

Step 5: Ask Chat a monitoring question

Chat is the fastest way to pull a specific view without learning a query language. A few prompts worth bookmarking:
What changed in the payments service in the last hour?
Which resources have the highest error rate today?
Any hosts above 80% memory right now?
Show me deployments that rolled back in the last 24 hours.
Answers come with cited evidence, so you can jump from the reply straight to the underlying events.

What healthy monitoring looks like after a week

System Health is stable

Sits between 95% and 100% most of the time. Dips correlate with known events.

Insights have owners

Your team acts on, dismisses, or routes every new insight. Few cards sit untouched for more than a day.

Topology reflects reality

Newly deployed services appear. Decommissioned resources drop out within the hour.

Chat answers with evidence

Your team uses Chat instead of grepping logs for quick questions.
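The "Insights have owners" check can be made concrete by listing the cards that nobody has touched for more than a day. This sketch assumes each card is recorded as a (title, created_at, status) tuple. The status values and the function name are hypothetical, chosen only to mirror the act, dismiss, or route outcomes described above.

```python
from datetime import datetime, timedelta, timezone


def stale_insights(insights, now, max_age=timedelta(days=1)):
    """Return titles of insights still in the assumed 'new' status
    that were created more than `max_age` before `now`.

    `insights` is an iterable of (title, created_at, status) tuples,
    where status is one of 'new', 'acted', 'dismissed', 'routed'.
    """
    return [
        title
        for title, created_at, status in insights
        if status == "new" and now - created_at > max_age
    ]


# Hypothetical cards, measured against a fixed reference time.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
cards = [
    ("Memory creep on worker pool", now - timedelta(hours=30), "new"),
    ("Error rate spike", now - timedelta(hours=2), "new"),
    ("Old rollout warning", now - timedelta(days=3), "dismissed"),
]
print(stale_insights(cards, now))
```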

Common questions

Most anomalies surface within one to two minutes of the underlying signal. The OPEL loop runs continuously rather than on a fixed cron schedule.
An Insight is an anomaly worth attention. An RCA Report is a full causal chain with evidence and recommended fixes. Not every insight becomes an RCA; only those that appear to have a single identifiable root cause do.
Multiple environments can be monitored together. Every connected environment feeds into the same Dashboard, Insights list, and knowledge graph. Use the environment filter to narrow to a single one when needed.
To get alerted without opening the Dashboard, connect Slack or Teams, filter Insights to the severity you care about, and subscribe. The Dashboard then becomes optional.

How to Automate Incident Remediation

The next step after monitoring: when an insight becomes an incident.

Talk to your infra

Go deeper with Chat for on-demand investigations.