How to monitor infrastructure health with RubixKube

This tutorial walks through the monitoring workflow: connect an environment, read the dashboard, spot anomalies early, and set the handful of controls that keep the signal-to-noise ratio high. It takes about fifteen minutes from a fresh workspace.

Prerequisites

A RubixKube account

Free tier works for this tutorial.

At least one environment connected

Kubernetes, AWS, GCP, or a Linux VM. Any mix of them is fine.

Step 1: Read the Dashboard

The Dashboard is your daily check-in. Four tiles tell you whether the system is worth investigating right now.

Tile	What it means	When to pay attention
System Health	Blended score across every connected environment	Below 95% or trending down
Active Insights	Count of anomalies RubixKube is watching	Any new insight since your last visit
Intelligent Analysis	RCA reports ready to read	New report since last session
Agents	Health of the observer and cloud-side agents	Anything other than all green

Open the Dashboard once a day even when nothing is on fire. Most anomalies show up as a subtle drift, not an outage.
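The "below 95% or trending down" rule for System Health can be sketched in a few lines. This is an illustration only, not RubixKube's scoring logic: the function name `needs_attention`, the list of daily scores, and the trend test (comparing the mean of the newer half of the window against the older half) are all assumptions made for the example.

```python
from statistics import mean


def needs_attention(scores, floor=95.0, min_drop=0.5):
    """Return True if the latest score is below `floor` or the window trends down.

    `scores` is a hypothetical list of System Health percentages, oldest first.
    The trend test is an assumption: the newer half's mean must be at least
    `min_drop` points lower than the older half's mean.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if scores[-1] < floor:
        return True
    if len(scores) < 4:
        return False  # too few points to call a trend
    half = len(scores) // 2
    older, newer = scores[:half], scores[half:]
    return mean(older) - mean(newer) >= min_drop


print(needs_attention([99.1, 99.0, 98.9, 99.2]))  # flat week
print(needs_attention([99.5, 99.0, 97.8, 96.4]))  # still above 95, but drifting
print(needs_attention([99.0, 98.5, 94.2]))        # latest value under the floor
```

The second call is the case the daily check-in exists for: every value is still above 95%, yet the drift is already visible.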

Step 2: Check your topology

Open Infrastructure Topology. You should see every resource the Observer has discovered, grouped by environment.

Green edges: dependencies behaving as expected.
Yellow edges: degraded signals (higher latency, elevated error rate, resource pressure).
Red edges: active incidents.

Clicking any node shows the last hour of signals for that resource, plus any insights currently attached to it. This is a good starting point for deep dives.

Step 3: Tune Insights to your team

Open Magic Insights. Each insight is an anomaly the system thinks a human should know about.

Filter to your services

Use the environment and namespace filters to narrow to what your team owns. Bookmark the view.

Set the severity threshold

Start at Medium. Too many low-severity cards train people to ignore the list.

Subscribe to the ones that matter

Each insight has a Follow action. Followed insights post to your notification channel when their status changes.
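The combined effect of the environment filter, the namespace filter, and the severity threshold can be pictured as a simple predicate over the insight list. The sketch below is a mental model, not RubixKube's data model or API: the `Insight` fields, the `visible_insights` helper, and the four-level severity scale (including "high") are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed severity scale; the guide only names low, Medium, and critical.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass
class Insight:
    """Hypothetical shape of an insight card."""
    title: str
    environment: str
    namespace: str
    severity: str


def visible_insights(insights, environment, namespaces, min_severity="medium"):
    """Keep insights in the given environment and namespaces at or above the threshold."""
    floor = SEVERITY_ORDER[min_severity]
    return [
        i for i in insights
        if i.environment == environment
        and i.namespace in namespaces
        and SEVERITY_ORDER[i.severity] >= floor
    ]


insights = [
    Insight("Elevated 5xx rate", "prod", "payments", "high"),
    Insight("Pod restart drift", "prod", "payments", "low"),
    Insight("Memory pressure", "prod", "search", "critical"),
    Insight("Slow queries", "staging", "payments", "medium"),
]

for insight in visible_insights(insights, "prod", {"payments"}):
    print(insight.title)
```

With the threshold at Medium, only the high-severity payments card in prod survives. The low-severity card is hidden, which is the point of starting at Medium.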

Step 4: Connect a notification channel

Health monitoring is only useful if the right person sees the signal. Connect a channel you already live in.

Slack

Channel-level routing for insights and RCAs.

Microsoft Teams

Team-channel delivery for insights and RCAs.

PagerDuty

Promote critical insights into on-call pages.

Linear

Turn RCAs into tickets with one click.
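One way to reason about which integration receives what is a small routing function. The sketch below is a hypothetical model of the rules described above (followed insights and RCAs go to a chat channel, critical insights are also promoted to an on-call page). The function name, the destination labels, and the severity strings are assumptions, not RubixKube configuration.

```python
def destinations(kind, severity="medium", followed=True):
    """Return hypothetical delivery targets for an event.

    kind: "insight" or "rca".
    followed: whether someone used the Follow action on the insight.
    """
    if kind not in ("insight", "rca"):
        raise ValueError(f"unknown event kind: {kind!r}")
    if kind == "insight" and not followed:
        return []  # unfollowed insights stay in the Insights list only
    targets = ["chat-channel"]  # Slack or Microsoft Teams
    if kind == "insight" and severity == "critical":
        targets.append("on-call-page")  # PagerDuty
    return targets


print(destinations("insight", "medium"))
print(destinations("insight", "critical"))
print(destinations("rca"))
```

Linear is left out on purpose: the guide describes ticket creation as a one-click action on an RCA, not an automatic route.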

Step 5: Ask Chat a monitoring question

Chat is the fastest way to pull a specific view without learning a query language. A few prompts worth bookmarking:

What changed in the payments service in the last hour?
Which resources have the highest error rate today?
Any hosts above 80% memory right now?
Show me deployments that rolled back in the last 24 hours.

Answers come with cited evidence, so you can jump from the reply straight to the underlying events.

What healthy monitoring looks like after a week

System Health is stable

The score sits between 95% and 100% most of the time. Dips correlate with known events.

Insights have owners

Your team acts on, dismisses, or routes every new insight. Few cards sit untouched for more than a day.

Topology reflects reality

Newly deployed services appear. Decommissioned resources drop out within the hour.

Chat answers with evidence

Your team uses Chat instead of grepping logs for quick questions.

Common questions

How fast does RubixKube detect a new issue?

Most anomalies surface within one to two minutes of the underlying signal. The OPEL loop runs continuously rather than on a fixed cron schedule.

What is the difference between an Insight and an RCA Report?

An Insight is an anomaly worth attention. An RCA Report is a full causal chain with evidence and recommended fixes. Not every Insight becomes an RCA Report; only those that appear to have a single identifiable root cause do.

Can I monitor multiple environments at once?

Yes. Every connected environment feeds into the same Dashboard, Insights list, and knowledge graph. Use the environment filter to narrow to a single one when needed.

What if I only want alerts, not a dashboard?

Connect Slack or Teams, filter Insights to the severity you care about, and subscribe. The dashboard becomes optional.

Related guides

How to Automate Incident Remediation

The next step after monitoring: when an insight becomes an incident.

Talk to your infra

Go deeper with Chat for on-demand investigations.
