Using the Dashboard: Complete Guide
The Dashboard is your central command center, providing real-time system health, active incidents, agent status, infrastructure overview, and live activity streams in one view.
Dashboard Overview

- Title: “Dashboard” with description “System overview and real-time monitoring”
- Updated timestamp: Shows when data was last refreshed (e.g., “Updated 16:24:46”)
- Refresh all data button: Click to manually refresh all dashboard metrics
The 4 Key Metrics at a Glance

1. System Health: 100%
What It Means
- Pod health status
- Resource utilization
- Incident history
- Agent connectivity
How It's Calculated
- 100% = All pods healthy, no active incidents
- 95-99% = Minor issues detected
- 85-94% = Multiple incidents
- Below 85% = Critical issues
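The bands above can be expressed as a small lookup. This is a minimal sketch, not RubixKube's actual scoring code: the function name `health_band` and the returned labels are hypothetical, and treating fractional scores as belonging to the band below the next threshold is an assumption.

```python
def health_band(score: float) -> str:
    """Map a System Health percentage to the band listed in this guide."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score == 100:
        return "healthy"             # all pods healthy, no active incidents
    if score >= 95:
        return "minor issues"        # 95-99%
    if score >= 85:
        return "multiple incidents"  # 85-94%
    return "critical"                # below 85%


for value in (100, 97, 92, 70):
    print(value, health_band(value))
```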
2. Active Insights: 3
What it shows: Number of issues currently requiring your attention
Visual indicators:
- Large number “3” shows total insights
- Badge “4” indicates newer/updated insights
- Orange icon for warnings
- Includes: ImagePullBackOff, CrashLoopBackOff, OOMKilled
- medium severity - Pod crashes, image pull errors
- low severity - Configuration warnings
3. Intelligent Analysis: 3 RCA Reports
What it shows: Number of Root Cause Analysis reports generated by the RCA Pipeline Agent
Visual indicators:
- Number “3” shows completed RCA reports
- Shield icon indicates intelligent analysis
- Orange left border for attention
- Shows analysis is done, not just detection
- It ANALYZED them and found root causes
- Complete reports available with evidence
4. Agents: 3/4 Active
What it shows: How many AI agents are running / total agents
From our dashboard: “3/4 Active”
- 3 agents operational
- 1 agent offline or degraded
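If you copy the Agents value into a script or report, the fraction is easy to split into active and total counts. The sketch below assumes the text has the form shown on the card (“3/4 Active”); the helper name `parse_agent_status` is hypothetical and not part of any RubixKube API.

```python
import re


def parse_agent_status(text: str) -> tuple[int, int]:
    """Return (active, total) from a string such as '3/4 Active'."""
    match = re.fullmatch(
        r"\s*(\d+)\s*/\s*(\d+)(?:\s+active)?\s*", text, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognized agent status: {text!r}")
    active, total = int(match.group(1)), int(match.group(2))
    if active > total:
        raise ValueError(f"active count exceeds total in {text!r}")
    return active, total


active, total = parse_agent_status("3/4 Active")
print(f"{total - active} agent(s) offline or degraded")
```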
Infrastructure Panel

- Show details button - Expand to see additional metrics
- View page link - Go to full Infrastructure topology
Infrastructure Metrics Overview

Summary metrics:
| Metric | Our Cluster | What It Means |
|---|---|---|
| Nodes | 1 | Physical/virtual machines |
| Pods | 14 | Running containers (including our 3 failing + healthy pods) |
| Deployments | 4 | Managed application sets |
| Services | 3 | Network endpoints |
- Cube icon for Pods
- Grid icon for Deployments
- Globe icon for Services
Expanded Infrastructure View

Click “Show details” to reveal:
| Additional Metric | Value | What It Means |
|---|---|---|
| NAMESPACES | 7 | Kubernetes namespaces in cluster |
| AGENTS | 1 | RubixKube agent pods deployed |
- Two additional metrics appear at the bottom
- All resource counts remain visible
Activity Feed: Real-Time Event Stream

From our real dashboard - 7 events showing:
Event Types You’ll See
- INSIGHT Events
- RCA Events
- HIGH Severity
Example from our dashboard:
“Pod broken-image-demo has been pending for an extended period”
- Type: INSIGHT (blue badge)
- Severity: medium
- Status: active (white badge)
- Time: 10/4/2025 01:59 AM
“Container experiencing repeated crashes in crash-loop-demo (restart count: 3)”
- Type: INSIGHT
- Severity: medium
- Status: active
Understanding Event Details
Each event card shows:
Top row:
- Icon - Indicates event type (warning triangle, magnifying glass, etc.)
- Title - Clear description of what happened
- Badge - INSIGHT or RCA tag
- Status badge - “active” or “resolved”
- Purple left border = RCA events
- Red “high” text = Critical severity
RCA Live Stream

- Instruction: “Connect to SSE to see live events”
- Analysis percentage (e.g., “60% complete”)
- Evidence being gathered
- Pattern matching in progress
- Real-time streaming of RCA analysis steps
- Offline - Not connected to event stream
- Shows event count when active analysis is running
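The panel relies on Server-Sent Events (SSE), a standard text format in which each event is a group of `field: value` lines ended by a blank line. The sketch below is a simplified parser for that wire format, included only to show what “connecting to SSE” delivers. The event name `rca_progress` and the JSON payload are made-up illustrations; this guide does not document the actual RubixKube stream contents.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip events without data.
            if data:
                yield {"event": event_type, "data": "\n".join(data)}
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.


sample = [
    ": keep-alive",
    "event: rca_progress",       # hypothetical event name
    'data: {"percent": 60}',     # hypothetical payload
    "",
    "data: evidence gathered",
    "",
]
for event in parse_sse(sample):
    print(event)
```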
Dashboard Workflow: Your Daily Routine
Morning Check (60 seconds)
Open Dashboard
Check System Health
Review Active Insights
Scan Activity Feed
- HIGH severity items (red text)
- New incidents since yesterday
- RCA reports completed (purple RCA badge)
Verify All Agents Active
Review Infrastructure
- Are pod counts normal?
- Any unexpected changes in deployments?
- Click “Show details” for namespace overview
Real Examples from Our Dashboard
Example 1: OOMKilled Detected (HIGH Severity)
What appeared in Activity Feed:
“Out of memory (OOMKilled) detected on a pod in Pod/memory-hog-demo”
- Type: INSIGHT
- Severity: high (red text)
- Status: active
- Time: 01:58 AM
What to do:
1. Note it’s HIGH severity (needs immediate attention)
2. Look for the corresponding RCA report (should appear shortly after)
3. Click on the event for full details
4. This opens the Insights page with the complete analysis
5. Review root cause and recommended fixes
6. Apply memory limits or increase resources
Expected RCA: You should see a follow-up event like:
“Postmortem: OOMKilled Event on Pod ‘memory-hog-demo’”
Example 2: CrashLoopBackOff with RCA
What appeared:
1. First, an INSIGHT event: “Container experiencing repeated crashes in crash-loop-demo (restart count: 3)”
- Type: INSIGHT
- Severity: medium
- Status: active
2. Then, an RCA event: “Postmortem: Pod ‘crash-loop-demo’ in CrashLoopBackOff State”
- Type: RCA
- Status: RCA Report Analysis Complete
What this pattern means:
- RubixKube detected the crash loop (INSIGHT)
- RCA Pipeline automatically analyzed it (RCA)
- Complete report now available with root cause
Example 3: ImagePullBackOff
What appeared:
“Pod broken-image-demo has been pending for an extended period”
- Type: INSIGHT
- Severity: medium
- Status: active
What this means:
- Pod stuck in Pending state
- Likely ImagePullBackOff error
- Not critical but needs fixing
What to do:
- Verify registry credentials
- Confirm image exists in registry
Understanding the Metrics
System Health Calculation
How RubixKube calculates the 100%:
Pod Health Check
Incident Weight
Agent Connectivity
Resource Health
- Node status (all nodes ready)
- Critical system pods (kube-system namespace)
- RubixKube components
Final Score
- Core system pods are all healthy
- System infrastructure is operational
- Agents are mostly active (3/4)
Infrastructure Panel Deep Dive
What each metric means:
Nodes: 1
Our cluster: Single-node KIND cluster
In production: Would show multiple nodes (e.g., “5 nodes”)
What to watch:
- Node count changes (scale up/down)
- All nodes should be Ready status
- Node resource pressure (memory, CPU, disk)
- Node details
- Resource usage per node
- Taints and tolerations
- Node conditions
Pods: 14
Our cluster breakdown:
- 8 system pods (kube-system namespace)
- 3 RubixKube pods (rubixkube-system)
- 3 failing tutorial pods (tutorial namespace)
- Yellow = Some issues (warning)
- Red = Critical failures
- Unusual increases (unexpected deployments)
- Pods stuck in Pending
Deployments: 4
Our cluster:
- coredns (DNS service)
- local-path-provisioner (storage)
- kubernetes-mcp-server (API)
- rubixkube-observer (monitoring agent)
- Rollout status
- Pod template health
- Deployment health
- Replica counts (desired/current/ready)
- Rollout history
- Pod template specs
Services: 3
Our cluster:
- kubernetes (API server)
- kube-dns (DNS)
- rubixkube-observer (monitoring)
- NodePort - External access via node IP
- LoadBalancer - Cloud load balancer
- Port configurations
- Service discovery issues
Namespaces: 7 (Expanded View)
Click “Show details” to see this metric
Our cluster namespaces:
1. default
2. kube-system (system pods)
3. kube-public
4. kube-node-lease
5. rubixkube-system (RubixKube components)
6. tutorial (demo failing pods)
7. local-path-storage
Why this matters:
- Understand cluster organization
- Identify namespace-specific issues
- Track namespace growth
Agents: 1 (Expanded View)
Click “Show details” to see this metric
Shows: Number of RubixKube agent pods deployed
Our cluster: 1 agent pod (rubixkube-observer)
Cross-reference: Compare with the “Agents: 3/4” metric
- **3/4 Active** = 3 of 4 agent types are running
- **1 Agent pod** = 1 physical pod deployed (may run multiple services)
Activity Feed Explained
From our dashboard - 7 events in chronological order (newest first):
1. Pod broken-image-demo pending (INSIGHT, medium, 01:59 AM)
2. crash-loop-demo crashes (INSIGHT, medium, 01:58 AM)
3. crash-loop-demo RCA complete (RCA, 01:58 AM)
4. memory-hog-demo crashes (INSIGHT, medium, 01:58 AM)
5. memory-hog-demo RCA complete (RCA, 01:58 AM)
6. OOMKilled on memory-hog-demo (INSIGHT, HIGH, 01:58 AM) ⭐ Most critical
7. OOMKilled RCA complete (RCA, 01:58 AM)
Notice the pattern:
- INSIGHT first (detection by Observer)
- RCA follows within seconds (analysis by RCA Pipeline)
- Chronological order helps track incident progression
- Persistent scrollback - Historical events remain
- Badge indicators - Easy visual scanning
- Click to expand - View full event details
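The INSIGHT-then-RCA pattern can also be checked mechanically: any workload with an INSIGHT but no RCA is still waiting for analysis. The sketch below models the seven feed entries from our dashboard with a hypothetical `FeedEvent` record. Matching events by workload name is an assumption made for illustration; the console may link an insight to its RCA report in a different way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedEvent:
    kind: str    # "INSIGHT" or "RCA"
    target: str  # workload the event refers to


def awaiting_rca(events: list[FeedEvent]) -> list[str]:
    """Return workloads that have an INSIGHT but no RCA event, in feed order."""
    analyzed = {e.target for e in events if e.kind == "RCA"}
    pending: list[str] = []
    for e in events:
        if e.kind == "INSIGHT" and e.target not in analyzed and e.target not in pending:
            pending.append(e.target)
    return pending


feed = [
    FeedEvent("INSIGHT", "broken-image-demo"),
    FeedEvent("INSIGHT", "crash-loop-demo"),
    FeedEvent("RCA", "crash-loop-demo"),
    FeedEvent("INSIGHT", "memory-hog-demo"),
    FeedEvent("RCA", "memory-hog-demo"),
    FeedEvent("INSIGHT", "memory-hog-demo"),  # the HIGH severity OOMKilled insight
    FeedEvent("RCA", "memory-hog-demo"),
]
print(awaiting_rca(feed))  # ['broken-image-demo']
```

In this feed, only broken-image-demo has no RCA entry yet, which matches the list above.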
Dashboard Actions You Can Take
1. Refresh All Data
Button: Top right “Refresh all data” (circular arrow icon)
What it does:
- Re-queries all metrics
- Updates infrastructure counts
- Refreshes activity feed
- Updates agent status
- Reloads RCA stream connection
- After making cluster changes
- When troubleshooting stale data
2. Refresh Infrastructure Only
Button: Infrastructure panel “Refresh data” button
What it does:
- Updates only Infrastructure panel metrics
- Faster than full dashboard refresh
- Doesn’t reload Activity Feed or metrics
- Scaling deployments
- Adding/removing nodes
3. Expand Infrastructure Details
Button: “Show details” / “Hide details”
What it reveals:
- Namespaces count
- Agents count
- Keeps existing metrics visible
- Investigating namespace issues
- Verifying agent deployment
4. Navigate to Details
From metrics cards:
- Click “Active Insights” card → Go to Insights page
- Click “Agents” card → Go to Agents page
- Number indicators are clickable
- Click individual metrics → Filtered views (when on Infrastructure page)
- Click INSIGHT badge → View insight details
- Click RCA badge → View RCA report
Common Dashboard Scenarios
Scenario 1: System Health Drops to 92%
What you see:
- Health metric turns yellow/orange
- Active Insights count increased
- New events in Activity Feed
Identify New Insights
Check Severity
Prioritize
Investigate
Take Action
Scenario 2: Active Insights Jumps from 0 to 3
What you see:
- Insights card shows “3” with badge “4”
- Multiple new events in Activity Feed
- Timestamps are recent (last few minutes)
Don't Panic
Scan Feed
Check for Patterns
Wait for RCA
Review Analysis
Scenario 3: Agent Goes Offline (shows 2/3)
What you see:
- Agents metric shows “2/3” or “3/4”
- May see “Agent health degraded” in Activity Feed
- System Health may decrease slightly
Navigate to Agents
Identify Offline Agent
Check Agent Pod
Review Logs
Restart Agent
kubectl rollout restart deployment/[agent] -n rubixkube-system
Verify Recovery
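The steps in this scenario map onto a short sequence of kubectl commands. The sketch below only builds and prints the commands; it does not run them. Only the `rollout restart` command comes from this guide. The `get pods`, `logs`, and `rollout status` commands are standard kubectl subcommands chosen here to illustrate the check, review, and verify steps, and using `rubixkube-observer` as the deployment name is an assumption based on the deployments listed earlier.

```python
import shlex

NAMESPACE = "rubixkube-system"


def agent_recovery_commands(deployment: str, namespace: str = NAMESPACE) -> list[list[str]]:
    """Return the kubectl commands for check, logs, restart, and verify."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "get", "pods", "-n", namespace],                 # Check Agent Pod
        ["kubectl", "logs", target, "-n", namespace, "--tail=100"],  # Review Logs
        ["kubectl", "rollout", "restart", target, "-n", namespace],  # Restart Agent
        ["kubectl", "rollout", "status", target, "-n", namespace],   # Verify Recovery
    ]


for argv in agent_recovery_commands("rubixkube-observer"):
    print(shlex.join(argv))
```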
Scenario 4: RCA Live Stream Shows Activity
What you see:
- RCA Live Stream panel shows “LIVE” with event count (e.g., “3 events”)
- Events stream in real-time
- May see function calls, analysis steps
Watch Analysis
Wait for Completion
Check Activity Feed
Review Findings
Scenario 5: Infrastructure Metrics Change Unexpectedly
What you see:
- Pod count increased from 14 to 18
- No corresponding Activity Feed events
- System Health still 100%
Click View Page
Check New Pods
Identify Namespace
Review Deployments
Verify Expected
Best Practices for Dashboard Usage
1. Start Every Day Here
Open Dashboard
60-Second Scan
Note Changes
Drill Down
2. Keep Dashboard Open
- Activity Feed refreshes automatically
- Spot new incidents immediately as they occur
- No manual refresh needed
3. Learn Your Baseline
Track normal patterns over time:
System Health:
- Normal range: 95-100%
- Typical: 100% for healthy clusters
- Alert threshold: Below 95%
Active Insights:
- Alert threshold: 3+ or sudden increase
- Critical: 5+ insights
Infrastructure:
- Your normal namespace count: ____
Agents:
- Expected agent status: 3/3 or 4/4
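Once you have written down a baseline, comparing it with the current Infrastructure counts is a simple difference. The sketch below is one way to do that by hand or in a script. The dictionary keys and the function name `drift` are hypothetical, and the numbers are the counts from our cluster plus the 14 to 18 pod change used in Scenario 5.

```python
def drift(baseline: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Return the change per resource type, omitting resources that did not change."""
    keys = baseline.keys() | current.keys()
    changes = {k: current.get(k, 0) - baseline.get(k, 0) for k in sorted(keys)}
    return {k: delta for k, delta in changes.items() if delta != 0}


baseline = {"nodes": 1, "pods": 14, "deployments": 4, "services": 3, "namespaces": 7}
current = {"nodes": 1, "pods": 18, "deployments": 4, "services": 3, "namespaces": 7}
print(drift(baseline, current))  # {'pods': 4}
```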
4. Use Activity Feed as Audit Log
The feed is chronological and persistent:
Use cases:
- See when issues started (timestamp tracking)
- Track incident resolution timeline
- Verify deployments and changes
- Post-mortem analysis
- Compliance and reporting
- Clickable for full details
- Searchable (future feature)
- Exportable for reports
5. Understand the Insight → RCA Pattern
Learn to recognize the two-step pattern:
Step 1: INSIGHT appears
- Blue badge, INSIGHT tag
- RubixKube Observer detected an issue
- Severity assigned (high/medium/low)
- Status: active
Step 2: RCA appears
- RCA Pipeline analyzed the issue
- Root cause identified
- Status: RCA Report Analysis Complete
- Wait for the RCA before taking action
- RCA provides context and recommendations
- Pattern shows system is working correctly
6. Triage by Severity
Priority order for responding to insights:
**1. HIGH severity (red text)**
- OOMKilled
- Critical pod failures
- Security issues
- Data loss risk
- Action: Immediate response required
**2. MEDIUM severity**
- ImagePullBackOff
- Configuration errors
- Action: Investigate within hours
**3. LOW severity**
- Deprecations
- Minor issues
- Action: Plan fix in next sprint
- Easy visual scanning
- Click to investigate details
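The priority order above amounts to sorting insights by severity while keeping feed order within each level. A minimal sketch, assuming each insight is a dictionary with `title` and `severity` keys (a made-up shape for illustration, not the console's data model):

```python
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def triage(insights: list[dict]) -> list[dict]:
    """Sort insights high -> medium -> low; unknown severities go last.

    sorted() is stable, so insights with equal severity keep their feed order.
    """
    return sorted(
        insights,
        key=lambda item: SEVERITY_ORDER.get(item["severity"].lower(), len(SEVERITY_ORDER)),
    )


insights = [
    {"title": "Pod broken-image-demo pending", "severity": "medium"},
    {"title": "crash-loop-demo crashes", "severity": "medium"},
    {"title": "OOMKilled on memory-hog-demo", "severity": "HIGH"},
]
for item in triage(insights):
    print(item["severity"].lower(), "-", item["title"])
```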
What You Learned
4 Key Metrics
- System Health (100% = healthy)
- Active Insights (3 issues detected)
- Intelligent Analysis (3 RCA reports)
- Agents (3/4 active)
Infrastructure Summary
- Nodes, Pods, Deployments, Services
- Expand for Namespaces and Agents
- Quick cluster overview at a glance
- Refresh button for latest data
Activity Feed
- Real-time event stream
- INSIGHT and RCA events
- Severity indicators
- Chronological history
- Clickable for details
RCA Live Stream
- Watch analysis happen live
- Shows RCA Pipeline in action
- LIVE/Offline status
- Event count when active
Daily Workflow
- 60-second morning check routine
- Scan metrics → Review feed → Verify agents
- Proactive issue detection
- Early warning system
Common Scenarios
- Health drops → Check insights
- Insights increase → Review feed
- Agents offline → Investigate
- Pattern recognition skills
Quick Reference
Dashboard at a glance:
| Panel | What You’ll See | Normal Range | Action When Abnormal |
|---|---|---|---|
| System Health | Percentage | 95-100% | Below 95% → Check Active Insights |
| Active Insights | Count + badge | 0-2 | 3+ → Review Activity Feed by severity |
| RCA Analysis | Report count | Varies | New report → Read findings in Insights |
| Agents | Fraction | 3/3 or 4/4 | Not all active → Go to Agents page |
| Infrastructure | Resource counts | Your baseline | Unexpected changes → Verify cluster |
| Activity Feed | Event stream | Varies | New HIGH severity → Investigate immediately |
| RCA Stream | Live status | LIVE 0 events | IN_PROGRESS → Watch completion |
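The first rows of this table can be read as simple threshold checks. The sketch below encodes three of them (System Health, Active Insights, Agents) using the thresholds from the table. The function name and parameters are hypothetical; this illustrates the decision rules only and is not a RubixKube API.

```python
def recommended_actions(
    health: float, active_insights: int, agents_active: int, agents_total: int
) -> list[str]:
    """Return the Quick Reference actions that apply to a dashboard snapshot."""
    actions = []
    if health < 95:
        actions.append("Check Active Insights")
    if active_insights >= 3:
        actions.append("Review Activity Feed by severity")
    if agents_active < agents_total:
        actions.append("Go to Agents page")
    return actions


# Snapshot from our dashboard: 100% health, 3 insights, 3/4 agents active.
print(recommended_actions(100, 3, 3, 4))
```

For our snapshot, the health check passes, but the insight count and the offline agent both call for follow-up.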
Real Data from Our Dashboard
What we captured:
- System Health: **100%** (all core systems healthy)
- Active Insights: **3** (broken-image-demo, crash-loop-demo, memory-hog-demo)
- Intelligent Analysis: **3 RCA reports** (all analyzed with root causes)
- Agents: **3/4 active** (1 offline, 3 operational)
- Infrastructure: **1 node, 14 pods, 4 deployments, 3 services**
- Expanded: **7 namespaces, 1 agent**
- Activity Feed: **7 events** (4 INSIGHTS, 3 RCA reports)
- RCA Stream: LIVE with 0 active events
- System distinguishes between critical vs non-critical failures
- RCA Pipeline automatically analyzes incidents
- Real-time monitoring with historical context
Interactive Elements Summary
Clickable actions:
| Element | Action | Result |
|---|---|---|
| Refresh all data | Click button | Updates all dashboard metrics |
| Active Insights card | Click card | Navigate to Insights page |
| Agents card | Click card | Navigate to Agents page |
| Infrastructure “Refresh data” | Click button | Updates only infrastructure metrics |
| Infrastructure “Show details” | Click button | Expands to show namespaces and agents |
| Infrastructure “View page” | Click link | Navigate to full Infrastructure topology |
| Activity Feed event | Click event card | View full event details and insights |
| INSIGHT badge | Click badge | View insight details |
| RCA badge | Click badge | View RCA report with findings |
Keyboard Shortcuts
While on Dashboard:
- R - Refresh all data
- I - Navigate to Insights
- A - Navigate to Agents
- Esc - Close any open modals
Troubleshooting Dashboard Issues
Dashboard not loading
Symptoms: Spinner stays visible, “Initializing dashboard…” message persists
Causes:
- Backend API connection issue
- SSE connection failed
- Browser cache problem
1. Hard refresh the browser: Cmd+Shift+R (Mac) or Ctrl+Shift+R (Windows)
2. Check browser console for errors
3. Verify cluster connection in Settings
4. Check RubixKube backend pods are running
Activity Feed shows “No recent activity” but issues exist
Symptoms: Feed is empty despite active insights
Causes:
- SSE connection not established
- Events not being generated
- Time range filter active
Metrics showing old data
Symptoms: Timestamp not updating, stale numbers
Causes:
- Auto-refresh disabled
- API timeout
- Backend issue
Infrastructure shows “Disconnected”
Symptoms: Red status indicator, no metrics
Causes:
- Cluster connection lost
- Kubeconfig invalid
- Network issue
Performance Tips
For large clusters:
- Dashboard optimized for up to 1000 pods
- Activity Feed shows most recent 50 events
- Older events paginated
- Infrastructure panel uses aggregated counts
- SSE uses efficient streaming
- Images lazy-loaded
- Metrics cached briefly
- Dashboard refreshes automatically
- Each cluster has independent metrics
- Activity Feed filtered by selected cluster
Next Steps
Investigate Insights
Check Agent Health
View Infrastructure
Use Chat to Investigate
You’re now a Dashboard expert! Use it daily to stay ahead of incidents.
Related Documentation
Understanding Agents
RCA Deep Dive
Memory Engine
Guardrails
Need Help?
Contact Support
Please include your Tenant ID (Settings → Organization), timestamp, and screenshots.
Troubleshooting Guide
FAQ
Docs Navigation
Feedback
Found an issue with this guide or have suggestions? We’d love to hear from you!
- Email: [email protected]
- Subject: “Dashboard Guide Feedback”
Last updated: October 6, 2025
Guide version: 2.0
Based on RubixKube Console v1.0