Using the Dashboard: Complete Guide
The Dashboard is your central command center: it provides real-time system health, active incidents, agent status, an infrastructure overview, and live activity streams, all in one view.
Dashboard Overview

The page header contains:
- Title: “Dashboard” with description “System overview and real-time monitoring”
- Updated timestamp: Shows when data was last refreshed (e.g., “Updated 16:24:46”)
- Refresh all data button: Click to manually refresh all dashboard metrics
The 4 Key Metrics at a Glance

1. System Health: 100%
What It Means
An overall score that combines:
- Pod health status
- Resource utilization
- Incident history
- Agent connectivity
How It's Calculated
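- 100% = All pods healthy, no active incidents (issues in non-critical namespaces, such as the tutorial pods in our example, may not lower the score; see System Health Calculation)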
- 95-99% = Minor issues detected
- 85-94% = Multiple incidents
- Below 85% = Critical issues
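The bands above amount to a simple lookup. The sketch below assumes the score is a number from 0 to 100 and that the band boundaries are exactly as listed. RubixKube's internal scoring may differ, and the helper name `health_band` is hypothetical.

```python
def health_band(score: float) -> str:
    """Map a System Health percentage to the bands listed in this guide.

    Hypothetical helper: the boundaries follow the documented bands,
    not RubixKube's internal scoring.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score == 100:
        return "healthy"
    if score >= 95:
        return "minor issues"
    if score >= 85:
        return "multiple incidents"
    return "critical"


for value in (100, 97, 92, 80):
    print(f"{value}% -> {health_band(value)}")
```

For example, the 92% reading in Scenario 1 below falls in the “multiple incidents” band.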
2. Active Insights: 3
What it shows: The number of issues currently requiring your attention
Visual indicators:
- Large number “3” shows total insights
- Badge “4” indicates newer/updated insights
- Orange icon for warnings
In our example:
- 3 issues requiring attention: ImagePullBackOff, CrashLoopBackOff, OOMKilled
Severity levels:
- 🔴 HIGH severity - OOMKilled, critical failures
- 🟡 medium severity - Pod crashes, image pull errors
- 🟢 low severity - Configuration warnings
3. Intelligent Analysis: 3 RCA Reports
What it shows: The number of Root Cause Analysis (RCA) reports generated by the RCA Pipeline Agent
Visual indicators:
- Number “3” shows completed RCA reports
- Shield icon indicates intelligent analysis
- Orange left border for attention
Why it matters:
- 3 RCA reports completed
- RubixKube didn’t just detect the failures; it analyzed them and found their root causes
- Complete reports with evidence are available
4. Agents: 3/4 Active
What it shows: How many AI agents are running out of the total
From our dashboard: “3/4 Active”
- 3 agents operational
- 1 agent offline or degraded
The four agents:
- RubixKube Observer - Monitors the cluster
- RCA Pipeline - Analyzes incidents
- Memory Agent - Stores learnings
- SRI Agent - Powers Chat
Infrastructure Panel

- Refresh data button (circular arrow icon) - Update infrastructure numbers
- Show details button - Expand to see additional metrics
- View page link - Go to full Infrastructure topology
Infrastructure Metrics Overview

Metric | Our Cluster | What It Means |
---|---|---|
Nodes | 1 | Physical/virtual machines |
Pods | 14 | Groups of one or more containers (our 3 failing pods plus the healthy ones) |
Deployments | 4 | Managed application sets |
Services | 3 | Network endpoints |
Each metric has its own icon:
- Server icon for Nodes
- Cube icon for Pods
- Grid icon for Deployments
- Globe icon for Services
Expanded Infrastructure View

Additional Metric | Value | What It Means |
---|---|---|
NAMESPACES | 7 | Kubernetes namespaces in cluster |
AGENTS | 1 | RubixKube agent pods deployed |
When expanded:
- Button changes to “Hide details”
- Two additional metrics appear at the bottom
- All resource counts remain visible
Activity Feed: Real-Time Event Stream

Event Types You’ll See
- INSIGHT events
- RCA events
- HIGH severity events (INSIGHT events flagged in red)
Example INSIGHT event card:
- Type: INSIGHT (blue badge)
- Severity: medium
- Status: active (white badge)
- Time: 10/4/2025 01:59 AM
Another INSIGHT event:
🔵 “Container experiencing repeated crashes in crash-loop-demo (restart count: 3)”
- Type: INSIGHT
- Severity: medium
- Status: active
Understanding Event Details
Each event card shows:
Top row:
- Icon - Indicates event type (warning triangle, magnifying glass, etc.)
- Title - Clear description of what happened
- Badge - INSIGHT or RCA tag
- Severity or Status - “Severity: high/medium/low” or “RCA Report Analysis Complete”
- Timestamp - When the event occurred (MM/DD/YYYY HH:MM AM/PM)
- Status badge - “active” or “resolved”
Color coding:
- Blue left border = INSIGHT events
- Purple left border = RCA events
- Red “high” text = HIGH (critical) severity
RCA Live Stream

In our dashboard:
- Status: LIVE (green badge) but showing 0 events
- Message: “No recent RCA events”
- Instruction: “Connect to SSE to see live events” (SSE = Server-Sent Events)
During an active analysis, the stream shows the RCA analysis steps in real time:
- Function calls being executed
- Analysis percentage (e.g., “60% complete”)
- Evidence being gathered
- Pattern matching in progress
Status indicators:
- LIVE - Connected to the SSE stream, ready for analysis
- Offline - Not connected to the event stream
- An event count appears while an analysis is running
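SSE is a plain-text protocol. The server sends `field: value` lines, and a blank line ends each event. The sketch below parses such a stream using the general rules of the HTML Server-Sent Events format, simplified so that `retry` is ignored. This guide does not document RubixKube’s SSE endpoint or event schema, so the event name `rca_progress` and the JSON payload in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str] = None


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from SSE text lines (line endings already removed)."""
    event_type, data_lines, last_id = "", [], None
    for line in lines:
        if line == "":
            # A blank line dispatches the buffered event if it carried data.
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines), last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # strip one optional leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        elif field == "id":
            last_id = value  # the last event ID persists across events
        # "retry" and unknown fields are ignored in this sketch.
    # An unterminated event at end of stream is discarded, as the format specifies.


# Hypothetical stream: one named progress event, then a default "message" event.
stream = [
    ": keep-alive",
    "event: rca_progress",
    'data: {"step": "gather_evidence", "progress": 60}',
    "",
    "data: line1",
    "data: line2",
    "",
]
for evt in parse_sse(stream):
    print(evt.event, evt.data)
```

With a real HTTP client, you would feed the decoded response lines into `parse_sse` as they arrive.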
Dashboard Workflow: Your Daily Routine
Morning Check (60 seconds)
1. Open Dashboard
2. Check System Health
3. Review Active Insights
4. Scan Activity Feed, looking for:
   - 🔴 HIGH severity items (red text)
   - New incidents since yesterday
   - RCA reports completed (purple RCA badge)
5. Verify All Agents Active
6. Review Infrastructure:
   - Are pod counts normal?
   - Any unexpected changes in deployments?
   - Click “Show details” for a namespace overview
Real Examples from Our Dashboard
Example 1: OOMKilled Detected (HIGH Severity)
What appeared in the Activity Feed:
⚠️ “Out of memory (OOMKilled) detected on a pod in Pod/memory-hog-demo”
- Type: INSIGHT
- Severity: high (red text)
- Status: active
- Time: 01:58 AM
What to do:
- Note that it’s HIGH severity (needs immediate attention)
- Look for the corresponding RCA report (it should appear shortly after)
- Click the event for full details; it opens the Insights page with the complete analysis
- Review the root cause and recommended fixes
- Set appropriate memory limits or increase the pod’s resources
The corresponding RCA event:
🟣 “Postmortem: OOMKilled Event on Pod ‘memory-hog-demo’”
Example 2: CrashLoopBackOff with RCA
What appeared:
First, an INSIGHT event:
🔵 “Container experiencing repeated crashes in crash-loop-demo (restart count: 3)”
- Type: INSIGHT
- Severity: medium
- Status: active
Then, an RCA event:
🟣 “Postmortem: Pod ‘crash-loop-demo’ in CrashLoopBackOff State”
- Type: RCA
- Status: RCA Report Analysis Complete
What this means:
- RubixKube detected the crash loop (INSIGHT)
- The RCA Pipeline automatically analyzed it (RCA)
- A complete report with the root cause is now available
Example 3: ImagePullBackOff
What appeared:
🔵 “Pod broken-image-demo has been pending for an extended period”
- Type: INSIGHT
- Severity: medium
- Status: active
What this means:
- The pod is stuck in the Pending state
- Likely an ImagePullBackOff error (the container image can’t be pulled)
- Not critical, but needs fixing
What to do:
- Check the image name and tag
- Verify registry credentials
- Confirm the image exists in the registry
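When you check the image name and tag, remember how a short reference is interpreted. Without a registry host, the image comes from Docker Hub. Without a tag or digest, the `latest` tag is assumed. The sketch below applies those common rules so you can see what a reference resolves to. It is a simplification: it does not validate characters or expand Docker Hub’s implicit `library/` prefix. The function name `split_image_ref` is hypothetical.

```python
def split_image_ref(ref: str):
    """Split an image reference into (registry, repository, tag, digest)."""
    name, _, digest = ref.partition("@")
    tag = None
    # A tag is the text after the last ':' that comes after the last '/'.
    # A ':' before the last '/' belongs to a registry port (host:5000/app).
    colon, slash = name.rfind(":"), name.rfind("/")
    if colon > slash:
        name, tag = name[:colon], name[colon + 1:]
    first, sep, rest = name.partition("/")
    # The first path component is a registry host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", name
    if tag is None and not digest:
        tag = "latest"  # default tag when neither a tag nor a digest is given
    return registry, repository, tag, digest or None


for ref in ("nginx:1.25", "registry.example.com:5000/team/app", "myorg/app"):
    print(ref, "->", split_image_ref(ref))
```

If the resolved registry, repository, or tag is not what you expected, that is often the cause of the pull failure.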
Understanding the Metrics
System Health Calculation
How RubixKube calculates the 100%:
- Pod Health Check
- Incident Weight
- Agent Connectivity
- Resource Health
Components checked:
- Node status (all nodes ready)
- Critical system pods (kube-system namespace)
- RubixKube components
Final Score
Why our cluster still scores 100% with three failing pods:
- The failing pods are in the tutorial namespace (non-critical)
- Core system pods are all healthy
- System infrastructure is operational
- Agents are mostly active (3/4)
Infrastructure Panel Deep Dive
What each metric means:
Nodes: 1
Our cluster: Single-node KIND cluster
In production: Would show multiple nodes (e.g., “5 nodes”)
What to watch:
- Node count changes (scale up/down)
- All nodes should be in Ready status
- Node resource pressure (memory, CPU, disk)
Details you can drill into:
- Node details
- Resource usage per node
- Taints and tolerations
- Node conditions
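The Dashboard runs these checks for you. To cross-check a node’s Ready status from a terminal, the sketch below reads the JSON printed by `kubectl get nodes -o json`. The fields it uses (`status.conditions`, each with `type` and `status`, and the `MemoryPressure`, `DiskPressure`, and `PIDPressure` condition types) come from the Kubernetes Node API. The function name `node_report` is hypothetical.

```python
import json

# Conditions whose status "True" signals resource pressure on a node.
PRESSURE_CONDITIONS = ("MemoryPressure", "DiskPressure", "PIDPressure")


def node_report(nodes_json: str) -> dict:
    """Summarize readiness and pressure from `kubectl get nodes -o json` output."""
    report = {}
    for node in json.loads(nodes_json)["items"]:
        conditions = {c["type"]: c["status"]
                      for c in node.get("status", {}).get("conditions", [])}
        report[node["metadata"]["name"]] = {
            # Ready must be exactly "True"; "False" or "Unknown" means not ready.
            "ready": conditions.get("Ready") == "True",
            "pressure": [t for t in PRESSURE_CONDITIONS if conditions.get(t) == "True"],
        }
    return report
```

For example, save the output with `kubectl get nodes -o json > nodes.json` and call `node_report(open("nodes.json").read())`.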
Pods: 14
Our cluster breakdown:
- 8 system pods (kube-system namespace)
- 3 RubixKube pods (rubixkube-system)
- 3 failing tutorial pods (tutorial namespace)
Color coding:
- 🟢 Green number = All healthy
- 🟡 Yellow = Some issues (warning)
- 🔴 Red = Critical failures
What to watch:
- Sudden drops (pods crashing)
- Unusual increases (unexpected deployments)
- Pods stuck in Pending
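To cross-check the Pods card from a terminal, the sketch below reads the JSON printed by `kubectl get pods -A -o json`. It counts pods per namespace, which you can compare with the breakdown above. It also collects Pending pods plus the container reasons this guide discusses. The fields `status.phase`, `containerStatuses`, `state.waiting.reason`, and `lastState.terminated.reason` come from the Kubernetes Pod API. The function name and the list of reasons are illustrative choices, not RubixKube’s detection logic.

```python
import json
from collections import Counter

# Container reasons this guide discusses; extend as needed.
PROBLEM_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull", "OOMKilled"}


def pod_summary(pods_json: str):
    """Return (pods per namespace, list of (namespace, pod, problem))."""
    per_namespace = Counter()
    problems = []
    for pod in json.loads(pods_json)["items"]:
        ns, name = pod["metadata"]["namespace"], pod["metadata"]["name"]
        per_namespace[ns] += 1
        status = pod.get("status", {})
        if status.get("phase") == "Pending":
            problems.append((ns, name, "Pending"))
        for cs in status.get("containerStatuses", []):
            # Current waiting reason (e.g. CrashLoopBackOff) and the reason
            # the previous container instance terminated (e.g. OOMKilled).
            waiting = cs.get("state", {}).get("waiting", {})
            terminated = cs.get("lastState", {}).get("terminated", {})
            for reason in (waiting.get("reason"), terminated.get("reason")):
                if reason in PROBLEM_REASONS:
                    problems.append((ns, name, reason))
    return per_namespace, problems
```

A pod can appear more than once in the problem list, for example as both Pending and ImagePullBackOff.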
Deployments: 4
Our cluster:
- coredns (DNS service)
- local-path-provisioner (storage)
- kubernetes-mcp-server (API)
- rubixkube-observer (monitoring agent)
What to watch:
- Desired vs. actual replica counts
- Rollout status
- Pod template health
Details you can drill into:
- Deployment health
- Replica counts (desired/current/ready)
- Rollout history
- Pod template specs
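The “desired vs. actual replica counts” check can be reproduced from `kubectl get deployments -A -o json`. The sketch below flags deployments whose ready or updated replicas lag the desired count. It uses the Kubernetes Deployment fields `spec.replicas`, `status.readyReplicas`, and `status.updatedReplicas`. Zero-valued status fields are omitted from the API output, so missing values are treated as 0. The function name and the sample deployment names in the tests are hypothetical.

```python
import json


def deployment_health(deployments_json: str) -> list:
    """List deployments whose updated or ready replicas are below the desired count."""
    lagging = []
    for d in json.loads(deployments_json)["items"]:
        desired = d["spec"].get("replicas", 1)  # the API defaults replicas to 1
        status = d.get("status", {})
        ready = status.get("readyReplicas", 0)      # omitted when zero
        updated = status.get("updatedReplicas", 0)  # omitted when zero
        if ready < desired or updated < desired:
            ns, name = d["metadata"]["namespace"], d["metadata"]["name"]
            lagging.append(f"{ns}/{name}: desired={desired} updated={updated} ready={ready}")
    return lagging
```

A deployment that is mid-rollout can briefly appear here; one that stays on the list points to the rollout or pod template issues described above.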
Services: 3
Our cluster:
- kubernetes (API server)
- kube-dns (DNS)
- rubixkube-observer (monitoring)
Service types:
- ClusterIP - Internal cluster access
- NodePort - External access via a port on each node’s IP
- LoadBalancer - Cloud load balancer
What to watch:
- Service endpoints (each service should have backing pods)
- Port configurations
- Service discovery issues
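To find services with no ready backing pods, the sketch below reads `kubectl get endpoints -A -o json`. An Endpoints object shares its Service’s name and lists ready pod IPs under `subsets[].addresses`; not-ready pods go under `notReadyAddresses`. Newer Kubernetes releases recommend EndpointSlice over the v1 Endpoints API, so treat this as an illustration. The function name and the `demo/...` sample names in the tests are hypothetical.

```python
import json


def services_without_ready_endpoints(endpoints_json: str) -> list:
    """Return namespace/name for every Endpoints object with no ready addresses.

    An empty result for a selector-based Service usually means its selector
    matches no ready pods.
    """
    empty = []
    for ep in json.loads(endpoints_json)["items"]:
        ready = sum(len(s.get("addresses", [])) for s in ep.get("subsets", []))
        if ready == 0:
            empty.append(f'{ep["metadata"]["namespace"]}/{ep["metadata"]["name"]}')
    return sorted(empty)
```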
Namespaces: 7 (Expanded View)
Click “Show details” to see this metric.
Our cluster namespaces:
- default
- kube-system (system pods)
- kube-public
- kube-node-lease
- rubixkube-system (RubixKube components)
- tutorial (demo failing pods)
- local-path-storage
Why it matters:
- Understand cluster organization
- Identify namespace-specific issues
- Track namespace growth
Agents: 1 (Expanded View)
Click “Show details” to see this metric.
Shows: The number of RubixKube agent pods deployed
Our cluster: 1 agent pod (rubixkube-observer)
Cross-reference: Compare with the “Agents: 3/4” metric:
- 3/4 Active = 3 of 4 agent types are running
- 1 agent pod = 1 pod deployed (it may run multiple services)
Activity Feed Explained
From our dashboard, 7 events in chronological order (newest first):
- Pod broken-image-demo pending (INSIGHT, medium, 01:59 AM)
- crash-loop-demo crashes (INSIGHT, medium, 01:58 AM)
- crash-loop-demo RCA complete (RCA, 01:58 AM)
- memory-hog-demo crashes (INSIGHT, medium, 01:58 AM)
- memory-hog-demo RCA complete (RCA, 01:58 AM)
- OOMKilled on memory-hog-demo (INSIGHT, HIGH, 01:58 AM) ⭐ Most critical
- OOMKilled RCA complete (RCA, 01:58 AM)
Pattern to notice:
- INSIGHT first (detection by the Observer)
- RCA follows within seconds (analysis by the RCA Pipeline)
- Chronological order helps track incident progression
Feed features:
- Real-time updates - Events appear as they occur
- Persistent scrollback - Historical events remain
- Badge indicators - Easy visual scanning
- Click to expand - View full event details
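The INSIGHT-then-RCA pattern makes it easy to spot incidents that have no analysis yet. The sketch below encodes the seven events listed above as `(type, pod, label)` tuples. That layout is a hypothetical representation, not a RubixKube export format. The code counts INSIGHT and RCA events per pod and reports pods with more INSIGHTs than RCA reports.

```python
from collections import defaultdict

# The seven Activity Feed events listed above, newest first.
FEED = [
    ("INSIGHT", "broken-image-demo", "pending"),
    ("INSIGHT", "crash-loop-demo", "crashes"),
    ("RCA", "crash-loop-demo", "RCA complete"),
    ("INSIGHT", "memory-hog-demo", "crashes"),
    ("RCA", "memory-hog-demo", "RCA complete"),
    ("INSIGHT", "memory-hog-demo", "OOMKilled"),
    ("RCA", "memory-hog-demo", "OOMKilled RCA complete"),
]


def rca_coverage(feed):
    """Return, per pod, how many INSIGHT and RCA events the feed contains."""
    counts = defaultdict(lambda: {"INSIGHT": 0, "RCA": 0})
    for event_type, pod, _label in feed:
        counts[pod][event_type] += 1
    return dict(counts)


def awaiting_rca(feed):
    """Pods with more INSIGHT events than RCA reports."""
    return sorted(pod for pod, c in rca_coverage(feed).items() if c["INSIGHT"] > c["RCA"])


print(awaiting_rca(FEED))
```

In the captured data, only broken-image-demo has an INSIGHT without a matching RCA event.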
Dashboard Actions You Can Take
1. Refresh All Data
Button: “Refresh all data” (circular arrow icon, top right)
What it does:
- Re-queries all metrics
- Updates infrastructure counts
- Refreshes the Activity Feed
- Updates agent status
- Reloads the RCA stream connection
When to use it:
- To manually verify the latest state
- After making cluster changes
- When troubleshooting stale data
2. Refresh Infrastructure Only
Button: “Refresh data” in the Infrastructure panel
What it does:
- Updates only the Infrastructure panel metrics
- Faster than a full dashboard refresh
- Doesn’t reload the Activity Feed or the key metric cards
When to use it:
- After deploying new resources
- After scaling deployments
- After adding or removing nodes
3. Expand Infrastructure Details
Button: “Show details” / “Hide details”
What it reveals:
- Namespaces count
- Agents count
- Existing metrics stay visible
When to use it:
- You need a complete infrastructure overview
- You’re investigating namespace issues
- You’re verifying agent deployment
4. Navigate to Details
From the metric cards:
- Click the “Active Insights” card → Go to the Insights page
- Click the “Agents” card → Go to the Agents page
- Number indicators are clickable
From the Infrastructure panel:
- Click the “View page” link → Full Infrastructure topology
- Click individual metrics → Filtered views (when on the Infrastructure page)
From the Activity Feed:
- Click any event → Full event details and related insights
- Click an INSIGHT badge → View insight details
- Click an RCA badge → View RCA report
Common Dashboard Scenarios
Scenario 1: System Health Drops to 92%
What you see:
- Health metric turns yellow/orange
- Active Insights count increased
- New events in the Activity Feed
What to do:
1. Identify New Insights
2. Check Severity
3. Prioritize
4. Investigate
5. Take Action
Scenario 2: Active Insights Jumps from 0 to 3
What you see:
- Insights card shows “3” with badge “4”
- Multiple new events in the Activity Feed
- Timestamps are recent (last few minutes)
What to do:
1. Don’t Panic
2. Scan Feed
3. Check for Patterns
4. Wait for RCA
5. Review Analysis
Scenario 3: Agent Goes Offline (shows 2/3)
What you see:
- Agents metric shows “2/3” or “3/4”
- You may see “Agent health degraded” in the Activity Feed
- System Health may decrease slightly
What to do:
1. Navigate to Agents
2. Identify Offline Agent
3. Check Agent Pod
4. Review Logs
5. Restart Agent (replace [agent] with the agent’s deployment name):
   kubectl rollout restart deployment/[agent] -n rubixkube-system
6. Verify Recovery
Scenario 4: RCA Live Stream Shows Activity
What you see:
- RCA Live Stream panel shows “LIVE” with an event count (e.g., “3 events”)
- Events stream in real time
- You may see function calls and analysis steps
What to do:
1. Watch Analysis
2. Wait for Completion
3. Check Activity Feed
4. Review Findings
Scenario 5: Infrastructure Metrics Change Unexpectedly
What you see:
- Pod count increased from 14 to 18
- No corresponding Activity Feed events
- System Health still 100%
What to do:
1. Click View Page
2. Check New Pods
3. Identify Namespace
4. Review Deployments
5. Verify Expected
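Scenario 5 is a comparison against your baseline. If you record the Infrastructure panel counts (for example, by noting them during the morning check), a sketch like this reports which counts moved. The dictionary keys and the function name are hypothetical. The numbers are the ones from this scenario and from our cluster.

```python
def infrastructure_drift(baseline: dict, current: dict) -> dict:
    """Return {metric: (baseline value, current value)} for every count that changed.

    A metric missing from one side is reported with None on that side.
    """
    metrics = baseline.keys() | current.keys()
    return {m: (baseline.get(m), current.get(m))
            for m in sorted(metrics) if baseline.get(m) != current.get(m)}


baseline = {"nodes": 1, "pods": 14, "deployments": 4, "services": 3, "namespaces": 7}
current = {**baseline, "pods": 18}
print(infrastructure_drift(baseline, current))  # {'pods': (14, 18)}
```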
Best Practices for Dashboard Usage
1. Start Every Day Here
Open Dashboard → 60-second scan → Note changes → Drill down
2. Keep Dashboard Open
Why:
- Real-time updates via Server-Sent Events (SSE)
- Activity Feed refreshes automatically
- New incidents appear immediately as they occur
- No manual refresh needed
How:
- Open Dashboard in your browser
- Right-click the tab → “Pin Tab”
- Position it as the first tab
- Check it throughout the day
3. Learn Your Baseline
System Health:
- Normal range: 95-100%
- Typical: 100% for healthy clusters
- Alert threshold: Below 95%
Active Insights:
- Normal range: 0-2 for healthy clusters
- Alert threshold: 3+ or a sudden increase
- Critical: 5+ insights
Your cluster:
- Your normal pod count: ____
- Your normal namespace count: ____
- Expected agent status: 3/3 or 4/4
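The thresholds above can be written down as explicit checks. In the sketch below, `Snapshot` is a hypothetical record of the headline numbers you read off the Dashboard. The guide does not define a “sudden increase”, so the code treats any rise in Active Insights since the previous check as one. That rule is an assumption you may want to tighten.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical record of the headline Dashboard numbers at one point in time."""
    health_pct: float
    active_insights: int
    agents_active: int
    agents_total: int


def baseline_alerts(now: Snapshot, previous: Optional[Snapshot] = None) -> List[str]:
    """Apply the baseline thresholds from this guide to a snapshot."""
    alerts = []
    if now.health_pct < 95:
        alerts.append("System Health below 95%")
    if now.active_insights >= 5:
        alerts.append("Active Insights critical (5+)")
    elif now.active_insights >= 3:
        alerts.append("Active Insights at alert threshold (3+)")
    elif previous is not None and now.active_insights > previous.active_insights:
        # Assumption: any increase since the previous check counts as "sudden".
        alerts.append("Active Insights increased since last check")
    if now.agents_active < now.agents_total:
        alerts.append(f"Agents {now.agents_active}/{now.agents_total} active")
    return alerts


# Our dashboard: 100% health, 3 active insights, 3/4 agents.
print(baseline_alerts(Snapshot(100, 3, 3, 4)))
```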
4. Use Activity Feed as Audit Log
Use it to:
- See when issues started (timestamp tracking)
- Track incident resolution timelines
- Verify deployments and changes
- Support post-mortem analysis
- Support compliance and reporting
Feed characteristics:
- Events remain in the feed (not just transient)
- Clickable for full details
- Searchable (future feature)
- Exportable for reports
5. Understand the Insight → RCA Pattern
INSIGHT events:
- Blue badge, INSIGHT tag
- RubixKube Observer detected an issue
- Severity assigned (high/medium/low)
- Status: active
RCA events:
- Purple badge, RCA tag
- RCA Pipeline analyzed the issue
- Root cause identified
- Status: RCA Report Analysis Complete
What this means for you:
- Don’t panic when you see an INSIGHT
- Wait for the RCA before taking action
- The RCA provides context and recommendations
- The INSIGHT → RCA pattern shows the system is working correctly
6. Triage by Severity
🔴 HIGH severity:
- OOMKilled
- Critical pod failures
- Security issues
- Data loss risk
- Action: Immediate response required
🟡 MEDIUM severity:
- CrashLoopBackOff
- ImagePullBackOff
- Configuration errors
- Action: Investigate within hours
🟢 LOW severity:
- Warnings
- Deprecations
- Minor issues
- Action: Plan a fix for the next sprint
In the Activity Feed:
- HIGH severity shows in red
- Easy visual scanning
- Click to investigate details
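The triage rules above map each severity to a response window. The sketch below sorts a list of insights so HIGH items come first and attaches that window. The `(severity, title)` input format is hypothetical, and the sample titles are the three insights from our dashboard.

```python
# Response windows taken from the triage guidance above.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}
RESPONSE = {
    "high": "Immediate response required",
    "medium": "Investigate within hours",
    "low": "Plan a fix for the next sprint",
}


def triage(insights):
    """Order (severity, title) pairs HIGH first and attach the response window.

    sorted() is stable, so insights with equal severity keep their feed order.
    """
    ranked = sorted(insights, key=lambda item: SEVERITY_RANK[item[0].lower()])
    return [(sev.lower(), title, RESPONSE[sev.lower()]) for sev, title in ranked]


feed = [
    ("medium", "Pod broken-image-demo pending"),
    ("medium", "crash-loop-demo crashes"),
    ("HIGH", "OOMKilled on memory-hog-demo"),
]
for severity, title, action in triage(feed):
    print(f"[{severity}] {title}: {action}")
```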
What You Learned
4 Key Metrics
- System Health (100% = healthy)
- Active Insights (3 issues detected)
- Intelligent Analysis (3 RCA reports)
- Agents (3/4 active)
Infrastructure Summary
- Nodes, Pods, Deployments, Services
- Expand for Namespaces and Agents
- Quick cluster overview at a glance
- Refresh button for latest data
Activity Feed
- Real-time event stream
- INSIGHT and RCA events
- Severity indicators
- Chronological history
- Clickable for details
RCA Live Stream
- Watch analysis happen live
- Shows RCA Pipeline in action
- LIVE/Offline status
- Event count when active
Daily Workflow
- 60-second morning check routine
- Scan metrics → Review feed → Verify agents
- Proactive issue detection
- Early warning system
Common Scenarios
- Health drops → Check insights
- Insights increase → Review feed
- Agents offline → Investigate
- Pattern recognition skills
Quick Reference
Dashboard at a glance:
Panel | What You’ll See | Normal Range | Action When Abnormal |
---|---|---|---|
System Health | Percentage | 95-100% | Below 95% → Check Active Insights |
Active Insights | Count + badge | 0-2 | 3+ → Review Activity Feed by severity |
RCA Analysis | Report count | Varies | New report → Read findings in Insights |
Agents | Fraction | 3/3 or 4/4 | Not all active → Go to Agents page |
Infrastructure | Resource counts | Your baseline | Unexpected changes → Verify cluster |
Activity Feed | Event stream | Varies | New HIGH severity → Investigate immediately |
RCA Stream | Live status | LIVE 0 events | IN_PROGRESS → Watch completion |
Real Data from Our Dashboard
What we captured:
- ✅ System Health: 100% (all core systems healthy)
- ✅ Active Insights: 3 (broken-image-demo, crash-loop-demo, memory-hog-demo)
- ✅ Intelligent Analysis: 3 RCA reports (all analyzed with root causes)
- ✅ Agents: 3/4 active (1 offline, 3 operational)
- ✅ Infrastructure: 1 node, 14 pods, 4 deployments, 3 services
- ✅ Expanded: 7 namespaces, 1 agent pod
- ✅ Activity Feed: 7 events (4 INSIGHTS, 3 RCA reports)
- ✅ RCA Stream: LIVE with 0 active events
Key takeaways:
- The Dashboard provides complete visibility even when things are failing
- The system distinguishes between critical and non-critical failures
- The RCA Pipeline automatically analyzes incidents
- Real-time monitoring with historical context
Interactive Elements Summary
Clickable actions:
Element | Action | Result |
---|---|---|
Refresh all data | Click button | Updates all dashboard metrics |
Active Insights card | Click card | Navigate to Insights page |
Agents card | Click card | Navigate to Agents page |
Infrastructure “Refresh data” | Click button | Updates only infrastructure metrics |
Infrastructure “Show details” | Click button | Expands to show namespaces and agents |
Infrastructure “View page” | Click link | Navigate to full Infrastructure topology |
Activity Feed event | Click event card | View full event details and insights |
INSIGHT badge | Click badge | View insight details |
RCA badge | Click badge | View RCA report with findings |
Keyboard Shortcuts
While on the Dashboard:
- R - Refresh all data
- I - Navigate to Insights
- A - Navigate to Agents
- Esc - Close any open modals
Troubleshooting Dashboard Issues
Dashboard not loading
Symptoms: The spinner stays visible and the “Initializing dashboard…” message persists.
Causes:
- Backend API connection issue
- SSE connection failed
- Browser cache problem
Solutions:
- Hard refresh: Cmd+Shift+R (Mac) or Ctrl+Shift+R (Windows)
- Check the browser console for errors
- Verify the cluster connection in Settings
- Check that the RubixKube backend pods are running
Activity Feed shows “No recent activity” but issues exist
Symptoms: The feed is empty despite active insights.
Causes:
- SSE connection not established
- Events not being generated
- Time range filter active
Solutions:
- Click “Refresh all data”
- Check the browser console for SSE errors
- Verify that the RubixKube Observer is running
- Check whether the RCA Live Stream shows the “Connect to SSE to see live events” message
Metrics showing old data
Symptoms: The timestamp isn’t updating and the numbers are stale.
Causes:
- Auto-refresh disabled
- API timeout
- Backend issue
Solutions:
- Click “Refresh all data” manually
- Check the browser’s network tab for failed requests
- Verify backend API health
- Restart the browser if needed
Infrastructure shows “Disconnected”
Symptoms: Red status indicator, no metrics.
Causes:
- Cluster connection lost
- Invalid kubeconfig
- Network issue
Solutions:
- Go to Settings → Clusters
- Verify the cluster connection status
- Test the connection
- Re-add the cluster if needed
- Check that the kubeconfig is valid
Performance Tips
For large clusters:
- Dashboard optimized for up to 1000 pods
- Activity Feed shows the most recent 50 events
- Older events are paginated
- Infrastructure panel uses aggregated counts
Data efficiency:
- Dashboard data is compressed
- SSE uses efficient streaming
- Images are lazy-loaded
- Metrics are cached briefly
For multiple clusters:
- Switch clusters in Settings
- Dashboard refreshes automatically
- Each cluster has independent metrics
- Activity Feed is filtered by the selected cluster
Next Steps
Investigate Insights
Check Agent Health
View Infrastructure
Use Chat to Investigate
You’re now a Dashboard expert! Use it daily to stay ahead of incidents. 📊
Related Documentation
Understanding Agents
RCA Deep Dive
Memory Engine
Guardrails
Need Help?
Support
Documentation
Tutorials
Community
Feedback
Found an issue with this guide or have suggestions? We’d love to hear from you!
- Email: connect@rubixkube.ai
- Subject: “Dashboard Guide Feedback”
Last updated: October 6, 2025
Guide version: 2.0
Based on RubixKube Console v1.0