
Advanced Chat: Personas & Workflows

You’ve learned the basics and troubleshooting. Now let’s explore how different team members use Chat for their specific workflows, plus advanced features and power user tips.
Role-based approach: See how SREs, DevOps Engineers, Platform Engineers, and Developers each use Chat differently to maximize productivity.

Different Personas, Different Workflows

SRE: “Everything is On Fire”

Goal: Triage and resolve production incidents FAST
Morning routine:
"Good morning! Any HIGH severity incidents?"
"Show me production pod health"
"What changed overnight?"
During incident:
"URGENT: What's down in production?"

"Which services are affected?"

"Root cause for checkout-service failure?"

"How do I rollback?"

"Is it fixed?"
Time: 3-5 minutes (vs. 20+ minutes)
Post-mortem:
"Summarize today's incidents"
"What was the root cause?"
"How long was each service down?"
"Export this conversation for the post-mortem doc"
SRE Pro Tip: Use “URGENT” or “production” in your query - the agent understands priority and responds accordingly.

DevOps Engineer: “Deploy Safely”

Goal: Validate and deploy without breaking things
Pre-deployment checklist:
1. "Status of api-gateway deployment?"
2. "Any recent issues with api-gateway?"
3. [Upload new-deployment.yaml]
4. "Validate this deployment"
5. "What's the blast radius if this fails?"
6. "Looks good - deploying now"
Post-deployment verification:
"How's the new api-gateway version?"
"Any errors in the logs?"
"Resource usage vs. previous version?"
During rollout:
"Are new pods coming up?"
"Any errors during rollout?"
"Should I continue or rollback?"
Chat becomes your deployment co-pilot - validates changes, monitors rollouts, suggests rollbacks if needed.

Platform Engineer: “Optimize Everything”

Goal: Resource efficiency and capacity planning
Resource optimization:
"Show me pods with limits less than requests"
"What pods are over-provisioned?"
"Calculate actual vs. requested resources"
"Which deployments need autoscaling?"
"Resource waste by namespace"
Capacity planning:
"Cluster utilization percentage?"
"How many more pods can we run?"
"Memory headroom per node?"
"Project resource needs for 2x traffic"
Cost optimization:
"What pods consume the most resources?"
"Show me idle resources"
"Which nodes are underutilized?"
Platform Engineer Tip: Ask for comparisons! “Compare dev vs. prod resource usage” helps identify over-provisioning in lower environments.
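
If you want to sanity-check the numbers behind capacity questions such as "Cluster utilization percentage?" and "How many more pods can we run?", the arithmetic can be sketched in a few lines of Python. This is only an illustration, not how Chat computes its answers: the `Node` fields, `extra_pods`, and `cluster_utilization` are hypothetical names, the sample numbers are made up, and the sketch ignores per-node pod-count limits, taints, and affinity rules.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int    # allocatable CPU in millicores
    allocatable_mem_mi: int   # allocatable memory in MiB
    requested_cpu_m: int      # sum of CPU requests of pods on the node
    requested_mem_mi: int     # sum of memory requests of pods on the node


def extra_pods(node: Node, pod_cpu_m: int, pod_mem_mi: int) -> int:
    """How many more pods with the given requests fit on this node."""
    free_cpu = max(node.allocatable_cpu_m - node.requested_cpu_m, 0)
    free_mem = max(node.allocatable_mem_mi - node.requested_mem_mi, 0)
    # The scarcer resource decides how many pods fit.
    return min(free_cpu // pod_cpu_m, free_mem // pod_mem_mi)


def cluster_utilization(nodes: list[Node]) -> float:
    """Requested CPU as a percentage of allocatable CPU across all nodes."""
    total = sum(n.allocatable_cpu_m for n in nodes)
    used = sum(n.requested_cpu_m for n in nodes)
    return 100.0 * used / total if total else 0.0


# Made-up sample data for two nodes.
nodes = [
    Node("node-a", 4000, 16384, 3000, 8192),
    Node("node-b", 4000, 16384, 1000, 12288),
]
print(cluster_utilization(nodes))                     # 50.0
print(sum(extra_pods(n, 500, 1024) for n in nodes))   # 6
```

Here node-a is limited by CPU (2 more pods) and node-b by memory (4 more pods), which is why per-node headroom matters more than the cluster-wide total.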

Junior Developer: “Teach Me”

Goal: Learn Kubernetes while working
Learning queries:
"What is a pod?"
"Explain CrashLoopBackOff in simple terms"
"Why does Kubernetes kill OOM pods?"
"What's the difference between Deployment and Pod?"
"How do resource limits work?"
Exploration:
"What applications are deployed here?"
"How does payment-service connect to the database?"
"What technologies are we using?"
"Show me an example of a healthy pod"
Safe experimentation:
"If I delete this pod, what happens?"
"What would breaking this service impact?"
"Is it safe to restart api-gateway?"
The agent becomes a patient teacher - explains concepts with examples from YOUR cluster, not generic docs.

Sample Workflows by Time of Day

Morning (9 AM): Health Check

"Good morning! Cluster status?"
→ Get: Health %, pod counts, active incidents

"Any issues overnight?"
→ Get: Events from last 8 hours

"All clear for deploys today?"
→ Get: Risk assessment
Time: 30 seconds
Result: Confident to start the day

Afternoon (2 PM): Pre-Deployment

"Status of payment-service?"
→ Current state, recent issues

[Upload new-deployment.yaml]
"Validate this"
→ Security, resource, config checks

"What's the risk?"
→ Blast radius analysis

"Deploying now - monitor it"
→ Agent watches for issues
Time: 2 minutes
Result: Safe deployment

Evening (8 PM): Post-Deploy Check

"How's the new payment-service?"
→ Pod status, errors, metrics

"Resource usage vs. old version?"
→ Before/after comparison

"Any regressions?"
→ Error rate, latency check
Time: 1 minute
Result: Sleep well, knowing it’s stable

3 AM: Incident Response

"URGENT: What's down?"
→ Failing pods, services affected

"Root cause?"
→ Fast RCA

"How do I fix it?"
→ Step-by-step remediation
Time: 3 minutes
Result: Back to sleep

Advanced Features

File Upload & YAML Validation

Click the upload icon to upload Kubernetes YAML files.
Use cases:
  • Pre-deployment validation
  • Security audits
  • Best practice checks
  • Configuration review

Example workflow:

  1. Click the upload icon
  2. Select deployment.yaml
  3. Agent analyzes automatically
Sample response:
Analyzed your deployment:

OK: Valid YAML syntax
OK: Resource limits defined
Warning: No liveness probe
Warning: Running as root
Error: Missing imagePullSecret

Recommendations:
1. Add livenessProbe for health checks
2. Set runAsNonRoot: true
3. Create imagePullSecret

Would you like a fixed version?
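
To make the kind of checks in that sample response concrete, here is a minimal Python sketch that runs similar checks against a Deployment manifest already parsed into a dictionary. It is an assumption-laden illustration, not RubixKube's actual validator: `check_deployment` and the sample manifest are hypothetical, and whether a missing `imagePullSecrets` entry is really an error depends on whether your images come from a private registry.

```python
def check_deployment(manifest: dict) -> list[str]:
    """Return findings for a Deployment manifest parsed into a dict."""
    findings = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    pod_sc = pod_spec.get("securityContext", {})

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if "limits" not in container.get("resources", {}):
            findings.append(f"Warning: {name} has no resource limits")
        if "livenessProbe" not in container:
            findings.append(f"Warning: {name} has no liveness probe")
        # runAsNonRoot can be set per container or for the whole pod;
        # the container-level value wins when both are present.
        container_sc = container.get("securityContext", {})
        non_root = container_sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False))
        if not non_root:
            findings.append(f"Warning: {name} may run as root")

    # Flagged as an error only to mirror the sample response above.
    if not pod_spec.get("imagePullSecrets"):
        findings.append("Error: no imagePullSecrets")
    return findings


# Hypothetical manifest with limits set but no probe, security context, or pull secret.
manifest = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {
            "name": "api",
            "image": "registry.example.com/api:1.3",
            "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
        },
    ]}}},
}

for finding in check_deployment(manifest):
    print(finding)
```

Running it against the sample manifest reports a missing liveness probe, a possible root user, and no pull secret, which lines up with the three recommendations in the response.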

Integration with RubixKube Features

Workflow:
  1. See incident spike in Dashboard
  2. Click “Provide to Chat Context”
  3. Chat auto-loads incident data
  4. Ask follow-up questions
Example: Dashboard shows OOMKilled → Chat explains why

Power User Tips

If you see an incident in Insights:
"Analyze incident OOMKilled-20231004"
"Tell me about the crash-loop incident"
Agent pulls full RCA data immediately.
Ask for comparisons:
"Compare memory usage: dev vs. prod"
"Which uses more CPU: api-gateway or checkout?"
"Show me the diff between v1.2 and v1.3"
Great for before/after analysis.
Build on previous responses:
"Show me failing pods" → Get list
"Focus on the HIGH severity one" → Drill down
"Show me its logs" → Get evidence
"Explain this error" → Understand it
"How do I fix it?" → Get solution
Ask Chat to draft documentation:
"Summarize this RCA"
"Generate post-mortem for today"
"Export the fix we applied"
"Create runbook for this issue"
Then click Export conversation → Save as Markdown

Query Best Practices Expanded

DO This

Include Namespace

Good: “Show failing pods in prod”
Why: Faster, more accurate

Specify Resource Type

Good: “Why is pod api-gateway failing?”
Why: Clearer than just “api-gateway”

Ask Follow-Ups

Good: “Tell me more” or “What about logs?”
Why: Leverages context

Use Urgency Keywords

Good: “URGENT” or “production down”
Why: Agent prioritizes

DON’T Do This

Don't Repeat Context

Bad: Asking the full question again mid-conversation
Why: Agent remembers

Don't Be Too Vague

Bad: “Fix it” (without context)
Why: Agent needs to know WHAT

Don't Use kubectl Syntax

Bad: “kubectl get pods -A”
Why: Just ask! “Show me all pods”

Don't Assume Omniscience

Bad: “Why is it slow?” (which “it”?)
Why: Be specific: “Why is checkout-service slow?”

What Chat will and will not do on its own

Chat recommends actions. It does not apply them silently.
  • Mutating commands (kubectl apply, kubectl delete, scale, rollback, restart) and anything similar on AWS, GCP, or VMs require explicit approval before they land in your environment.
  • Guardian policies define what is approvable and by whom, per environment. Production typically needs a second reviewer, staging a single approver, lab nothing at all. You set the policy.
  • Every applied action is audited: actor, time, scope, outcome. Rollbacks are one click.
See Safety and Guardrails for the full model.
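
As a mental model for how per-environment approval might gate a mutating command, consider the Python sketch below. It is purely illustrative and not the Guardian policy format or API: `REQUIRED_APPROVALS`, `ProposedAction`, `approve`, and `can_apply` are hypothetical names, and the sketch assumes "a second reviewer" means two distinct approvers in production, one in staging, and none in lab.

```python
from dataclasses import dataclass, field

# Hypothetical policy table: distinct approvals required per environment.
REQUIRED_APPROVALS = {"production": 2, "staging": 1, "lab": 0}


@dataclass
class ProposedAction:
    command: str
    environment: str
    requested_by: str
    approvals: set[str] = field(default_factory=set)


def approve(action: ProposedAction, reviewer: str) -> None:
    """Record an approval; a set keeps each reviewer counted only once."""
    action.approvals.add(reviewer)


def can_apply(action: ProposedAction) -> bool:
    """True once the action has enough distinct approvals for its environment."""
    # Unknown environments fall back to the strictest rule in the table.
    strictest = max(REQUIRED_APPROVALS.values())
    needed = REQUIRED_APPROVALS.get(action.environment, strictest)
    return len(action.approvals) >= needed


rollback = ProposedAction(
    command="kubectl rollout undo deployment/checkout-service",
    environment="production",
    requested_by="alice",
)
approve(rollback, "alice")
print(can_apply(rollback))   # False: production still needs a second reviewer
approve(rollback, "bob")
print(can_apply(rollback))   # True
```

Defaulting unknown environments to the strictest rule is a deliberate choice in this sketch: a typo in an environment name should never make an action easier to apply.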

Export & Share

Click Export conversation to:

Markdown Export

Save as .md for documentation
Use for: Post-mortems, runbooks

JSON Export

Save as .json for analysis
Use for: Audit trails, automation

Share with Team

Copy link (coming soon)
Use for: Collaboration

Email Thread

Send conversation (coming soon)
Use for: Stakeholder updates

Common Questions

How accurate are Chat's answers?

Very high.

  • Responses based on REAL cluster data (not hallucinated)
  • Function calls to actual Kubernetes API
  • Evidence-based RCA
  • Validated against best practices
Always verify critical changes before approving, especially in production.

Can Chat make changes to my environment?

Chat drafts the commands and proposes the apply. Nothing lands in your environment without explicit approval, scoped by Guardian policies. On lab environments you can set policies to auto-approve; on production you typically require at least one human reviewer.

What if the agent can't find an answer?

The agent will:
  1. Try multiple approaches (visible in Function Calls)
  2. Ask clarifying questions
  3. Explain what it checked
  4. Suggest alternative queries
Example: If the pods are not in the default namespace, the agent asks which namespace to check.

Is my conversation data secure?

Yes.

  • Encrypted in transit and at rest
  • Workspace-isolated
  • Exportable/deletable anytime
  • SOC 2 compliant
Retention follows your plan: 7 days on Free, 30 days on Business, unlimited on Enterprise. You can delete conversations manually at any time.
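
If you script around exports and want to know whether a conversation is still inside its retention window, the plan limits above translate into a small check. This is a hedged sketch: `RETENTION_DAYS` and `is_expired` are hypothetical names, the plan keys are lowercase by assumption, and it is not a RubixKube API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention windows in days, taken from the plan limits above; None means unlimited.
RETENTION_DAYS = {"free": 7, "business": 30, "enterprise": None}


def is_expired(plan: str, created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if a conversation created at `created_at` is past the plan's retention window."""
    days = RETENTION_DAYS[plan]
    if days is None:
        return False  # unlimited retention never expires
    now = now or datetime.now(timezone.utc)
    return now - created_at > timedelta(days=days)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
check_time = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(is_expired("free", created, check_time))        # True: 9 days is past 7
print(is_expired("business", created, check_time))    # False: 9 days is within 30
print(is_expired("enterprise", created, check_time))  # False: unlimited
```

A check like this can decide when to export a conversation to Markdown or JSON before it ages out.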

Can I customize Chat?

Not yet.

Coming: Custom prompts, preferred response styles, domain-specific training

Building Your Chat Habits

Week 1: Daily Health Checks

Start each day with: "Cluster health?"
Goal: Get comfortable with Chat

Week 2: Troubleshooting

Use Chat for EVERY pod issue
Goal: Build troubleshooting muscle memory

Week 3: Learning

Ask 1 “why” question per day
Goal: Deepen Kubernetes knowledge

Week 4: Advanced

Try file uploads, historical queries, comparisons
Goal: Become a power user

Real-World Success Patterns

Pattern 1: Morning Standup

Every day at 9 AM:
"Cluster health?"
"Any new incidents?"
"Team can deploy today?"
Result: 30-second standup prep

Pattern 2: Pre-Deploy Validation

Before EVERY deploy:
[Upload deployment.yaml]
"Validate this"
"What's the risk?"
Result: 80% fewer bad deploys

Pattern 3: Incident Response Template

When paged:
"What's down?"
"Impact?"
"Root cause?"
"Fix?"
Result: Structured triage in 3 minutes

Pattern 4: Learning Hour

Friday afternoons:
"Explain [concept] with examples from my cluster"
Result: Learn by doing with real infrastructure

Keyboard Power User Mode

Master these shortcuts:
| Shortcut | Use Case | Time Saved |
| --- | --- | --- |
| ⌘K | Jump to Chat from anywhere | 2-3 seconds |
| Enter | Send query | Instant |
| Shift+Enter | Multi-line query | For complex questions |
|  | Edit/retry last query | Fix typos quickly |
| Esc | Close Chat | Clean workspace |

Pro workflow:
  1. Press ⌘K (wherever you are)
  2. Type “chat”
  3. Type query
  4. Press Enter
  5. Get answer
Total: 5 seconds from thought to answer

What Makes RubixKube Chat Unique

Cluster-Aware

Uses YOUR data, not generic knowledge

Memory-Powered

Recalls past incidents automatically

RCA Integration

Explains detected incidents

Multi-Agent

Coordinates Observer, RCA, Memory agents

Context Retention

True conversation thread

Transparent

See the agent think

It’s not just a chatbot - it’s your intelligent infrastructure co-pilot.


Comparing to Other Tools

| Feature | Generic ChatGPT | RubixKube Chat |
| --- | --- | --- |
| Knows your cluster | No | Yes - live data |
| Executes queries | No | Yes - real Kubernetes API |
| Historical context | No | Yes - Memory Engine |
| RCA integration | No | Yes - incident correlation |
| Evidence-based | Can hallucinate | Shows actual logs/events |
| Transparent reasoning | Black box | Shows function calls |

RubixKube Chat = ChatGPT + Live Cluster Data + RCA + Memory Engine


Pro Tips for Mastery

Save your favorite queries:
Daily health:    "Cluster status + incidents?"
Pre-deploy:      "Validate [service] for deploy"
Post-deploy:     "How's [service] after deploy?"
Triage:          "Show HIGH severity issues"
Paste and run daily.
Build investigation flow:
1. "What's failing?" → Overview
2. "Focus on HIGH" → Prioritize
3. "Root cause?" → Understand
4. "Show evidence" → Verify
5. "Fix steps?" → Remediate
New team member onboarding:
1. "What applications run here?"
2. "Show me service connections"
3. "Explain our monitoring"
4. "What does payment-service do?"
Result: Self-serve onboarding
Best workflow:
  • Dashboard: Visual overview
  • Chat: Deep dive investigation
See spike → Click “Discuss in Chat” → Get answers
For every major incident:
  1. Investigate via Chat
  2. Click Export conversation
  3. Save as Markdown
  4. Add to post-mortem
Result: Documentation writes itself

What You Learned

4 Personas

How SREs, DevOps Engineers, Platform Engineers, and Junior Developers use Chat differently

Time-Based Workflows

Morning, afternoon, evening, 3 AM response patterns

File Upload

YAML validation and analysis

Integrations

Chat + Dashboard + Insights + Memory Engine

Power User Shortcuts

Keyboard shortcuts for efficiency

Real Patterns

Actual workflows from production users

Next Steps

You’re now a Chat expert! Explore related concepts:

How SRI Agent Works

Deep dive into the Agent Mesh architecture

Memory Engine

How Chat accesses historical incident data

Using RubixKube Guides

Day-to-day operational guides

More Tutorials

Practice with failure scenarios

Summary

The Chat interface transforms how you work with infrastructure:
  • Natural language replaces kubectl commands
  • Context awareness maintains the conversation thread
  • Multi-persona support for different workflows
  • Time savings of 84% on average
  • Learning mode teaches Kubernetes concepts
  • RCA integration explains detected incidents
  • Transparent reasoning shows how the agent thinks

You’re now equipped to use Chat like a pro across all scenarios!


Quick Reference Card

| Scenario | Query | Expected Response |
| --- | --- | --- |
| Daily health | “Cluster health?” | Health %, incidents, pod counts |
| Find failures | “What’s failing?” | List of unhealthy resources |
| Investigate | “Why did [pod] fail?” | RCA with root cause |
| Get logs | “Show logs for [pod]” | Filtered log output |
| Get fix | “How do I fix [pod]?” | kubectl commands |
| Verify | “Is [pod] healthy?” | Current status |
| Learn | “Explain [concept]” | Educational response |
| Validate | Upload YAML + “Validate” | Security & config checks |

You’ve mastered Chat! Start experimenting and discover what works best for your workflow.