Datadog integration

Datadog connects to RubixKube today through the custom integrations path. There is no separate prebuilt app to install. Most teams are up in under an hour via Custom REST, and move to Custom MCP when they want richer, semantic tools.

What to bring in

The signals that matter most when agents are investigating or writing RCAs:

Metrics (/api/v1/query, /api/v2/metrics) for service-level and infrastructure metrics.
Logs search (/api/v2/logs/events/search) for targeted log lookups during RCA.
Monitors (/api/v1/monitor) to surface monitor state and recent events as insight candidates.
APM services and spans (/api/v2/apm/*) for service-level health and latency breakdowns.

Read-only access is enough for all of the above. None of these calls changes anything in Datadog.
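As a rough illustration of the first endpoint in the list, the sketch below builds the GET URL for /api/v1/query over a recent time window. The helper name build_metric_query_url, the DATADOG_SITE constant (the US1 host; other Datadog regions use a different one), and the example query are assumptions for illustration and are not part of RubixKube. It only builds the URL and sends nothing.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Assumption: US1 Datadog site; other regions use a different API host.
DATADOG_SITE = "https://api.datadoghq.com"


def build_metric_query_url(query: str, window_seconds: int, now: Optional[int] = None) -> str:
    """Build the GET URL for /api/v1/query covering the last `window_seconds`."""
    end = int(time.time()) if now is None else now
    # The endpoint takes `from` and `to` as Unix timestamps in seconds.
    params = {"from": end - window_seconds, "to": end, "query": query}
    return f"{DATADOG_SITE}/api/v1/query?{urlencode(params)}"


# Example: average user CPU over the last 15 minutes (hypothetical query).
url = build_metric_query_url("avg:system.cpu.user{*}", 15 * 60)
```

Passing `now` explicitly keeps the helper deterministic, which is convenient when replaying an incident window.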

Which path to pick

Start with REST

Datadog publishes a full OpenAPI spec. Upload it, enable the read endpoints, done.

Move to MCP when you need more

Wrap multiple calls behind named tools (for example datadog_service_health) or return streamed results.
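To make "multiple calls behind one named tool" concrete, here is a minimal sketch of what a datadog_service_health tool might compose. The two fetcher callables, the returned field names, and the 1% error-rate threshold are all hypothetical. A real MCP server would register this function with whatever MCP SDK you use and back the fetchers with actual Datadog calls.

```python
from typing import Callable, Dict, List


def datadog_service_health(
    service: str,
    fetch_firing_monitors: Callable[[str], List[dict]],
    fetch_error_rate: Callable[[str], float],
) -> Dict[str, object]:
    """Combine two Datadog lookups into one summary for an agent."""
    monitors = fetch_firing_monitors(service)  # e.g. backed by /api/v1/monitor
    error_rate = fetch_error_rate(service)     # e.g. backed by a metrics query
    return {
        "service": service,
        "firing_monitors": [m["name"] for m in monitors],
        "error_rate": error_rate,
        # Hypothetical rule: healthy means no firing monitors and under 1% errors.
        "healthy": not monitors and error_rate < 0.01,
    }


# Usage with stub fetchers standing in for real API calls.
summary = datadog_service_health(
    "checkout",
    fetch_firing_monitors=lambda svc: [{"name": "High p99 latency"}],
    fetch_error_rate=lambda svc: 0.002,
)
```

Injecting the fetchers keeps the composition logic testable without network access and lets the same tool run against recorded data.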

Quick recipe: REST path

Create Datadog API and application keys

In Datadog, generate an API key and an application key, and restrict the application key to read scopes. Store both. No write scopes are needed for investigations.

Register the spec

In the RubixKube console, go to Integrations → Custom REST and point it at Datadog’s OpenAPI spec (or upload your own pruned copy).

Enable the endpoints you care about

Start with metrics query, logs search, monitor list, and APM services. Leave everything else disabled.
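If you prefer to upload a pruned copy of the spec instead of toggling endpoints in the console, the filtering can be sketched as below. The READ_ALLOWLIST contents and the prune_spec helper are illustrative assumptions. The APM entry is left out because the exact path depends on which APM endpoints you use. Note that logs search is a POST even though it only reads data.

```python
import copy

# Assumption: (path, method) pairs to keep; extend with the APM paths you need.
READ_ALLOWLIST = {
    ("/api/v1/query", "get"),
    ("/api/v2/logs/events/search", "post"),
    ("/api/v1/monitor", "get"),
}


def prune_spec(spec: dict, allowlist: set) -> dict:
    """Return a copy of an OpenAPI spec containing only allowlisted operations."""
    pruned = copy.deepcopy(spec)  # leave the caller's spec untouched
    kept_paths = {}
    for path, operations in pruned.get("paths", {}).items():
        # Keeps only allowlisted methods; path-level keys such as
        # "parameters" are dropped in this simplified version.
        kept = {
            method: op
            for method, op in operations.items()
            if (path, method.lower()) in allowlist
        }
        if kept:
            kept_paths[path] = kept
    pruned["paths"] = kept_paths
    return pruned
```

Starting from an allowlist means that new endpoints added to Datadog's spec stay disabled until someone opts in.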

Set auth

Add the API key and application key to the workspace secret vault. Map them to the DD-API-KEY and DD-APPLICATION-KEY request headers that Datadog expects.
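For reference, the header mapping amounts to something like the following sketch. DD-API-KEY and DD-APPLICATION-KEY are the header names Datadog's API expects. The environment variable names DD_API_KEY and DD_APP_KEY and the datadog_auth_headers helper are assumptions standing in for however your secret vault exposes the values.

```python
import os
from typing import Dict, Mapping


def datadog_auth_headers(secrets: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Map stored secrets to the request headers Datadog expects."""
    # Assumption: secrets are exposed under these two names.
    missing = [name for name in ("DD_API_KEY", "DD_APP_KEY") if not secrets.get(name)]
    if missing:
        raise KeyError(f"missing Datadog secrets: {', '.join(missing)}")
    return {
        "DD-API-KEY": secrets["DD_API_KEY"],
        "DD-APPLICATION-KEY": secrets["DD_APP_KEY"],
        "Accept": "application/json",
    }
```

Failing fast on a missing key gives a clearer error than the 403 Datadog would otherwise return mid-investigation.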

Scope to skills

Attach the enabled endpoints to the skills that need Datadog context.

Once connected, the integration is tenant-wide.

Example tools worth having

query_datadog_metric(metric, window) — returns the recent values for a given metric.
search_datadog_logs(service, window, query) — returns recent matching log lines for context during an incident.
list_firing_monitors(service) — returns monitors currently in ALERT or WARN for a given service.
get_apm_service_health(service, window) — returns latency p50/p95/p99 and error rate for the window.

Define these as skills (or as MCP tools if you need them to compose multiple calls).
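As one example of the post-processing such a tool does, here is a minimal sketch of the filtering behind list_firing_monitors. It assumes the monitor list has already been fetched from /api/v1/monitor, that each monitor carries an overall_state field and a tags list, and that services are identified by a service:<name> tag. Check those assumptions against your own monitor payloads.

```python
from typing import Iterable, List

FIRING_STATES = {"ALERT", "WARN"}


def list_firing_monitors(monitors: Iterable[dict], service: str) -> List[dict]:
    """Keep monitors in ALERT or WARN that are tagged for the given service."""
    tag = f"service:{service}"  # assumption: monitors are tagged this way
    return [
        m
        for m in monitors
        # Compare case-insensitively so "Alert" and "ALERT" both match.
        if str(m.get("overall_state", "")).upper() in FIRING_STATES
        and tag in m.get("tags", [])
    ]


# Usage with a hand-written payload (not real Datadog output).
sample = [
    {"name": "High p99 latency", "overall_state": "Alert", "tags": ["service:checkout"]},
    {"name": "Disk usage", "overall_state": "OK", "tags": ["service:checkout"]},
    {"name": "Queue depth", "overall_state": "Warn", "tags": ["service:billing"]},
]
firing = list_firing_monitors(sample, "checkout")
```

Returning the full monitor objects rather than just names leaves it to the skill to decide how much detail to surface.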

Custom REST Integrations

The full REST flow with auth options and approval rules.

Custom MCP Servers

When you want richer tool schemas and multi-call workflows.

Skills

How to compose Datadog endpoints into custom workflows.
