AIOps is so powerful, vendors are building tools to clean up after agents break your infrastructure
Summary
Cohesity has teamed with ServiceNow and Datadog to build a recoverability suite designed to undo damage caused by agentic AI — the so-called AIOps agents that can autonomously diagnose and fix infrastructure. The joint offering will preserve immutable snapshots of AI environments and enable point-in-time recovery of agents, agent memory, vector stores, model configurations, training and fine-tuning data, and conventional enterprise data stores.
ServiceNow and Datadog contribute control and observability: they monitor for anomalies and can trigger API-driven restores across an estate when an issue is detected. Cohesity plans to ship the product before the end of the year. The move follows similar functionality from Rubrik and native rollback features appearing in other vendors’ agentic toolsets.
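To make "point-in-time recovery triggered by an anomaly" concrete, here is a minimal Python sketch of the selection step only: given a set of snapshots and the time an anomaly was detected, pick the newest snapshot of each artefact taken before that time. Nothing here is a Cohesity, ServiceNow or Datadog API. The `Snapshot` type, the `pick_restore_points` function, the artefact labels and the snapshot IDs are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)  # frozen loosely mirrors the idea of an immutable snapshot
class Snapshot:
    artefact: str        # hypothetical label, e.g. "agent-memory" or "vector-store"
    taken_at: datetime   # when the snapshot was captured
    snapshot_id: str     # hypothetical identifier


def pick_restore_points(
    snapshots: Iterable[Snapshot], anomaly_at: datetime
) -> Dict[str, Snapshot]:
    """For each artefact, return the newest snapshot taken strictly before the anomaly."""
    chosen: Dict[str, Snapshot] = {}
    for snap in snapshots:
        if snap.taken_at >= anomaly_at:
            continue  # taken at or after the anomaly, so it may already be damaged
        current = chosen.get(snap.artefact)
        if current is None or snap.taken_at > current.taken_at:
            chosen[snap.artefact] = snap
    return chosen


# Illustrative data only: two artefacts, two snapshots each.
snapshots = [
    Snapshot("agent-memory", datetime(2026, 3, 10, 8, 0), "mem-001"),
    Snapshot("agent-memory", datetime(2026, 3, 10, 12, 0), "mem-002"),
    Snapshot("vector-store", datetime(2026, 3, 10, 9, 0), "vec-001"),
    Snapshot("vector-store", datetime(2026, 3, 10, 13, 0), "vec-002"),
]

# Assume monitoring flagged an anomaly at 12:30.
plan = pick_restore_points(snapshots, anomaly_at=datetime(2026, 3, 10, 12, 30))
for artefact, snap in sorted(plan.items()):
    print(artefact, snap.snapshot_id)
# prints:
# agent-memory mem-002
# vector-store vec-001
```

In a real deployment the anomaly time would come from the observability layer and the chosen snapshot IDs would be handed to whatever restore API the backup product exposes; both of those steps are deliberately left out here because the source does not describe them.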
Key Points
- Cohesity, ServiceNow and Datadog are building an integrated recoverability suite to restore systems after agentic AI errors or attacks.
- The solution relies on immutable snapshots and point-in-time recovery for agents, agent memory, models, vector DBs, apps, and traditional data stores.
- Datadog and ServiceNow provide observability and control to detect anomalies and trigger restorations via APIs.
- Competing products and native rollback features already exist (e.g. Rubrik, some Cisco tools), suggesting a growing market for AI rollback tooling.
- Analysts expect rapid adoption of task-specific AI agents in enterprise apps, increasing the need for recoverability and guardrails.
Context and relevance
As organisations adopt agentic AIOps to automate routine tasks and repairs, the risk of erroneous actions or exploitation grows. Recovery and resilience are becoming core to AI deployments — not optional extras. This story signals that vendors see profitable demand for tools that can rewind AI-driven changes and reconstitute both data and AI artefacts (models, vector stores, agent memory), bridging backup/DR and AI ops.
The announcement reflects two broader trends: the widespread addition of agentic automation to enterprise software, and a shift in backup/DR strategies to cover AI-specific artefacts. For security and infrastructure teams, this changes recovery planning: backups must capture not only files and databases but also model states and agent context.
Author style
Punchy: vendors are racing to sell the mop while the robots still spill the soup. If you care about uptime, data integrity or AI governance, this is not niche; it’s strategic.
Why should I read this?
Because your next automated fix might also be your next outage. This short piece tells you who’s building the rewind button for AI agents, what they’ll recover (models, vector stores, agent memory and conventional data), and why you should factor AI artefacts into your backup and incident plans. It saves you the time of digging through vendor blogs: quick, useful and a bit alarming.
Source
Source: https://www.theregister.com/2026/03/10/agentic_ai_rollback_recovery_cohesity/
