AWS Outages Linked to AI Coding Tools Spark Internal Doubts at Amazon
Amazon’s cloud division AWS has suffered at least two outages in recent months linked to errors involving its own AI tools, according to the Financial Times. The incidents have raised doubts among some employees about the company’s strategy of deploying these coding assistants on a large scale.
The incidents matter because Amazon Web Services is the world’s largest cloud provider. When it goes down, it typically takes numerous other web services around the world with it.
Incident in December with Kiro Tool
In December, an AWS system was interrupted for 13 hours after engineers allowed the AI coding tool Kiro to make certain changes. According to four people familiar with the matter, the so-called “agentic” tool, which can independently perform actions on behalf of users, concluded that the best course of action was to “delete and recreate the environment”.
The affected system enables AWS customers to analyze the costs of their services. Amazon published an internal post-mortem review of the outage.
Second Incident in a Short Time
Multiple Amazon employees confirmed to the Financial Times that this was the second incident within a few months in which one of the company’s AI tools was at the center of a service interruption. A senior AWS employee stated: “We’ve seen at least two production outages in recent months. Engineers let the AI agent solve a problem without intervention. The outages were small, but entirely predictable.”
According to three employees, the earlier incident involved Amazon Q Developer, an AI-powered chatbot designed to assist engineers in writing code.
Amazon’s Statement
Amazon rejected the criticism and described the involvement of the AI tools as “coincidental”. The company stated that “the same problem could occur with any developer tool or manual action”.
Amazon emphasized that both cases were “user error, not AI error”. The company said it had seen no evidence that errors occur more frequently with AI tools than with other tools.
The company described the December incident as an “extremely limited event” that affected only a single service in parts of mainland China. According to Amazon, the second incident did not impact any “customer-facing AWS service”.
Security Measures and Permissions
Employees reported that the company’s AI tools are treated as an extension of an operator and equipped with the same permissions. In both cases, the engineers involved did not require approval from a second person before making changes, as would normally be customary.
Amazon stated that the Kiro tool “requests authorization before every action” by default. However, the engineer involved in the December incident had “more extensive permissions than expected”. Amazon called this a “user access control problem, not an AI autonomy problem”.
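The second-person approval described above can be illustrated with a minimal sketch. It is not Kiro's or AWS's actual mechanism, which the report does not describe; every name here (ProposedAction, DESTRUCTIVE_KINDS, approve, may_execute) is hypothetical. It assumes an agent submits each action for a check before running it, and that destructive actions need sign-off from someone other than the requesting engineer.

```python
from dataclasses import dataclass, field

# Hypothetical action kinds treated as destructive in this sketch.
DESTRUCTIVE_KINDS = {"delete_environment", "recreate_environment"}


@dataclass
class ProposedAction:
    kind: str            # what the agent wants to do
    target: str          # hypothetical resource name
    requested_by: str    # engineer on whose behalf the agent acts
    approvals: set = field(default_factory=set)


def approve(action: ProposedAction, reviewer: str) -> None:
    """Record an approval; the requester may not approve their own action."""
    if reviewer == action.requested_by:
        raise PermissionError("requester cannot approve their own action")
    action.approvals.add(reviewer)


def may_execute(action: ProposedAction) -> bool:
    """Non-destructive actions pass; destructive ones need a second person."""
    if action.kind not in DESTRUCTIVE_KINDS:
        return True
    return len(action.approvals) >= 1


# Usage: a destructive action is blocked until a different person signs off.
action = ProposedAction("delete_environment", "example-env", "alice")
blocked_before_review = may_execute(action)
approve(action, "bob")
allowed_after_review = may_execute(action)
```

The point of the design is that the gate depends on the kind of action, not on how broad the requesting engineer's own permissions happen to be.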
Background on AWS and AI Tools
AWS accounts for 60 percent of Amazon’s operating profits. The company develops and deploys AI tools, including “agents” that can independently perform actions based on human instructions. Like many large technology companies, Amazon is also trying to sell this technology to external customers.
The Kiro tool was introduced in July. Amazon stated that the coding assistant would go beyond so-called “vibe coding” and instead write code based on a set of specifications.
Some Amazon employees expressed skepticism about the usefulness of AI tools for the majority of their work, given the risk of errors. They added that the company has set a target for 80 percent of developers to use AI for coding tasks at least once a week, and that adoption is being closely tracked.
Countermeasures Following the Incident
Amazon stated that it implemented “numerous safeguards” following the December incident, including mandatory peer reviews and employee training. The company emphasized that it is seeing strong customer growth for Kiro and that it wants customers and employees to benefit from efficiency gains.
