Amazon Tightens Code Guardrails After Outages Rock Retail Business

Amazon is implementing stricter internal controls following a series of outages that disrupted its e-commerce operations, including incidents linked to its AI coding assistant, Q. In internal documents, Dave Treadwell, the company's senior vice president of e-commerce services, said that a "trend of incidents" had emerged since the third quarter of 2025, with several major disruptions in recent weeks. At least one of these incidents was tied to Q, while others exposed systemic weaknesses in the company's software development processes.

The outages were attributed to problems such as "high blast radius changes": software updates whose effects spread widely because control planes (the systems that manage configuration and decide how traffic and workloads are handled) lacked sufficient safeguards. In some cases, data corruption took hours to resolve, and basic checks, such as requiring two people to authorize a code change, were either missing or bypassed. Treadwell said these failures highlighted the need for more rigorous oversight, particularly in systems that directly affect the customer experience.

In response, Amazon is rolling out temporary safety measures meant to add "controlled friction" to the code-change review process. Engineers will now need to document changes more thoroughly and obtain additional approvals. The company is also investing in both AI-driven "agentic" tools and deterministic, rules-based systems to balance flexibility with reliability. Treadwell noted that AI models, while powerful, are inherently non-deterministic: they can produce different outputs for the same input. That unpredictability poses risks for critical systems such as those managing product data, pricing, and transactions on Amazon's platform.
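Amazon's internal tooling is not described in the reporting. As a rough illustration of the kind of deterministic, rules-based check mentioned above (a documented change plus a second person's sign-off), here is a minimal Python sketch of a two-person approval gate. The `ChangeRequest` class, the `approve` and `can_deploy` functions, and the `required_approvals` parameter are all hypothetical and are not drawn from any Amazon system.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical code-change record; field names are illustrative only."""
    change_id: str
    author: str
    description: str
    approvers: set[str] = field(default_factory=set)


def approve(change: ChangeRequest, reviewer: str) -> None:
    # Authors may not approve their own changes; that would defeat the rule.
    if reviewer == change.author:
        raise ValueError(f"{reviewer} cannot approve their own change")
    change.approvers.add(reviewer)


def can_deploy(change: ChangeRequest, required_approvals: int = 1) -> bool:
    """Deterministic gate: identical inputs always yield the same decision.

    With required_approvals=1, a change needs its author plus one other
    person, i.e. two people in total. Raising the value adds "friction".
    """
    # Require a non-empty description so the change is documented.
    documented = bool(change.description.strip())
    # Count only approvers other than the author.
    independent = change.approvers - {change.author}
    return documented and len(independent) >= required_approvals


if __name__ == "__main__":
    cr = ChangeRequest("CR-1", author="alice", description="Raise cache TTL")
    print(can_deploy(cr))  # False: no independent approval yet
    approve(cr, "bob")
    print(can_deploy(cr))  # True: author plus one independent reviewer
```

Because the rule is ordinary code rather than a model, the same change record always produces the same decision, and raising `required_approvals` is one simple way such a gate could demand additional sign-offs.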
