Amazon Tightens Code Guardrails After Outages Rock Retail Business
Amazon is implementing stricter internal controls following a series of outages that disrupted its e-commerce operations, including at least one incident linked to its AI coding assistant Q. The company’s senior vice president of e-commerce services, Dave Treadwell, wrote in internal documents that a "trend of incidents" has emerged since the third quarter of 2025, with several major disruptions in recent weeks. Beyond the incident tied to Q, the others exposed systemic vulnerabilities in the company’s software development processes. The outages were attributed to problems such as "high blast radius changes," in which software updates spread widely because control planes (the systems that manage data flow across networks) lacked sufficient safeguards. In some cases, data corruption took hours to resolve, and basic checks, such as requiring two people to authorize a code change, were either missing or bypassed. Treadwell said these failures showed the need for more rigorous oversight, particularly in areas that directly affect the customer experience. In response, Amazon is rolling out temporary safety measures designed to add "controlled friction" to the code-change review process. Engineers will now need to document changes more thoroughly and secure additional approvals. The company is also investing in both AI-driven "agentic" tools and deterministic, rules-based systems to balance flexibility with reliability. Treadwell noted that AI models, while powerful, are inherently non-deterministic, meaning they can produce slightly different outputs for the same input. That unpredictability poses risks for critical systems such as those managing product data, pricing, and transactions on Amazon’s platform.
#amazon #dave_treadwell #q #amazon_marketplaces #tier_1_systems

Elon Musk offers warning following reports of Amazon meeting to address AI-related outages
Amazon held a mandatory meeting to investigate recent outages linked to AI-assisted coding features, according to reports from the Financial Times. The e-commerce giant reportedly convened the session to address a “trend of incidents” involving AI-driven changes, which had caused widespread disruptions. In one notable outage, more than 22,000 users were unable to access Amazon’s website or app, with problems including checkout failures and account access errors. The company attributed that incident to a “software code deployment,” though the meeting focused on broader concerns about the impact of AI on system stability. Elon Musk responded to the reports with a cautionary message, tweeting, “Proceed with caution,” in reference to Amazon’s handling of AI-related risks. The warning came after internal emails revealed that Amazon’s senior vice president of e-commerce services, Dave Treadwell, proposed using the company’s weekly “This Week in Stores Tech” (TWiST) meeting to implement stricter safeguards for AI usage. According to the reports, these measures would require senior engineers to review changes made by junior and mid-level engineers, with the aim of preventing similar outages. An Amazon spokesperson disputed parts of that account, describing the TWiST meeting as a routine operational review and emphasizing that the company’s focus remains on improving system reliability. The spokesperson also said that Amazon Web Services (AWS) was not involved in the incidents, that only one of the issues discussed was tied to AI, and that no AI-written code directly caused the outage. The company further stated that junior and mid-level engineers are not required to obtain senior approval for AI-assisted changes.
#amazon #elon_musk #amazon_web_services #dave_treadwell #this_week_in_stores_tech
