When AI coding assistants almost destroyed production: the AWS CDK disaster I prevented

An AI coding assistant suggested "fixing" a deployment error by modifying a production stack, which would have wiped all production data on the next destroy. Here's why trusting AI with infrastructure code without understanding the implications is extremely dangerous.

Last week, I was reviewing a client’s AWS CDK infrastructure setup when I discovered something that made my blood run cold. Their development team had been working with AI coding assistants to solve a deployment issue, and the AI had suggested a “quick fix” that would have resulted in the complete destruction of their production database and all associated data.

The terrifying part? The developer was ready to implement the suggestion without understanding the catastrophic implications.

This wasn’t a case of malicious AI or a particularly complex scenario. It was a textbook example of how AI coding assistants optimize for making code work without understanding the broader consequences of infrastructure decisions. The AI saw an error message and suggested a logical-seeming solution that would have turned a minor deployment issue into a business-ending disaster.

Here’s what happened, why it was so dangerous, and what every technical leader needs to know about AI-assisted infrastructure development.

The scenario: a “simple” CDK multi-stack deployment

The team was building a data migration system using AWS CDK, split across multiple stacks for logical separation:

  1. Production Stack: The main application infrastructure with RDS database, Lambda functions, and core business logic
  2. Migration Stack: A temporary stack designed to import data from their legacy system
  3. Cleanup Stack: Another temporary stack for post-migration cleanup tasks

This is a reasonable architectural approach. You want your core production infrastructure separate from temporary migration components so you can destroy the temporary pieces after the migration completes.

The migration stack needed to access certain resources created by the production stack - specifically, it needed IAM permissions to read from the production database and write to specific S3 buckets.

In CDK, there are several ways to handle cross-stack resource access. The team initially tried to use CDK’s cross-stack references, but ran into a common issue: you can’t directly attach policies to resources that are imported from other stacks.

This is where the AI assistant entered the picture.

The AI’s “logical” but catastrophic suggestion

When the deployment failed with an error about being unable to attach IAM policies to imported resources, the developer fed the error message to their AI coding assistant. The AI analyzed the error and suggested what seemed like a reasonable solution:

“Instead of trying to attach policies to imported resources, extend the production stack to include the IAM roles and policies needed by the migration stack. This will give you direct access to the resources without the cross-stack import limitations.”

The AI even provided clean, well-structured CDK code that would implement this approach. The code looked professional, followed CDK best practices, and would have solved the immediate deployment issue.

There was just one problem: implementing this suggestion would have created a dependency relationship that would destroy the entire production stack when the temporary migration stack was cleaned up.

The hidden danger: CDK dependency chains and cascade destruction

Here’s the critical concept that the AI assistant completely missed: In AWS CDK, when you create dependencies between stacks, you create potential cascade destruction scenarios.

The AI’s suggested approach would have created this dependency chain:

Migration Stack → Production Stack (extended with migration IAM)

This means the production stack would become dependent on resources defined in the migration stack. In CDK’s dependency model, this creates a deletion order requirement: you cannot delete the production stack while the migration stack still references its resources.

But here’s where it gets catastrophic: CDK resolves dependencies in both directions. If the migration stack is deleted, CDK evaluates whether any dependent resources also need to be cleaned up. With the AI’s suggested architecture, deleting the migration stack would trigger a cascade deletion of the production stack.

The practical impact: When the migration was complete and the team ran cdk destroy MigrationStack, CDK would have:

  1. Deleted the migration stack
  2. Evaluated dependency chains
  3. Determined that the production stack components added for migration support were no longer needed
  4. Initiated deletion of the production stack
  5. Wiped out the production database, all application data, and the entire business-critical infrastructure
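The lifecycle coupling behind those steps can be illustrated with a toy model. This is not real CDK or CloudFormation semantics, just the shape of the dependency evaluation described above: once the production stack holds resources that exist only for the migration stack, destroying the migration stack puts the production stack inside the blast radius.

```typescript
// Toy model of stack coupling (illustrative only, not CDK itself).
// Maps each stack to the stacks that define resources purely for it.
const provides: Record<string, string[]> = {
  // Under the AI's suggestion, ProductionStack carries IAM resources
  // that exist solely to serve MigrationStack:
  MigrationStack: ["ProductionStack"],
};

// Everything a destroy of `stack` would touch, transitively.
function blastRadius(stack: string, seen = new Set<string>()): string[] {
  seen.add(stack);
  for (const dep of provides[stack] ?? []) {
    if (!seen.has(dep)) blastRadius(dep, seen);
  }
  return [...seen];
}

const touched = blastRadius("MigrationStack");
// With the safe design (no entry in `provides`), the blast radius of
// MigrationStack would be MigrationStack alone.
```

The point of the model: the danger was introduced the moment one stack's resources existed only for the other's benefit, not at destroy time.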

The AI was optimizing for making the deployment work, not for understanding the catastrophic business implications of the architectural decision.

Facing a leadership challenge right now?

Don't wait for the next fire to burn you out. In a 30-minute discovery call we'll map your blockers and outline next steps you can use immediately with your team.

Why AI assistants are particularly dangerous for infrastructure

This scenario highlights fundamental problems with using AI assistants for infrastructure decisions:

AI optimizes for code functionality, not business consequences

AI coding assistants are trained to solve immediate technical problems. When they see a deployment error, they suggest solutions that make the code work. They don’t understand:

  • Business criticality of different infrastructure components
  • Long-term maintenance implications of architectural decisions
  • Cascade effects in infrastructure dependency chains
  • The difference between “working code” and “safe infrastructure”

Infrastructure code isn’t just code

This is the key insight that AI assistants consistently miss: infrastructure code directly controls business-critical systems and data. Unlike application code, where bugs might cause user experience issues, infrastructure mistakes can destroy businesses.

Every infrastructure decision has implications for:

  • Data persistence and backup strategies
  • Security boundaries and access controls
  • Disaster recovery and business continuity
  • Cost optimization and resource management
  • Compliance and audit requirements

AI assistants treat infrastructure code like any other programming problem, without understanding these broader implications.

Context matters more than syntax

The AI assistant understood CDK syntax perfectly and generated syntactically correct code. But it completely missed the operational context:

  • This was production infrastructure with live business data
  • The migration was temporary and would be destroyed after completion
  • The dependency relationship would create unexpected deletion behaviors
  • Alternative approaches existed that wouldn’t create these risks

Infrastructure decisions require understanding the full operational lifecycle, not just the immediate technical implementation.

The correct solution: isolated IAM resources

The safe solution to the original problem is straightforward once you understand the constraints:

Instead of extending the production stack or trying to attach policies to imported resources, create independent IAM resources in the migration stack that reference (but don’t depend on) production resources.

// At the top of the stack file:
import * as iam from 'aws-cdk-lib/aws-iam';

// Inside the MigrationStack constructor — safe approach: an independent
// IAM role defined entirely in the migration stack
const migrationRole = new iam.Role(this, 'MigrationRole', {
  assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
  inlinePolicies: {
    DatabaseAccess: new iam.PolicyDocument({
      statements: [
        new iam.PolicyStatement({
          effect: iam.Effect.ALLOW,
          actions: ['rds:DescribeDBInstances'],
          // Reference the production database by ARN rather than by CDK
          // construct, so no cross-stack dependency is created
          resources: [`arn:aws:rds:${this.region}:${this.account}:db:production-db`]
        }),
        new iam.PolicyStatement({
          effect: iam.Effect.ALLOW,
          // IAM database authentication uses the rds-db service prefix;
          // 'migration-user' is an example database user name
          actions: ['rds-db:connect'],
          resources: [`arn:aws:rds-db:${this.region}:${this.account}:dbuser:*/migration-user`]
        })
      ]
    })
  }
});

This approach:

  • Creates no dependency between stacks
  • Allows safe deletion of the migration stack
  • Provides the necessary permissions
  • Maintains clear architectural boundaries

The AI assistant could have suggested this approach, but it was fixated on solving the immediate error rather than understanding the architectural requirements.

Coaching for Tech Leads & CTOs

Ongoing 1:1 coaching for startup leaders who want accountability, proven frameworks, and a partner to help them succeed under pressure.

Real-world scenarios where AI infrastructure suggestions go wrong

This CDK scenario isn’t unique. I’ve seen AI assistants suggest dangerous infrastructure changes across multiple domains:

Security group modifications

The Scenario: Application can’t connect to database. AI suggests opening security group rules.

The AI Suggestion: “Add a security group rule allowing all traffic from 0.0.0.0/0 to port 5432 to resolve connectivity issues.”

The Reality: This opens the production database to the entire internet. The correct solution was fixing the application’s subnet configuration.
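One way to catch this class of suggestion before it reaches production is an automated review check over proposed security group rules. A minimal sketch, with a hypothetical rule shape and an illustrative list of database ports:

```typescript
// Flags ingress rules that open common database ports to the internet.
// The rule shape and port list are illustrative, not a real AWS API.
type IngressRule = { cidr: string; port: number };

const DB_PORTS = new Set([3306, 5432, 1433]); // MySQL, Postgres, SQL Server

function flagDangerousRules(rules: IngressRule[]): IngressRule[] {
  return rules.filter((r) => r.cidr === "0.0.0.0/0" && DB_PORTS.has(r.port));
}

// The AI's suggestion trips the check; a VPC-scoped rule does not:
const flagged = flagDangerousRules([
  { cidr: "0.0.0.0/0", port: 5432 },   // world-open Postgres
  { cidr: "10.0.0.0/16", port: 5432 }, // scoped to the VPC
]);
```

A check like this belongs in CI or a pre-deployment policy gate, so the world-open rule never depends on a reviewer happening to notice it.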

Auto-scaling configuration

The Scenario: Application experiencing high load during traffic spikes.

The AI Suggestion: “Increase maximum instance count to 1000 and set aggressive scaling policies to handle any load.”

The Reality: No cost controls or resource limits. A traffic spike or DDoS attack could generate $50K+ in AWS bills in hours. The correct solution involved traffic analysis and gradual scaling with cost monitoring.
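The arithmetic behind that figure is worth making explicit. A back-of-envelope sketch, using an assumed hourly rate (actual pricing varies by instance type and region):

```typescript
// Back-of-envelope cost of uncapped scaling. The $1.00/hour rate is an
// assumed example for a larger instance type, not a quoted AWS price.
function scalingCostPerHour(instances: number, hourlyRate: number): number {
  return instances * hourlyRate;
}

const perHour = scalingCostPerHour(1000, 1.0); // $1,000 per hour
const twoDays = perHour * 48;                  // $48,000 over a weekend
```

Even at modest rates, a four-digit instance cap with no billing alarm turns a traffic anomaly into a five-figure invoice before anyone looks at a dashboard.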

Database migration scripts

The Scenario: Need to update database schema in production.

The AI Suggestion: “Run this ALTER TABLE command directly on production to add the new column.”

The Reality: The suggestion would lock the table for hours during the migration, causing complete application downtime. The correct approach required careful planning with read replicas and phased deployment.
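A lightweight guardrail for this class of suggestion is a pre-flight check that flags DDL statements known to take long locks on large tables. A minimal sketch; the denylist is illustrative and actual lock behavior depends on the database engine and version:

```typescript
// Flags SQL that should go through a planned migration (read replicas,
// phased rollout) instead of running directly against production.
// The operation list is an illustrative starting point, not exhaustive.
const LOCKING_OPERATIONS = ["ALTER TABLE", "CREATE INDEX", "VACUUM FULL"];

function needsMigrationPlan(sql: string): boolean {
  const upper = sql.toUpperCase();
  return LOCKING_OPERATIONS.some((op) => upper.includes(op));
}

// The AI's suggestion gets flagged; ordinary reads pass through:
const flaggedDdl = needsMigrationPlan("ALTER TABLE orders ADD COLUMN note text");
const plainRead = needsMigrationPlan("SELECT * FROM orders");
```

A crude string check like this is deliberately conservative: it forces a human conversation about any statement that might lock a production table.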

Backup and recovery modifications

The Scenario: Need to modify backup retention policies to reduce costs.

The AI Suggestion: “Set backup retention to 1 day and disable cross-region replication to minimize storage costs.”

The Reality: This would have eliminated the ability to recover from any incident older than 24 hours and removed geographic disaster recovery capability. The correct approach involved analyzing actual recovery requirements and optimizing backup strategies accordingly.
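That analysis step can be expressed as a simple check: does the proposed policy still cover the recovery window and geographic redundancy the business actually requires? A sketch with illustrative numbers:

```typescript
// Validates a proposed backup policy against stated recovery
// requirements. The policy shape and thresholds are illustrative.
type BackupPolicy = { retentionDays: number; crossRegion: boolean };

function meetsRecoveryRequirements(
  policy: BackupPolicy,
  requiredRecoveryWindowDays: number,
  requiresGeoRedundancy: boolean
): boolean {
  return (
    policy.retentionDays >= requiredRecoveryWindowDays &&
    (policy.crossRegion || !requiresGeoRedundancy)
  );
}

// The AI's cost "optimization" fails a 30-day recovery requirement
// with geographic redundancy:
const aiSuggestion = { retentionDays: 1, crossRegion: false };
const ok = meetsRecoveryRequirements(aiSuggestion, 30, true);
```

The useful part is not the function itself but the forcing question it encodes: cost changes to backups must be evaluated against recovery requirements, not storage bills.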

The pattern: AI tools lack business context

In every case, the AI assistant provided technically correct solutions that would have caused business disasters. The pattern is consistent:

  1. Focus on immediate technical problem: AI sees error or inefficiency and optimizes for fixing it
  2. Ignore operational implications: No understanding of business impact, risk management, or long-term consequences
  3. Miss critical constraints: Don’t consider security, compliance, cost, or reliability requirements
  4. Provide dangerous simplifications: Suggest approaches that work in isolated scenarios but fail in production environments

This is why infrastructure decisions require human expertise, business context, and operational understanding that current AI tools simply don’t possess.

What technical leaders need to know about AI infrastructure risks

Establish AI usage policies for infrastructure

Not all code is equally risky. Infrastructure code that controls production systems, data persistence, security boundaries, and network access requires different handling than application logic.

Create distinct policies:

  • Application Code: AI assistance with human review
  • Infrastructure Code: AI research only, human-driven implementation
  • Security Configuration: No AI assistance for production systems
  • Database Operations: Senior engineer oversight required
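A policy table like this is most useful when it is enforceable, for example by a PR bot or CI gate. A sketch of the same categories as an executable lookup; the category names and the enforcement hook are assumptions:

```typescript
// The policy table above as a lookup, defaulting to the most
// restrictive policy for anything unclassified.
type AiPolicy =
  | "ai-with-review"      // AI assistance with human review
  | "ai-research-only"    // AI research only, human-driven implementation
  | "no-ai"               // no AI assistance for production systems
  | "senior-oversight";   // senior engineer oversight required

const AI_POLICY: Record<string, AiPolicy> = {
  "application-code": "ai-with-review",
  "infrastructure-code": "ai-research-only",
  "security-configuration": "no-ai",
  "database-operations": "senior-oversight",
};

function policyFor(category: string): AiPolicy {
  // Fail closed: unknown categories get the strictest treatment.
  return AI_POLICY[category] ?? "no-ai";
}
```

The fail-closed default matters more than the table contents: new categories of change should require an explicit decision before AI assistance is allowed.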

Implement infrastructure-specific review processes

Standard code review processes aren’t sufficient for infrastructure code. You need reviews that specifically evaluate:

  • Dependency relationships and deletion ordering
  • Security implications and access boundaries
  • Cost impact and resource scaling behavior
  • Disaster recovery and backup implications
  • Compliance and audit trail requirements

Train teams on infrastructure risk assessment

Developers using AI tools for infrastructure need training on:

  • How cloud dependencies work and cascade deletion scenarios
  • Security implications of different architectural choices
  • Cost optimization and resource limit strategies
  • Operational impact of infrastructure changes
  • When to escalate infrastructure decisions to senior engineers

Use AI for research, not implementation

AI tools are excellent for researching infrastructure patterns, understanding error messages, and exploring different approaches. But the actual implementation decisions should always involve human expertise that understands the business and operational context.

Got a leadership question?

Share your toughest challenge and I might feature it in an upcoming episode. It's free, anonymous, and you'll get extra resources in return.

Case study: building safe AI-assisted infrastructure practices

After the near-disaster I described, I worked with this client to establish better practices for AI-assisted infrastructure development:

Step 1: Infrastructure classification system

We classified all infrastructure code into risk categories:

  • High Risk: Production data, security controls, network boundaries
  • Medium Risk: Application infrastructure, non-production environments
  • Low Risk: Development tools, temporary resources, documentation

AI assistance policies were tailored to each risk level.

Step 2: Review gate implementation

All infrastructure changes required two types of review:

  • Technical Review: Does the code work correctly and follow best practices?
  • Operational Review: Does the change align with business requirements, security policies, and operational constraints?

For changes involving AI assistance, we added a third review: AI Impact Assessment to evaluate whether the AI understood the full context of the suggestion.

Step 3: Training program

We implemented training that covered:

  • Infrastructure fundamentals: How cloud dependencies, networking, and security actually work
  • Risk assessment: How to evaluate the business impact of infrastructure changes
  • AI tool limitations: When AI suggestions are helpful vs. dangerous
  • Escalation protocols: When to involve senior engineers or architects

Step 4: Monitoring and learning

We tracked AI-assisted infrastructure decisions and their outcomes:

  • What AI suggestions were implemented and their long-term impact
  • What AI suggestions were rejected and why
  • What problems emerged from AI-assisted infrastructure changes
  • How to improve our review processes and training

The results

After six months with these practices:

  • Zero infrastructure incidents related to AI-assisted changes
  • Faster development for appropriate use cases where AI assistance was safe
  • Better team skills in evaluating infrastructure decisions
  • Improved risk management across all infrastructure changes

The key insight: AI tools became more valuable when we established clear boundaries around their use.

The broader implications: AI in enterprise infrastructure

This CDK incident represents a broader challenge as AI tools become more prevalent in enterprise infrastructure management:

The skills gap is widening

As AI tools make it easier to write infrastructure code, the gap between writing code and understanding infrastructure is growing. Developers can generate complex infrastructure configurations without understanding how they actually work.

Traditional review processes are insufficient

Code review processes designed for application code don’t catch infrastructure-specific risks. Organizations need new review frameworks that understand the operational implications of infrastructure changes.

Documentation and training become critical

As AI abstracts away infrastructure complexity, teams need better training on fundamentals. Understanding how cloud services actually work becomes more important, not less.

Risk management needs updating

Traditional risk management frameworks don’t account for AI-generated infrastructure decisions. Organizations need new approaches that consider the unique risks of AI-assisted infrastructure development.

Building infrastructure teams that use AI safely

For startups and growing companies

Start with human expertise: Before scaling AI-assisted infrastructure development, ensure you have senior engineers who understand cloud architecture, security, and operational implications.

Establish safety boundaries: Create clear policies about when AI tools are appropriate for infrastructure decisions and when human expertise is required.

Implement proper review processes: Design review workflows that catch infrastructure-specific risks that AI tools commonly miss.

Invest in training: Ensure team members understand the fundamentals that AI tools abstract away.

For established organizations

Audit existing AI usage: Review infrastructure changes that involved AI assistance for potential hidden risks or dependency issues.

Update governance frameworks: Modify existing infrastructure governance to account for AI-assisted development patterns.

Develop AI-specific risk assessments: Create frameworks for evaluating the risks of AI-generated infrastructure suggestions.

Create learning programs: Train teams on safe AI usage for infrastructure while maintaining fundamental understanding.

The technical leader’s role in AI infrastructure safety

As technical leaders, we have a responsibility to ensure AI tools enhance rather than endanger our infrastructure:

Setting boundaries and expectations

Technical leaders need to establish clear guidelines about when and how AI tools should be used for infrastructure decisions. This isn’t about restricting useful tools - it’s about ensuring they’re used safely.

Building review processes that catch AI-specific risks

Traditional infrastructure review processes need updating to catch the specific types of problems that AI tools commonly create: over-optimization for immediate problems while missing broader implications.

Developing team capabilities

Teams need training not just on how to use AI tools, but on how to evaluate AI suggestions critically and understand when human expertise is required.

Balancing innovation with safety

The goal isn’t to eliminate AI assistance, but to harness its benefits while maintaining the operational safety and business continuity that infrastructure decisions require.

Conclusion: infrastructure requires human judgment

AI coding assistants are powerful tools that can accelerate infrastructure development when used appropriately. But infrastructure decisions have consequences that extend far beyond whether the code compiles and deploys successfully.

The CDK scenario I described illustrates a fundamental limitation of current AI tools: they optimize for immediate technical problems without understanding broader business, operational, or risk management context.

Infrastructure code isn’t just code - it’s your business continuity plan, your security boundary, and your disaster recovery strategy all rolled into one. These decisions require human judgment, business context, and operational expertise that current AI tools simply don’t possess.

Key takeaways for technical leaders

AI tools are research assistants, not infrastructure architects: Use AI to explore options and understand error messages, but make implementation decisions with full human understanding of the implications.

Infrastructure reviews need AI-specific considerations: Traditional code review processes don’t catch the types of risks that AI-generated infrastructure commonly introduces.

Team training becomes more critical, not less: As AI abstracts away complexity, teams need deeper understanding of fundamentals to use AI safely.

Risk management frameworks need updating: Organizations need new approaches to evaluate and manage the risks of AI-assisted infrastructure development.

The path forward

The companies that thrive with AI-assisted infrastructure development will be those that:

  • Establish clear boundaries around AI usage for different types of infrastructure decisions
  • Implement review processes that understand operational and business implications
  • Invest in team training that builds fundamental understanding alongside AI tool usage
  • Create risk management frameworks that account for AI-specific challenges

The companies that struggle will be those that treat infrastructure code like any other code and trust AI suggestions without understanding their broader implications.

The investment in safety

Implementing proper AI safety practices for infrastructure requires investment in training, review processes, and senior technical expertise. But this investment is minimal compared to the cost of infrastructure disasters.

I’ve seen companies spend months and hundreds of thousands of dollars recovering from infrastructure mistakes - lost data, security breaches, compliance violations, and extended downtime. The cost of proper oversight and review processes is always less than the cost of catastrophic infrastructure failures.

Trusting AI with infrastructure decisions you don’t understand is like giving someone the keys to your data center without knowing their qualifications. The potential consequences are simply too severe to rely on tools that don’t understand the business context of their suggestions.

The future of infrastructure development will be AI-assisted, but it must be human-guided. The teams that master this balance will build faster while maintaining the safety and reliability that businesses depend on.

I’ve helped numerous organizations develop safe practices for AI-assisted infrastructure development, from establishing governance frameworks to training teams on effective AI usage while maintaining operational safety. If you’re looking to harness AI tools for infrastructure development while avoiding the common pitfalls and hidden risks, I’d be happy to discuss how fractional CTO support can help your specific situation.
