Deploying an application to the cloud is only half the job. The harder part is handling failures, maintaining uptime, and building systems that can recover automatically.

To bridge the gap between theory and real-world engineering, I built a hands-on project:

Production Web App with Failure Simulation on AWS

GitHub Repository: https://github.com/anupddas/production-webapp-failure-simulation-aws.git

Tags: AWS, Cloud Computing, DevOps, Web Development, Software Engineering

Why This Project Matters

Most beginner cloud projects focus on deployment. However, in real production environments, systems fail frequently due to misconfigurations, resource exhaustion, or service crashes.

This project was designed to simulate those real-world failures and to build the skills needed to:

Diagnose issues quickly
Apply structured debugging approaches
Implement automated recovery mechanisms
Design highly available systems

Architecture Overview

The application uses a load-balanced, auto-scaled architecture of the kind found in production environments:

User → Application Load Balancer → EC2 Instances (Nginx, managed by an Auto Scaling Group), with CloudWatch monitoring the instances

Key components:

Amazon EC2 instances running Nginx
Application Load Balancer for traffic distribution
Auto Scaling Group for high availability
IAM roles and Security Groups for secure access
CloudWatch for monitoring and alerting

In this setup, the Auto Scaling Group provides scalability, the load balancer and multiple instances provide fault tolerance, and CloudWatch provides observability.

Core Features Implemented

1. Web Application Deployment on AWS

Launched EC2 instances with Amazon Linux
Configured Nginx as the web server
Enabled public access via HTTP

2. Secure Infrastructure Configuration

Applied least-privilege IAM roles
Configured Security Groups to restrict access
Eliminated the need for hardcoded credentials
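
The post does not list the exact permissions granted, so the following is only a sketch of what a least-privilege policy document can look like. The helper name and the single action shown (publishing custom CloudWatch metrics) are assumptions chosen for illustration, not the project's actual policy.

```python
import json


def least_privilege_policy(actions):
    """Build an IAM policy document that allows only the listed actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # "*" is used here because cloudwatch:PutMetricData does not
                # support resource-level scoping; narrow this for actions that do.
                "Resource": "*",
            }
        ],
    }


# Hypothetical permission set: the instance may publish custom metrics only.
policy = least_privilege_policy(["cloudwatch:PutMetricData"])
print(json.dumps(policy, indent=2))
```

Attaching a role with a policy like this to the instance lets applications obtain temporary credentials from the instance itself, which is what makes hardcoded access keys unnecessary.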

3. Real Failure Simulation

To replicate real production issues, I intentionally introduced failures:

SSH access failure by modifying Security Groups
Web server downtime by stopping Nginx
IAM permission errors by removing policies
High CPU utilization using load generation

Each issue was diagnosed and resolved using AWS Console tools and Linux commands.
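
The post does not say which load generation tool was used for the high-CPU scenario. As a minimal sketch, assuming Python 3 is available on the instance, the following busy-loops in one process per worker for a fixed duration. The function names and parameters are illustrative.

```python
import multiprocessing
import time


def burn_cpu(seconds: float) -> int:
    """Busy-loop for roughly `seconds` and return the number of iterations."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def generate_load(seconds: float, workers: int) -> None:
    """Start one busy-loop process per worker and wait for all of them to finish."""
    procs = [
        multiprocessing.Process(target=burn_cpu, args=(seconds,))
        for _ in range(workers)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
```

Calling generate_load with one worker per CPU core for a few minutes should be enough to push CPU utilization high enough to show up in monitoring, though the exact duration needed depends on the metric period and alarm settings.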

Self-Healing Mechanisms

To reduce downtime and manual intervention, I implemented multiple recovery layers:

systemd-Based Restart

Configured Nginx to automatically restart upon failure using systemd service overrides.
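
A systemd override of this kind is usually a small drop-in file that sets Restart= and RestartSec= in the [Service] section. The exact values used in the project are not shown, so the sketch below renders such a drop-in with assumed settings (on-failure, 5 seconds) and the conventional drop-in path.

```python
from pathlib import Path

# Conventional drop-in location for nginx.service overrides (assumed; the
# post does not show the path that was actually used).
OVERRIDE_PATH = Path("/etc/systemd/system/nginx.service.d/override.conf")


def render_override(restart: str = "on-failure", restart_sec: int = 5) -> str:
    """Return drop-in text telling systemd to restart the service after it fails."""
    return (
        "[Service]\n"
        f"Restart={restart}\n"
        f"RestartSec={restart_sec}\n"
    )
```

After writing the rendered text to the drop-in path as root, systemctl daemon-reload is needed for systemd to pick it up. Note that systemd does not restart a service that was stopped deliberately with systemctl stop, so this layer covers crashes rather than manual stops.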

Cron-Based Health Checks

Developed a custom script that periodically checks HTTP response status and restarts Nginx if needed.
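
The original script is not reproduced in the post. A minimal sketch of the same idea, assuming Nginx serves on http://localhost/ and that the script runs with permission to call systemctl, might look like this. All function names are hypothetical.

```python
import subprocess
import urllib.error
import urllib.request

URL = "http://localhost/"  # assumed health-check endpoint


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx or 5xx status
    except OSError:
        return 0  # connection refused, timeout, or similar network failure


def is_healthy(status: int) -> bool:
    """Treat any 2xx or 3xx status as healthy."""
    return 200 <= status < 400


def restart_nginx() -> None:
    """Restart Nginx through systemd; raises if the command fails."""
    subprocess.run(["systemctl", "restart", "nginx"], check=True)


def check_once(fetch=fetch_status, restart=restart_nginx, url=URL) -> bool:
    """Run one health check. Returns True if a restart was triggered."""
    if is_healthy(fetch(url)):
        return False
    restart()
    return True
```

A cron entry would then invoke a script that calls check_once() on a schedule. Since cron's finest granularity is one minute, this layer reacts more slowly than systemd but catches cases where the process is running yet not answering requests.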

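The two layers cover different failures: systemd restarts Nginx when the process crashes, while the cron-based check catches cases where the process is stopped or running but no longer returning a healthy HTTP response. If one mechanism does not recover the service, the other can.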

High Availability with Auto Scaling and Load Balancing

To simulate production-grade infrastructure:

Configured an Application Load Balancer to distribute incoming traffic
Deployed an Auto Scaling Group across multiple Availability Zones
Enabled automatic instance replacement upon failure

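With this setup, the failure of a single instance does not take the application offline: the load balancer routes traffic to the remaining healthy instances while the Auto Scaling Group launches a replacement.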
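
Instance replacement depends on health checks. A load balancer typically marks a target unhealthy only after several consecutive failed checks, and healthy again only after several consecutive successes. The following is a simplified model of that logic, with threshold values chosen for illustration rather than taken from the project's configuration.

```python
from typing import Iterable, List


def track_health(
    results: Iterable[bool],
    healthy_threshold: int = 3,
    unhealthy_threshold: int = 2,
    initially_healthy: bool = True,
) -> List[bool]:
    """Return the target's state after each check result (True = check passed)."""
    healthy = initially_healthy
    streak = 0  # consecutive results that disagree with the current state
    states = []
    for passed in results:
        if passed == healthy:
            streak = 0
        else:
            streak += 1
            needed = healthy_threshold if passed else unhealthy_threshold
            if streak >= needed:
                healthy = passed
                streak = 0
        states.append(healthy)
    return states
```

In this model a single failed check does not change the state, which avoids replacing instances over a brief hiccup. The trade-off is that higher thresholds delay detection of a real failure.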

Monitoring and Observability

Using CloudWatch:

Tracked CPU utilization and system metrics
Configured alarms for high resource usage
Observed system behavior under load

This provides visibility into system health and performance.
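
To make the alarm behavior concrete, here is a simplified model of how a threshold alarm can be evaluated: it fires when at least a given number of the most recent datapoints exceed the threshold. The 80% threshold and the 2-out-of-3 rule are assumptions for illustration, and real CloudWatch evaluation, including its handling of missing data, is more involved than this sketch.

```python
from typing import Sequence


def alarm_state(
    datapoints: Sequence[float],
    threshold: float = 80.0,
    evaluation_periods: int = 3,
    datapoints_to_alarm: int = 2,
) -> str:
    """Return "ALARM" if enough of the most recent datapoints breach the threshold."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"
```

Requiring more than one breaching datapoint keeps a short CPU spike from triggering the alarm, while sustained load still does.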

Cost Optimization Strategy

The project was intentionally designed to stay cost-efficient:

Used t2.micro / t3.micro instances
Avoided expensive services like NAT Gateway and RDS
Stopped resources when not in use

Estimated cost remained within $8–20 per month depending on usage.
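
The arithmetic behind an estimate like this is hourly rate times running hours times resource count. The sketch below uses a placeholder rate of $0.01 per hour, which is an assumption for illustration only; actual rates vary by instance type and region and should be taken from the current AWS pricing pages.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost(
    hourly_rate: float,
    instances: int = 1,
    hours: float = HOURS_PER_MONTH,
) -> float:
    """Estimated monthly cost in dollars at a flat hourly rate."""
    return round(hourly_rate * instances * hours, 2)


# Placeholder rate, not a real price: two instances running all month,
# compared with the same two instances running 4 hours a day for 30 days.
always_on = monthly_cost(0.01, instances=2)
part_time = monthly_cost(0.01, instances=2, hours=4 * 30)
```

The comparison shows why stopping resources when not in use matters: cost scales directly with running hours, so the part-time figure is a small fraction of the always-on one.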

Key Learning Outcomes

This project provided hands-on experience in:

Deploying and managing AWS infrastructure
Troubleshooting real-world production issues
Implementing self-healing systems
Designing highly available architectures
Monitoring and optimizing system performance
Practicing cost-aware cloud engineering

Conclusion

Building cloud applications is not just about making things work. It is about ensuring they keep working when components fail.

This project reflects a shift from basic deployment to production-level thinking, focusing on resilience, automation, and reliability.

If you are a recruiter or hiring manager looking for candidates with practical AWS experience and problem-solving skills, this project demonstrates exactly that.