
Storage bottleneck, website down

  • mooneya9
  • Feb 25, 2024
  • 2 min read

Updated: Jun 12

A client approached us with a recurring issue: their website would become slow or entirely unresponsive during marketing campaigns. These campaigns were designed to attract large volumes of traffic, but the resulting load consistently overwhelmed the site. Although the client believed their system was designed to auto-scale, the symptoms suggested that only part of the stack was actually scalable.


The website in question was a brochure-style site serving static content such as images and videos. It ran on a set of NGINX web servers and used GlusterFS, running on a single EC2 instance, as its storage layer. GlusterFS acted as a shared network file system (similar to NFS), but because this deployment lived on one fixed instance, it had no way to scale with load.


Initial analysis confirmed that while the NGINX layer was configured to auto-scale, the GlusterFS storage layer beneath it was not. Every web server read its content from the same EC2 instance, so adding web servers during a traffic spike only added load to that one instance, which became the bottleneck during high-traffic events.
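
To make the bottleneck concrete, here is a toy capacity model. The read rates and capacity below are hypothetical (the case study does not report them); the point is only that read demand grows linearly with the number of web servers while a single storage node's capacity stays fixed.

```python
# Toy model of a scaling web tier in front of one fixed storage node.
# All numbers are hypothetical; the case study does not publish them.


def storage_utilisation(web_servers: int, reads_per_server: float, storage_capacity: float) -> float:
    """Fraction of the storage node's read capacity consumed by the web tier.

    Every NGINX server reads from the same node, so demand scales with the
    number of web servers while capacity does not.
    """
    if storage_capacity <= 0:
        raise ValueError("storage_capacity must be positive")
    return web_servers * reads_per_server / storage_capacity


def max_useful_web_servers(reads_per_server: float, storage_capacity: float) -> int:
    """Largest web-tier size the storage node can feed before it saturates."""
    if reads_per_server <= 0:
        raise ValueError("reads_per_server must be positive")
    return int(storage_capacity // reads_per_server)


if __name__ == "__main__":
    # Assumed: each web server issues 500 reads/s; the storage node sustains 3,000.
    for n in (2, 4, 6, 8):
        print(f"{n} web servers -> storage at {storage_utilisation(n, 500, 3000):.0%}")
    print("saturates beyond", max_useful_web_servers(500, 3000), "web servers")
```

Past the saturation point, auto-scaling the web tier makes things worse rather than better: each extra server adds demand on the storage node without adding any storage capacity.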


As an immediate short-term fix, we resized the GlusterFS EC2 instance to a more powerful instance type and migrated its EBS volumes from gp2 to gp3. This increased both compute and IOPS capacity, which kept the site stable during campaign periods. However, the approach was not cost-efficient: the larger instance ran continuously, including during periods of low traffic.
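
The gp2-to-gp3 change raises baseline IOPS because the two volume types are rated differently: gp2 delivers 3 IOPS per GiB (minimum 100, maximum 16,000), while gp3 includes 3,000 IOPS regardless of size. The sketch below encodes only those baseline rules; it ignores gp2 burst credits and extra provisioned gp3 IOPS, and the volume sizes are hypothetical because the article does not give them.

```python
# Baseline IOPS rules for EBS gp2 vs gp3 (burst credits and extra
# provisioned gp3 IOPS deliberately ignored).
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline scales with size: 3 IOPS/GiB, clamped to [100, 16,000]."""
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GIB * size_gib, GP2_MAX_IOPS))


def gp3_baseline_iops(size_gib: int) -> int:
    """gp3 baseline does not depend on size_gib (more IOPS can be provisioned)."""
    return GP3_BASELINE_IOPS


if __name__ == "__main__":
    for size in (50, 200, 1_000):  # hypothetical volume sizes in GiB
        print(f"{size:>5} GiB  gp2={gp2_baseline_iops(size):>5}  gp3={gp3_baseline_iops(size):>5}")
```

For volumes under 1,000 GiB, gp3's baseline exceeds gp2's. A gp2 volume of that size can burst to 3,000 IOPS, but only while its burst credits last, which is exactly what a sustained campaign tends to exhaust.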


To address the root cause, we proposed migrating to Amazon Elastic File System (EFS), a fully managed, highly scalable network file system. EFS scales with demand, which makes it a natural fit for workloads with variable traffic patterns. With EFS in place, the storage layer could expand during peak load, such as a successful marketing campaign, and contract automatically once traffic subsided. This reduced unnecessary spend and removed the bottleneck of a single EC2-hosted file system.
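
A minimal boto3-style sketch of creating the file system, assuming Elastic throughput mode (the article does not say which throughput mode was used). Elastic throughput is the EFS mode in which throughput scales automatically with the workload and is billed per use; it requires the General Purpose performance mode. The creation token and tag value are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict


def efs_create_kwargs(creation_token: str) -> Dict[str, Any]:
    """Arguments for EFS CreateFileSystem (assumed settings, see lead-in)."""
    return {
        "CreationToken": creation_token,      # idempotency token: retries won't create duplicates
        "PerformanceMode": "generalPurpose",  # required for Elastic throughput
        "ThroughputMode": "elastic",          # throughput follows demand, billed per use
        "Encrypted": True,
        "Tags": [{"Key": "Name", "Value": "web-content"}],  # hypothetical name
    }


def create_web_content_fs(efs_client: Any, creation_token: str) -> str:
    """Create the file system and return its ID (e.g. 'fs-...')."""
    response = efs_client.create_file_system(**efs_create_kwargs(creation_token))
    return response["FileSystemId"]
```

In practice the client would be `boto3.client("efs")`. EFS storage capacity itself grows and shrinks automatically as files are added and removed, whichever throughput mode is chosen.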


The migration also improved resilience. The GlusterFS setup ran in a single Availability Zone with no failover. EFS, by contrast, was deployed in a multi-AZ configuration, so the site would stay available even if a datacentre (Availability Zone) within the AWS region failed.
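
For multi-AZ access, an EFS file system needs a mount target in each Availability Zone the web servers run in; instances then reach the file system through the mount target in their own AZ. The sketch below assumes the subnet and security-group IDs are already known (all IDs used with it are hypothetical placeholders), and the security group must allow inbound NFS (TCP 2049) from the web servers.

```python
from typing import Any, Dict, List


def create_mount_targets(
    efs_client: Any,
    file_system_id: str,
    subnet_ids_by_az: Dict[str, str],
    security_group_id: str,
) -> List[str]:
    """Create one EFS mount target per Availability Zone; return their IDs.

    subnet_ids_by_az maps an AZ name to the subnet the web servers use there.
    The security group must allow inbound NFS (TCP 2049) from the web servers.
    """
    mount_target_ids = []
    for az in sorted(subnet_ids_by_az):  # deterministic order
        response = efs_client.create_mount_target(
            FileSystemId=file_system_id,
            SubnetId=subnet_ids_by_az[az],
            SecurityGroups=[security_group_id],
        )
        mount_target_ids.append(response["MountTargetId"])
    return mount_target_ids
```

Keying the subnets by AZ mirrors an EFS constraint: a file system can have at most one mount target per Availability Zone.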


The migration was completed with zero downtime. After provisioning EFS and syncing the data from GlusterFS, we requested a one-hour change freeze so that content would not change during cutover. During this window we ran a final sync and reconfigured the NGINX servers to mount EFS, and the site remained available throughout.
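
The two-pass cutover can be sketched as a size-and-mtime quick check: copy any file whose size or modification time differs, run it once while the site is live, then again during the freeze so the final pass only moves recent changes. This is an illustrative stand-in, not the tooling used in the project (the article does not name it; rsync applies a similar quick check by default), and it does not propagate deletions. The `efs_fstab_entry` helper builds an `/etc/fstab` line with the NFS mount options AWS recommends for EFS; the file system ID, region, and mount point in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def sync_tree(src: Path, dst: Path) -> List[str]:
    """Copy new or changed files from src to dst; return the relative paths copied.

    A file counts as unchanged when size and whole-second mtime match, similar to
    rsync's default quick check. Files deleted from src are NOT removed from dst.
    """
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        if target.exists():
            s, t = path.stat(), target.stat()
            if s.st_size == t.st_size and int(s.st_mtime) == int(t.st_mtime):
                continue  # unchanged since the previous pass
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 keeps mtime, so the next pass can skip it
        copied.append(rel.as_posix())
    return copied


def efs_fstab_entry(file_system_id: str, region: str, mount_point: str) -> str:
    """Build an /etc/fstab line mounting EFS over NFSv4.1 with AWS-recommended options."""
    dns_name = f"{file_system_id}.efs.{region}.amazonaws.com"
    options = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev"
    return f"{dns_name}:/ {mount_point} nfs4 {options} 0 0"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "gluster"), Path(tmp, "efs")  # stand-ins for the two mounts
        (src / "img").mkdir(parents=True)
        (src / "img" / "hero.jpg").write_bytes(b"v1")
        print("initial pass:", sync_tree(src, dst))  # while the site is live
        (src / "img" / "banner.jpg").write_bytes(b"new")
        print("final pass:", sync_tree(src, dst))    # during the change freeze
    # Hypothetical file system ID, region and NGINX document root:
    print(efs_fstab_entry("fs-12345678", "eu-west-1", "/var/www/html"))
```

Because the first pass does the bulk of the copying while the site is still live, the final pass inside the freeze only has to move whatever changed since, which is what keeps the freeze window short.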


Load testing confirmed that the new architecture worked. Even under extreme simulated traffic, page load times stayed consistently under 0.5 seconds. We also added alarms on the key auto-scaling components to detect unusual traffic patterns: because these components scale, and bill, with demand, a DDoS attack could otherwise drive up costs unnoticed.
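
One example of such an alarm, placed on the EFS layer, fires when the bytes read from the file system stay above a threshold for three consecutive five-minute periods. `DataReadIOBytes` in the `AWS/EFS` namespace is a standard EFS CloudWatch metric, but the threshold, alarm name, and SNS topic ARN are hypothetical; the article does not say which components or thresholds were used.

```python
from typing import Any, Dict


def efs_read_alarm_kwargs(
    file_system_id: str, threshold_bytes: float, sns_topic_arn: str
) -> Dict[str, Any]:
    """Arguments for CloudWatch PutMetricAlarm on EFS read volume."""
    return {
        "AlarmName": f"efs-{file_system_id}-high-read-volume",  # hypothetical naming scheme
        "Namespace": "AWS/EFS",
        "MetricName": "DataReadIOBytes",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "Statistic": "Sum",          # total bytes read per period
        "Period": 300,               # five-minute periods
        "EvaluationPeriods": 3,      # must breach three periods in a row
        "Threshold": threshold_bytes,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic that notifies the on-call team
    }


def create_efs_read_alarm(
    cloudwatch_client: Any, file_system_id: str, threshold_bytes: float, sns_topic_arn: str
) -> None:
    """Create or update the alarm via the passed-in CloudWatch client."""
    cloudwatch_client.put_metric_alarm(
        **efs_read_alarm_kwargs(file_system_id, threshold_bytes, sns_topic_arn)
    )
```

In practice the client would be `boto3.client("cloudwatch")`, and the threshold would be set from the site's normal campaign-time read volume so that only genuinely abnormal traffic triggers it.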


By migrating to EFS, the client gained a storage layer that was scalable, resilient, and cost-effective. The site could now absorb campaign traffic spikes without performance degradation, supporting the goals of their marketing campaigns without risking outages or reputational damage.

 
 
