Website down - EFS credits
- mooneya9
- Mar 1, 2024
Updated: Jun 12
Amazon EFS is a network file system frequently used in web server environments where multiple instances need access to shared storage. In many deployments, EFS is configured with the default Bursting throughput mode. (Throughput mode is a separate setting from performance mode, which selects between General Purpose and Max I/O.) Bursting mode allows throughput to temporarily exceed the baseline by drawing from a pool of burst credits, which accumulate whenever usage stays below the baseline threshold. The baseline itself is determined by the total amount of data stored: the more data stored, the higher the sustained throughput.
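To make the link between stored data and baseline throughput concrete, here is a minimal sketch. It assumes the rate AWS documents for Bursting mode, 50 KiB/s of baseline per GiB stored, and ignores details such as minimum throughput floors and the different metering of reads and writes. The function name and the sample sizes are illustrative only.

```python
# Assumption: Bursting mode earns 50 KiB/s of baseline throughput per GiB
# stored (the rate given in the AWS documentation; verify before relying on it).
KIB_PER_S_PER_GIB = 50


def baseline_kib_per_s(stored_gib: float) -> float:
    """Approximate sustained (baseline) throughput for a Bursting-mode file system."""
    return stored_gib * KIB_PER_S_PER_GIB


# Illustrative sizes, not taken from any real deployment.
for size_gib in (5, 100, 1024):
    print(f"{size_gib:>5} GiB stored -> {baseline_kib_per_s(size_gib):,.0f} KiB/s baseline")
```

Under these assumptions, a file system holding only a few GiB gets a baseline in the hundreds of KiB/s, while 1 TiB of data earns about 50 MiB/s.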
A critical characteristic of this design is that when an EFS volume is first created, AWS grants a large pool of initial burst credits. This can be misleading, as systems may appear to perform well for months, only to experience a sudden and severe performance drop when the burst credits are exhausted. Once the credit balance reaches zero, the system reverts to baseline performance, which may be insufficient for production workloads - especially if the volume contains only a small amount of data.
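A rough calculation shows why the drop can take months to appear. The sketch below assumes the initial balance of 2.1 TiB that AWS documents for new file systems and a simplified model in which credits drain at the difference between average throughput and baseline. The workload figures are hypothetical, not measurements from the case described here.

```python
TIB = 1024 ** 4
SECONDS_PER_DAY = 86_400

# Assumption: new file systems start with 2.1 TiB of burst credits
# (figure from the AWS documentation; verify for your account and region).
INITIAL_CREDITS_BYTES = 2.1 * TIB


def days_until_depleted(avg_bytes_per_s, baseline_bytes_per_s,
                        credits_bytes=INITIAL_CREDITS_BYTES):
    """Estimate days until the credit balance reaches zero.

    Simplified model: credits shrink at (average throughput - baseline).
    Returns None when usage is at or below baseline, since the balance
    does not shrink in that case.
    """
    net_drain = avg_bytes_per_s - baseline_bytes_per_s
    if net_drain <= 0:
        return None
    return credits_bytes / net_drain / SECONDS_PER_DAY


# Hypothetical workload: 400 KiB/s average against a 250 KiB/s baseline.
estimate = days_until_depleted(400 * 1024, 250 * 1024)
print(f"Estimated days until credits run out: {estimate:.0f}")
```

With these made-up numbers the balance lasts roughly 174 days, which illustrates how a modest overdraw can stay hidden for about half a year.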
In this case, a client reported that their web application had become extremely slow. Suspecting a denial-of-service attack, they sought assistance to investigate the cause. We began by systematically testing each component in the stack: Apache, PHP-FPM, the database, and the EFS mount. All services except EFS responded normally.
Focus then shifted to the EFS metrics available in Amazon CloudWatch. These confirmed the root cause: the burst credit balance had dropped to zero after approximately six months of normal usage. Because the amount of data stored on the EFS volume was relatively small, the baseline throughput was also very low - measured in the hundreds of kilobytes per second. This was insufficient to support the needs of the web application.
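The same check can be scripted. The sketch below reads the `BurstCreditBalance` metric from the `AWS/EFS` CloudWatch namespace through a boto3 CloudWatch client passed in by the caller. The function name, the 24-hour window, and the file system ID in the usage comment are assumptions for illustration, not part of the original investigation.

```python
from datetime import datetime, timedelta, timezone


def lowest_burst_credit_balance(cloudwatch, file_system_id, hours=24):
    """Return the lowest BurstCreditBalance (in bytes) seen in the window.

    `cloudwatch` is expected to be a boto3 CloudWatch client.
    Returns None if CloudWatch has no datapoints for the period.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EFS",
        MetricName="BurstCreditBalance",
        Dimensions=[{"Name": "FileSystemId", "Value": file_system_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Minimum"],
    )
    points = response.get("Datapoints", [])
    return min((p["Minimum"] for p in points), default=None)


# Usage (requires boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   balance = lowest_burst_credit_balance(boto3.client("cloudwatch"), "fs-12345678")
#   print(balance)
```

A value at or near zero indicates the file system is being held to its baseline throughput.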
To resolve the issue, the EFS throughput mode was switched from "bursting" to "elastic." This change provided the necessary throughput without relying on burst credits, ensuring consistent performance regardless of storage size.
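The change can be made in the console, or scripted along these lines. This is a minimal sketch using the boto3 EFS client's `describe_file_systems` and `update_file_system` calls. The function name and the placeholder file system ID are assumptions, and the article does not say which method was actually used.

```python
def switch_to_elastic(efs, file_system_id):
    """Switch a file system to Elastic throughput if it is not already.

    `efs` is expected to be a boto3 EFS client. Returns the throughput
    mode reported by the API after the call.
    """
    current = efs.describe_file_systems(FileSystemId=file_system_id)["FileSystems"][0]
    if current["ThroughputMode"] == "elastic":
        return "elastic"  # nothing to do

    updated = efs.update_file_system(
        FileSystemId=file_system_id,
        ThroughputMode="elastic",
    )
    return updated["ThroughputMode"]


# Usage (requires boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   print(switch_to_elastic(boto3.client("efs"), "fs-12345678"))
```

Elastic throughput is billed on data transferred rather than on storage alone, so it is worth reviewing the cost implications before making the switch.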
The problem was resolved quickly, and the client was relieved to learn that it had not been caused by a security incident. The case demonstrated the importance of understanding EFS throughput modes and the hidden risk of relying solely on burst credits for sustained workloads.