Lost SSH key - employee left
- mooneya9
- Mar 1, 2024
Updated: Jul 22
One common risk in infrastructure management arises when staff depart without completing a proper handover. Critical credentials such as SSH keys may exist only on the departing employee's local devices, leaving the organisation with no administrative access. Although this is poor practice, it is not uncommon.
In one case, a client reported that their CI/CD pipeline had stopped functioning. During investigation, they realised they no longer had SSH access to the EC2 instance responsible for running the process. The original administrator, who had since left the organisation, had held the only copy of the SSH key.
To restore access, we stopped the affected EC2 instance, detached its root volume, and attached it to a newly provisioned instance as a secondary volume. Once the volume was mounted there, we could read and modify the contents of its file system. We added a new SSH public key to the authorized_keys file of the relevant user. The volume was then detached, reattached to the original instance as its root device, and the instance was started again. Access was successfully restored.
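The key-injection step can be sketched in Python. This is a minimal illustration, not the exact procedure used in this case: it assumes the rescued volume is already mounted on the helper instance (the mount point and user name passed in are hypothetical) and that the user's home directory lives under /home. The function name add_public_key is our own.

```python
from pathlib import Path


def add_public_key(mount_point: str, user: str, public_key: str) -> bool:
    """Append public_key to the user's authorized_keys on a mounted volume.

    Returns True if the key was added, False if it was already present.
    Assumes the home directory is <mount_point>/home/<user>.
    """
    ssh_dir = Path(mount_point) / "home" / user / ".ssh"
    auth_file = ssh_dir / "authorized_keys"
    key = public_key.strip()

    # sshd normally expects ~/.ssh to be 0700 and authorized_keys to be 0600.
    ssh_dir.mkdir(parents=True, exist_ok=True)
    ssh_dir.chmod(0o700)

    current = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in current.splitlines()):
        return False  # key already authorised, nothing to do

    with auth_file.open("a") as fh:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if current and not current.endswith("\n"):
            fh.write("\n")
        fh.write(key + "\n")
    auth_file.chmod(0o600)

    # Note: anything newly created here is owned by whoever runs the script
    # (typically root on the helper instance). Ownership must be set back to
    # the target user's UID/GID before the volume is returned.
    return True
```

A call might look like `add_public_key("/mnt/rescue", "ubuntu", "ssh-ed25519 AAAA... admin@example")`, where every argument is a placeholder. Appending rather than overwriting keeps any existing keys in place, which is useful if the old key is later recovered.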
With access regained, attention turned to identifying why the CI/CD tool had stopped working. The application logs showed a cryptic error related to a missing or incompatible library. Cross-referencing this with OS-level logs revealed that the system had recently installed a set of updates through the unattended upgrades feature enabled on the instance.
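As an illustration of that cross-referencing, the sketch below lists packages upgraded after a given time. It assumes a Debian or Ubuntu system, where dpkg records upgrades in /var/log/dpkg.log as lines of the form "date time upgrade package:arch old-version new-version". The original case does not name the distribution, and the sample log contents and package names here are invented for the example.

```python
from datetime import datetime
from typing import List, NamedTuple


class Upgrade(NamedTuple):
    when: datetime
    package: str
    old_version: str
    new_version: str


def parse_upgrades(log_text: str, since: datetime) -> List[Upgrade]:
    """Return the 'upgrade' entries in dpkg.log text dated at or after 'since'."""
    upgrades = []
    for line in log_text.splitlines():
        fields = line.split()
        # Expected fields: date, time, "upgrade", package:arch, old, new
        if len(fields) != 6 or fields[2] != "upgrade":
            continue
        when = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S")
        if when >= since:
            upgrades.append(Upgrade(when, fields[3], fields[4], fields[5]))
    return upgrades


# Hypothetical log excerpt; package names and versions are made up.
SAMPLE_LOG = """\
2024-02-20 06:10:00 upgrade otherlib:amd64 2.0-1 2.0-2
2024-02-27 06:14:02 startup archives unpack
2024-02-27 06:14:03 upgrade libexample1:amd64 1.0.2-1 1.1.0-1
2024-02-27 06:14:03 status half-configured libexample1:amd64 1.0.2-1
"""

recent = parse_upgrades(SAMPLE_LOG, since=datetime(2024, 2, 25))
```

Comparing the resulting list against the library named in the application error narrows the search to upgrades that happened shortly before the first failure.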
One of these upgrades had pulled in a newer version of a package used by the CI/CD tool, introducing a compatibility issue. This explained the sudden failure despite no code changes being made to the pipeline itself.
As a short-term workaround, we compiled the required version of the dependency from source and patched it into the system manually. This restored the build process and allowed the team to resume software delivery.
For long-term stability, the client was advised to upgrade the CI/CD tool to a more recent release that supports the newer dependency versions available from the package manager, and then to remove the temporary compiled package. Without this upgrade, neither the tool nor its manually built library would receive official patches, posing a future risk to both functionality and security.
We also recommended disabling unattended upgrades on production systems and replacing them with scheduled maintenance windows. This way, system updates are tested, reviewed, and coordinated, rather than introduced unexpectedly during business-critical operations.
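A small audit check can confirm that the setting is actually off on each host. The sketch below assumes a Debian or Ubuntu system where the behaviour is controlled by the APT::Periodic::Unattended-Upgrade option, commonly set in /etc/apt/apt.conf.d/20auto-upgrades. The distribution is an assumption, not something stated in this case. The parser is deliberately simplified: it handles only "//" comments and a single block of configuration text, and the function name is our own.

```python
import re

# Matches e.g.:  APT::Periodic::Unattended-Upgrade "1";
_SETTING = re.compile(r'APT::Periodic::Unattended-Upgrade\s+"([^"]*)"\s*;')


def unattended_upgrades_enabled(config_text: str) -> bool:
    """Return True if the given APT configuration text enables unattended upgrades.

    Simplification: any value other than "0" or an empty string is treated
    as enabled. If the option appears more than once, the last one wins.
    """
    without_comments = re.sub(r"//.*", "", config_text)
    values = _SETTING.findall(without_comments)
    if not values:
        # With the option absent, APT's periodic job treats it as "0" (disabled).
        return False
    return values[-1].strip() not in ("", "0")
```

For example, `unattended_upgrades_enabled(open("/etc/apt/apt.conf.d/20auto-upgrades").read())` would report the state on a host that uses that file. APT merges every file in its configuration directory, so a fuller check would inspect the merged configuration rather than one file.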
The issue was resolved quickly, restoring a software delivery process with high operational impact for the client. With both access and functionality re-established, the client could return to normal development workflows with minimal disruption while planning a more permanent approach to upgrades.