Enabling AWS CloudHSM

Overview

 

When thinking about the design and processing of data within an application, it is incredibly important that any given solution provides a high level of protection against data leakage and compromise. From payment methods such as credit card details to personal customer information and business data, organisations that handle confidential information need to ensure that they’re fully protected.

An important part of the puzzle when it comes to providing robust and secure encryption is a Hardware Security Module, or HSM for short. An HSM is a device designed to provide an exceptionally high level of security, in line with data compliance regulations, to businesses across a variety of industries that need to safeguard their data.

In the past, an HSM would typically be a physical device installed within a data centre that an organisation had to manage, maintain and configure. In the ever-growing world of cloud adoption, however, many cloud service providers also offer HSMs as a managed service, allowing organisations to reap the benefits of an HSM in a cloud environment whilst cutting down on operational overhead and cost.

During a recent client engagement involving the migration of a major, high-security authentication platform to the cloud, we were given the challenge of moving all components from their on-premises, data centre-based topology to a cloud-native solution hosted exclusively within AWS. As this solution included the use of HSM devices, we were tasked not only with providing the technical implementation of the AWS CloudHSM product, but also with integrating it with pre-existing organisational requirements such as:

  • Compliance and auditing controls

  • Evidenced segregation of duty between HSM activities

  • Robust, role-based access to the HSM and any desired interactions

As this was to be the first implementation of a cloud-based HSM solution within the organisation, we would also be responsible for guiding the solution through the required governance and security processes, enabling the use of AWS CloudHSM at the wider organisational level.

What was the solution?

Working with the client’s security, compliance and development teams, we designed, developed and implemented a pattern which would allow teams to deploy and consume the AWS CloudHSM service in a repeatable, compliant manner. The solution delivered was broken down into two main components:

  1. A set of well-documented and tested infrastructure-as-code components in the form of reusable Terraform modules. These modules allow development teams to consume off-the-shelf components within their application build which provide well-known, pre-approved configurations of resources. This component greatly reduces the development time spent on the HSM service, and makes it possible to rapidly scale, deploy and recreate application infrastructure.

  2. A role-based framework designed to provide restricted and audited “just-in-time” access to CloudHSM devices as required for each HSM interaction. This component enables both the implementation of controls and segregation of duty, in addition to enabling integration with pre-existing organisational requirements and workflows.

The infrastructure created by our reusable Terraform modules can be viewed at a very high level as follows:

[Figure: High Level Infrastructure Diagram]

All items highlighted within the “HSM Infrastructure” group define the deployment of the HSM cluster components and connectivity to the required application components.

Here you will notice that we have placed the HSM within its own AWS VPC, distinct from any connecting applications. Separating the HSM at the VPC boundary offers a number of benefits, including:

  • Network level segregation from other application components gives us full control over connectivity and later configuration, effectively starting from a well-known state in cases where the connecting application infrastructure has not yet been defined.

  • Deploying the HSM infrastructure as a distinct, self-contained unit without requiring pre-existing network components allows our HSM infrastructure to be created and operated in a lifecycle distinct from any application infrastructure. For example, our HSM infrastructure can be configured and persist outside of any application development environments which are frequently created and destroyed whilst iterating over development lifecycles.

The HSM VPC is connected back to the application infrastructure through the use of a VPC peering connection. Route tables are established to allow communication between the application VPC subnets and the HSM subnets, and this is further locked down with subnet network ACLs (NACLs), which restrict the type of traffic that can pass over the peering connection. Finally, AWS security groups are used to ensure that only application components with the assigned group can communicate with the HSM infrastructure, and only over the expected ports and protocols.
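To make the layering concrete, the following is a minimal Python sketch of how the three controls combine: traffic must pass the route table, the NACL and the security group before it reaches the HSM. It is a plain model rather than the Terraform used in the engagement, and the CIDR ranges and the "hsm-client" group name are hypothetical. The port range reflects the TCP ports 2223-2225 that the CloudHSM client is documented to use.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Illustrative assumptions: none of these values come from the original solution.
APP_SUBNET = ip_network("10.0.1.0/24")   # hypothetical application subnet
HSM_SUBNET = ip_network("10.1.1.0/24")   # hypothetical HSM subnet
HSM_CLIENT_GROUP = "hsm-client"          # hypothetical security group name
HSM_PORTS = range(2223, 2226)            # TCP 2223-2225 (CloudHSM client ports)


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    protocol: str
    port: int
    src_groups: frozenset  # security groups attached to the source component


def route_exists(flow: Flow) -> bool:
    """Route tables: only application subnet to HSM subnet traffic is routed."""
    return (ip_address(flow.src_ip) in APP_SUBNET
            and ip_address(flow.dst_ip) in HSM_SUBNET)


def nacl_allows(flow: Flow) -> bool:
    """Subnet NACL: restrict the type of traffic crossing the peering connection."""
    return flow.protocol == "tcp" and flow.port in HSM_PORTS


def security_group_allows(flow: Flow) -> bool:
    """Security group: only components carrying the assigned group may connect."""
    return HSM_CLIENT_GROUP in flow.src_groups and nacl_allows(flow)


def is_allowed(flow: Flow) -> bool:
    """A flow reaches the HSM only if every layer permits it."""
    return route_exists(flow) and nacl_allows(flow) and security_group_allows(flow)
```

Each layer can reject a flow independently, so a misconfiguration in one control (for example, an over-broad route) is still caught by the others.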

All items highlighted within the “Management Infrastructure” group define the necessary resources required for our HSM interaction workflows. As described earlier, we were tasked with fitting in with pre-existing customer operational workflows and roles whereby:

  1. HSM instances are built, tracked and maintained by an organisational cryptography function.

  2. Operational usage and interaction with specific HSM functions — such as initialisation, user creation, etc. — are owned and performed by specific business units, whereas application-level interactions and functions are owned and performed by application resources.
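The ownership split above can be expressed as a simple role-to-interaction mapping. The sketch below is illustrative only: the role names and interaction names are hypothetical stand-ins, not the identifiers used in the client's IAM configuration. It shows one way to check that a role may perform an interaction, and to detect any interaction granted to more than one role.

```python
# Hypothetical roles and interactions, loosely following the ownership split above.
ROLE_PERMISSIONS = {
    "cryptography-function": {"create_hsm", "record_hsm"},
    "business-unit-operator": {"initialise_hsm", "create_hsm_user"},
    "application": {"encrypt", "decrypt"},
}


def can_perform(role: str, interaction: str) -> bool:
    """Return True if the given role owns the given interaction."""
    return interaction in ROLE_PERMISSIONS.get(role, set())


def overlapping_duties(permissions: dict) -> dict:
    """Return every interaction that is granted to more than one role."""
    owners = {}
    for role, interactions in permissions.items():
        for interaction in interactions:
            owners.setdefault(interaction, []).append(role)
    return {name: sorted(roles) for name, roles in owners.items() if len(roles) > 1}
```

An empty result from `overlapping_duties(ROLE_PERMISSIONS)` is the kind of check that could be run to evidence segregation of duty whenever the mapping changes.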

In order to service this requirement, we designed and built AWS Systems Manager (or SSM) automation documents to encapsulate each of the defined interactions, including:

  • HSM Creation

  • HSM Initialisation

  • HSM Certificate Signing

  • Management Instance Creation

These documents are deployed for every HSM cluster created via our reusable modules, and wrap the AWS service-specific functionality into pre-defined actions which can be carried out by existing organisational teams without having to undertake large amounts of cloud-orientated training. In addition to providing consistent workflows, our document framework also enables:

  • Authorisation and role-based access, integrated with pre-existing organisational IAM controls, linked to Active Directory and single sign-on.

  • Approval and auditing processes via the use of AWS services such as CloudWatch, SSM Automation Approvals, CloudTrail, etc. These services are further linked with the organisation's pre-existing Security Information and Event Management (SIEM) platform, which collects and inspects infrastructure logs and events.

  • Integration with existing organisational systems-of-record for recording the locations and configurations of all HSM instances.
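As an illustration of how an interaction such as HSM Creation might be wrapped with an approval step, the sketch below builds a dictionary shaped like an SSM Automation document (schema version 0.3), using the `aws:approve` and `aws:executeAwsApi` actions. It is not the document used in the engagement: the step names, parameter names and ARNs are hypothetical, the `cloudhsmv2` service namespace is an assumption, and the output has not been validated against the SSM service.

```python
import json


def build_hsm_creation_document(approver_arns, notification_topic_arn):
    """Return a dict shaped like an SSM Automation document (illustrative only)."""
    return {
        "schemaVersion": "0.3",
        "description": "Create an HSM in an existing CloudHSM cluster after approval.",
        "assumeRole": "{{ AutomationAssumeRole }}",
        "parameters": {
            "AutomationAssumeRole": {"type": "String"},
            "ClusterId": {"type": "String"},
            "AvailabilityZone": {"type": "String"},
        },
        "mainSteps": [
            {
                # Pause the automation until the designated approvers respond.
                "name": "RequestApproval",
                "action": "aws:approve",
                "inputs": {
                    "Approvers": list(approver_arns),
                    "MinRequiredApprovals": 1,
                    "NotificationArn": notification_topic_arn,
                    "Message": "Approve creation of an HSM in cluster {{ ClusterId }}",
                },
            },
            {
                # Call the CloudHSM API on behalf of the requesting team.
                "name": "CreateHsm",
                "action": "aws:executeAwsApi",
                "inputs": {
                    "Service": "cloudhsmv2",  # assumed service namespace
                    "Api": "CreateHsm",
                    "ClusterId": "{{ ClusterId }}",
                    "AvailabilityZone": "{{ AvailabilityZone }}",
                },
            },
        ],
    }


if __name__ == "__main__":
    document = build_hsm_creation_document(
        ["arn:aws:iam::123456789012:role/hsm-approver"],      # hypothetical ARN
        "arn:aws:sns:eu-west-2:123456789012:hsm-approvals",   # hypothetical ARN
    )
    print(json.dumps(document, indent=2))
```

Placing the approval step first means the API call cannot run until an authorised approver has responded, and the approval request itself becomes part of the automation execution record.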

In addition to the AWS API level interactions provided by our SSM Automation framework, our solution also includes the ability to connect to the HSM instances for management tasks — such as user creation — via ephemeral, just-in-time management instances.

These instances are launched on-demand via SSM Automations within an ECS Fargate cluster deployed as a part of the management infrastructure. Each container instance is pre-loaded with the tools required for HSM interaction, including the AWS HSM and Key Management CLI tools and all required Java libraries and dependencies, thus allowing HSM users to operate in a well-known, controlled environment.

Access to management instances is provided by AWS SSM Session Manager, allowing interactive console access, and is again restricted by role-based IAM controls, allowing for segregation of duties. HSM management instances also always run with a pre-defined maximum instance lifetime and automatically shut down upon expiration, ensuring that interactive access to an HSM is only ever opened on demand and remains closed in all other situations.
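The maximum-lifetime behaviour can be sketched as a simple expiry check. This is a minimal illustration, assuming a hypothetical one-hour limit and a hypothetical scheduler that calls the check periodically. The original solution's actual lifetime value and shutdown mechanism are not described here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical limit; the real pre-defined lifetime is not stated in this article.
MAX_LIFETIME = timedelta(hours=1)


def expires_at(launched_at: datetime, max_lifetime: timedelta = MAX_LIFETIME) -> datetime:
    """Return the moment at which a management instance must be shut down."""
    return launched_at + max_lifetime


def should_shut_down(launched_at: datetime, now: datetime,
                     max_lifetime: timedelta = MAX_LIFETIME) -> bool:
    """Return True once the instance has reached its maximum lifetime."""
    return now >= expires_at(launched_at, max_lifetime)


if __name__ == "__main__":
    launched = datetime.now(timezone.utc)
    print("Instance expires at:", expires_at(launched).isoformat())
```

Because the deadline is derived from the launch time rather than from user activity, access closes at a predictable point even if a session is left open.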

Through the use of our HSM solution, our client organisation has been able to achieve the following outcomes:

  • AWS CloudHSM has been unlocked and approved within the organisation, allowing HSM-reliant platforms to be migrated to and/or developed within the cloud at scale.

  • Operational and infrastructure procurement lead times have been greatly reduced, removing blockers to application development.

  • All organisational and compliance requirements have been satisfied and can be continually evidenced within the HSM lifecycle.

  • All required security controls and protections have been satisfied and can be continually evidenced within the HSM lifecycle.
     

Airwalk Reply is a leading AWS Premier Consulting Partner.