Managing Containers

Overview

 

Kubernetes is a popular and powerful tool that harnesses containers to run jobs. These containers are normally unique to a specific application and are created and stored ‘in-house’ by organisations, so ensuring that the correct container is used to run the correct job is important.

After all, what would happen if the container used to run a job introduced issues or was compromised?

During a recent client engagement to deliver a strategic digital container platform on a global scale, this requirement was surfaced – how do we identify, report and manage container-level vulnerabilities within the applications deployed on the new platform?

As this was the first platform of its type in the customer organisation, no guidelines or patterns existed for the use of application containers, meaning the required solution would also form the basis for the enablement and development of the organisation’s wider AWS container security patterns. Because the customer is highly regulated, any solution also had to provide segregation of platform roles and responsibilities, including the enforcement of controls around the deployment, promotion and acceptance of images.

Aside from the technical challenges of delivering such a tool, one of its main goals was to enable wider adoption of, and accelerate migration onto, the new strategic container platform through developer-focused workflows and feedback early in the development lifecycle, increasing developer productivity, application quality and the security of the platform as a whole.

The solution

 

Working with the client’s security, compliance and development teams, we designed, developed and implemented an AWS-based, cloud-first solution, known affectionately as ‘Maria’.

The solution operates as a centralised container image storage and scanning service, extending the standard AWS ECR service to provide a multi-tenanted, application-agnostic platform.

Key features of the solution include:

  • CI/CD driven user workflows for automated image scanning, reporting and promotion
  • Segregation of scanned, approved images, distinct from unscanned or rejected images
  • Enforced controls around the deployment of images into environments based on scan result criteria
  • Centralised vulnerability policy creation and management
  • Integration into pre-existing customer-specific reporting and management tools

From a consumer perspective, the solution offers the following interactions:

  • Application developers are able to push their images to a central ECR location and benefit from automated scanning, reporting and image promotion or rejection based on criteria defined and managed by the organisation’s cyber security function. The Jenkins libraries, Terraform modules and documentation provided with the solution improve the developer experience by enabling CI/CD of images through the image scanning processes and by providing off-the-shelf infrastructure templates that reduce the time-to-start for application teams.
  • Business consumers are able to view, filter and report on the security posture and vulnerability compliance status of all application containers and teams.
  • Security teams are able to define, manage and enforce policies for the acceptance and categorisation of vulnerabilities. Vulnerabilities can be managed and either accepted or rejected at an organisation or team level.
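To make the idea of organisation-level and team-level acceptance concrete, the following is a minimal sketch of how such a policy check could look. It is not how Aqua CSP models policies; the `Finding` and `Policy` types, the severity scale and the CVE identifiers are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Severity ordering used for comparisons (an assumption; the real policy model is not described).
SEVERITY_ORDER = {"negligible": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    """A single vulnerability reported by a scan (hypothetical shape)."""
    cve_id: str
    severity: str


@dataclass
class Policy:
    """Hypothetical policy: a severity ceiling plus explicit acceptances."""
    max_allowed_severity: str = "medium"
    org_accepted: set = field(default_factory=set)      # accepted for every team
    team_accepted: dict = field(default_factory=dict)   # team name -> set of CVE ids


def blocking_findings(findings, policy, team):
    """Return findings that exceed the ceiling and are not accepted for this team."""
    accepted = policy.org_accepted | policy.team_accepted.get(team, set())
    limit = SEVERITY_ORDER[policy.max_allowed_severity]
    return [
        f for f in findings
        if f.cve_id not in accepted and SEVERITY_ORDER[f.severity] > limit
    ]


def image_is_compliant(findings, policy, team):
    """An image is compliant when nothing blocks it."""
    return not blocking_findings(findings, policy, team)
```

In this sketch an acceptance granted to one team does not apply to another, while an organisation-level acceptance applies to everyone.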

We also needed to develop and codify the AWS ECR security patterns to ensure the solution was compliant from the outset, and that the standards and processes defined for the centralised solution were also implemented consistently across the global developer community.

How does it work?

 

The solution operates from a central AWS ‘solution’ account that application teams interact with.  Within the account, workflow services are provided such that pushing an image automatically triggers the required actions.  Segregation of images is provided so that consumers may only push to “unscanned” repositories, with the system automatically promoting any approved images to “scanned” repositories, which can only ever be read from.
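The push-only and read-only split described above could be expressed as ECR repository policies. The sketch below generates such a policy document in Python. The `unscanned/` and `scanned/` name prefixes, the statement IDs and the account ID are assumptions for illustration; the actual repository naming and policies used in the solution are not described here.

```python
# IAM actions needed to push an image to an ECR repository.
PUSH_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:InitiateLayerUpload",
    "ecr:UploadLayerPart",
    "ecr:CompleteLayerUpload",
    "ecr:PutImage",
]

# IAM actions needed to pull an image from an ECR repository.
PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
]


def repository_policy(repository_name, consumer_account_id):
    """Build a repository policy: consumers push to unscanned, pull from scanned.

    The "unscanned/" and "scanned/" prefixes are a hypothetical naming scheme.
    """
    if repository_name.startswith("unscanned/"):
        sid, actions = "ConsumerPushOnly", PUSH_ACTIONS
    elif repository_name.startswith("scanned/"):
        sid, actions = "ConsumerPullOnly", PULL_ACTIONS
    else:
        raise ValueError(f"unexpected repository name: {repository_name}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{consumer_account_id}:root"},
                "Action": list(actions),
            }
        ],
    }
```

Because consumers are never granted push actions on a scanned repository, only the promotion workflow running inside the solution account would be able to write to it.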

Workflow orchestration is provided through AWS Lambda functions, in combination with CloudWatch Event Rules, SNS and SQS for message queueing.
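As an illustration of the first step in that orchestration, here is a minimal sketch of a Lambda-style handler that turns an image push event into a queued scan request. It assumes the event has the shape of the "ECR Image Action" events that ECR emits (`repository-name`, `image-digest`, `image-tag`, `action-type`, `result`) and that repositories use a hypothetical `unscanned/` prefix. The `send` parameter is a stand-in so the example runs without AWS; in a real function it might wrap the SQS client's `send_message` call.

```python
import json


def build_scan_request(event):
    """Return a scan request for a successful push to an unscanned repository, else None."""
    detail = event.get("detail", {})
    if event.get("source") != "aws.ecr":
        return None
    if detail.get("action-type") != "PUSH" or detail.get("result") != "SUCCESS":
        return None
    repository = detail.get("repository-name", "")
    if not repository.startswith("unscanned/"):  # hypothetical naming scheme
        return None
    return {
        "repository": repository,
        "digest": detail.get("image-digest"),
        "tag": detail.get("image-tag"),
    }


def handler(event, context, send=print):
    """Lambda-style entry point; `send` is injected so the queue can be swapped or faked."""
    request = build_scan_request(event)
    if request is None:
        return {"queued": False}
    send(json.dumps(request, sort_keys=True))
    return {"queued": True}
```

Keeping the event parsing in a pure function makes the filtering rules easy to test without deploying anything.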

Scanning services are provided through the deployment of Aqua Cloud Native Security Platform (CSP). This tool is deployed within the solution account inside a private EKS cluster. The dashboard interface provided by Aqua CSP allows development teams to search, report on and export the state of the recorded image scans, and allows security teams to manage vulnerability policies.

In the event of an image push, the scanning workflows orchestrate ad-hoc Kubernetes jobs within the EKS cluster, scanning the image within an ephemeral scan container.
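An ad-hoc scan job of this kind could be described by a Kubernetes `batch/v1` Job manifest generated per image. The sketch below builds one as a plain Python dictionary. The scanner image, its arguments, the namespace and the registry host are placeholders, not the real Aqua scanner invocation, which is not shown in this article.

```python
def scan_job_manifest(
    repository,
    digest,
    registry="registry.example.invalid",                      # hypothetical registry host
    scanner_image="registry.example.invalid/scanner:latest",  # hypothetical scanner image
    namespace="scanning",                                     # hypothetical namespace
):
    """Build a one-shot Kubernetes Job that scans a single image by digest."""
    short_digest = digest.split(":")[-1][:12]
    image_ref = f"{registry}/{repository}@{digest}"
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"scan-{short_digest}", "namespace": namespace},
        "spec": {
            "backoffLimit": 0,                # do not retry a failed scan automatically
            "ttlSecondsAfterFinished": 600,   # let the cluster clean up the finished job
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "scanner",
                            "image": scanner_image,
                            # Placeholder arguments; a real scanner defines its own CLI.
                            "args": ["scan", image_ref],
                        }
                    ],
                }
            },
        },
    }
```

Deriving the job name from the image digest keeps each job traceable to the image it scanned.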

These scan jobs are monitored by the workflow processes, which then orchestrate ephemeral image promotion jobs to move the image from “unscanned” to “scanned” if the scan succeeds, or reject the image if the scan fails.
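The decision the workflow makes once a scan job finishes can be sketched as a small function. The status values (`active`, `succeeded`, `failed`) are a simplification of a Kubernetes Job's status, and the `unscanned/` to `scanned/` renaming is an assumed convention rather than the solution's documented behaviour.

```python
def next_action(job_status, repository):
    """Map a scan job's status to the workflow's next step.

    Returns a (action, target_repository) tuple; the target is only set for promotions.
    """
    if job_status == "active":
        return ("wait", None)
    if job_status == "failed":
        return ("reject", None)
    if job_status == "succeeded":
        prefix = "unscanned/"
        if not repository.startswith(prefix):
            raise ValueError(f"only unscanned images can be promoted: {repository}")
        return ("promote", "scanned/" + repository[len(prefix):])
    raise ValueError(f"unknown job status: {job_status}")
```

For example, `next_action("succeeded", "unscanned/team-a/app")` yields a promotion to `scanned/team-a/app`.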

Throughout the process, SNS notifications are generated at key points, informing the consuming team and pipeline processes of scan acceptance, completion, promotion or rejection.
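A notification for one of these events might be assembled as below. The sketch only builds the keyword arguments that an SNS `publish` call accepts (`TopicArn`, `Subject`, `Message`, `MessageAttributes`); the event names, message body and topic ARN are assumptions for illustration. A message attribute carrying the event type would let subscribers filter on the events they care about.

```python
import json
from enum import Enum


class ScanEvent(Enum):
    """Hypothetical names for the four notification points."""
    ACCEPTED = "scan_accepted"
    COMPLETED = "scan_completed"
    PROMOTED = "image_promoted"
    REJECTED = "image_rejected"


def publish_kwargs(topic_arn, event, repository, digest):
    """Build keyword arguments for an SNS publish call describing one event."""
    body = {"event": event.value, "repository": repository, "digest": digest}
    return {
        "TopicArn": topic_arn,
        "Subject": f"{event.value}: {repository}"[:100],  # SNS subjects are limited to 100 characters
        "Message": json.dumps(body, sort_keys=True),
        "MessageAttributes": {
            "event": {"DataType": "String", "StringValue": event.value}
        },
    }
```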

Results from the scans are written to HTML reports and persisted as S3 objects, which are shared with the originating consumers and also made available in the Aqua CSP dashboard for security and business users to investigate. All events, actions, scan results and responses are also logged to DynamoDB audit tables, which are used for integration with pre-existing customer security tools.
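The persistence step could be sketched as two small helpers: one that derives an S3 object key for a report, and one that shapes an audit record. The key layout and the attribute names are hypothetical; the actual bucket structure and DynamoDB table schema are not described here.

```python
from datetime import datetime, timezone


def report_key(team, repository, digest):
    """Derive an S3 object key for a scan report (hypothetical layout)."""
    return f"reports/{team}/{repository}/{digest.replace(':', '-')}.html"


def audit_item(digest, event, team, occurred_at):
    """Shape an audit record; image digest plus timestamp could serve as the table key."""
    return {
        "image_digest": digest,
        "event_time": occurred_at.astimezone(timezone.utc).isoformat(),
        "event": event,
        "team": team,
    }


# Example usage with placeholder values.
item = audit_item(
    "sha256:abc",
    "image_promoted",
    "team-a",
    datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
```

Storing timestamps in UTC ISO 8601 form keeps records sortable as strings, which suits a range key.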

The Results?

 

  • A conduit from pipeline to cluster that gives us confidence that what we put into the cluster is what we intended it to be in the first place.
  • Positive developer feedback; the pattern is now being developed further for use across multiple cloud providers.