Reducing cloud compute and database operational costs and preventing unaccounted-for compute costs for a large U.S. Government Agency
After completing a successful cloud migration under a critical deadline, a large U.S. Government Agency discovered that its cloud compute and database costs were not being accurately tracked back to cost centers. To find the gaps, application teams were issued billing codes with which to tag their resources so that cloud computing costs could be tracked. During this process, the team determined that multiple resources were not tagged with the correct cost center. In a single month, $11,000 (9% of total cloud computing costs) could not be mapped to a specific cost center.
Simple Technology Solutions (STS) engineers investigated the cost tracking issue and identified several documented solutions used by existing Amazon Web Services (AWS) private-sector customers. Unfortunately, these solutions were not viable in the AWS GovCloud environment. STS and the Agency worked with AWS personnel to identify the deficiencies in GovCloud, and AWS modified its API to address them.
STS discovered that a number of legacy applications “lifted and shifted” out of the data center could not support the cloud-native scaling solutions normally used to manage or reduce costs. The team also identified multiple instances in the development infrastructure that were running 24/7 and wasting compute resources. This presented a considerable opportunity to cut idle resource usage and reduce cloud computing costs without affecting cloud infrastructure services.
STS engineers built a solution using serverless AWS Lambda functions to analyze the cost-center tags on compute and database resources. The solution also “turned off” improperly tagged resources. An Amazon DynamoDB table stored the list of valid cost-center tag values. An additional Lambda function generated dynamic Identity and Access Management (IAM) policies that prevented the creation of cloud resources without valid tags.
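The tag audit and policy generation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the STS implementation: the tag key `CostCenter`, the resource dictionaries, and the example codes are hypothetical. In a real Lambda function the valid codes would presumably be read from the DynamoDB table and the resource list built from EC2 and RDS describe calls. The generated policy uses the IAM `aws:RequestTag` condition key and the `aws-us-gov` partition; publishing the policy and actually stopping the flagged resources are omitted.

```python
import json
from typing import Dict, Iterable, List, Set

# Hypothetical tag key; the source does not name the key the Agency used.
COST_CENTER_TAG = "CostCenter"


def find_noncompliant(resources: Iterable[Dict], valid_codes: Set[str]) -> List[str]:
    """Return IDs of resources whose cost-center tag is missing or not a valid code.

    Each resource is a dict shaped like {"id": "...", "tags": {"Key": "Value"}}.
    valid_codes stands in for the list of valid values kept in DynamoDB.
    """
    noncompliant = []
    for resource in resources:
        code = resource.get("tags", {}).get(COST_CENTER_TAG)
        if code not in valid_codes:  # a missing tag yields None, which is never valid
            noncompliant.append(resource["id"])
    return noncompliant


def build_deny_untagged_policy(valid_codes: Set[str], partition: str = "aws-us-gov") -> Dict:
    """Build an IAM policy that denies EC2/RDS creation unless the request
    tags the new resource with one of the valid cost-center codes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyCreateWithoutValidCostCenter",
                "Effect": "Deny",
                "Action": ["ec2:RunInstances", "rds:CreateDBInstance"],
                "Resource": [
                    f"arn:{partition}:ec2:*:*:instance/*",
                    f"arn:{partition}:rds:*:*:db:*",
                ],
                "Condition": {
                    # StringNotEquals also evaluates to true when the tag key is
                    # absent from the request, so untagged launches are denied too.
                    "StringNotEquals": {
                        f"aws:RequestTag/{COST_CENTER_TAG}": sorted(valid_codes)
                    }
                },
            }
        ],
    }


# Example with hypothetical resource IDs and cost-center codes.
resources = [
    {"id": "i-0aaa", "tags": {"CostCenter": "CC-100"}},
    {"id": "i-0bbb", "tags": {"CostCenter": "CC-999"}},
    {"id": "db-ccc", "tags": {}},
]
print(find_noncompliant(resources, {"CC-100", "CC-200"}))  # ['i-0bbb', 'db-ccc']
print(json.dumps(build_deny_untagged_policy({"CC-200", "CC-100"}), indent=2))
```

Regenerating the policy from the table whenever a cost center is added keeps the allowed list current without hand-editing IAM documents.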
The team defined a tagging schema that let development teams tag non-production cloud resources to start and stop at specified times. A Lambda function, triggered by Amazon CloudWatch every 15 minutes, started or stopped resources according to those tags. Cloud management teams gained fine-grained control to run only the resources they needed, without having to directly manage their own infrastructure.
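A scheduler that runs every 15 minutes reduces to deciding, for each tagged resource, whether it should be running right now. The sketch below assumes a hypothetical tag value format such as `07:00-18:00 Mon-Fri`; the source does not describe the actual schema. It only computes the action to take. A Lambda handler built on it would call the EC2 and RDS start/stop APIs (for example boto3's `start_instances`/`stop_instances` and `start_db_instance`/`stop_db_instance`) and could be invoked by a `rate(15 minutes)` schedule expression.

```python
from datetime import datetime, time

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # index matches datetime.weekday()


def parse_schedule(value: str):
    """Parse a hypothetical schedule tag such as '07:00-18:00 Mon-Fri'.

    Days may be a range ('Mon-Fri') or a comma list ('Mon,Wed').
    Returns (start_time, stop_time, set_of_weekday_indexes).
    """
    hours, days = value.split()
    start_s, stop_s = hours.split("-")
    start, stop = time.fromisoformat(start_s), time.fromisoformat(stop_s)
    active_days = set()
    for part in days.split(","):
        if "-" in part:
            first, last = part.split("-")
            active_days.update(range(DAYS.index(first), DAYS.index(last) + 1))
        else:
            active_days.add(DAYS.index(part))
    return start, stop, active_days


def desired_state(value: str, now: datetime) -> str:
    """Return 'running' if `now` falls inside the tagged window, else 'stopped'."""
    start, stop, active_days = parse_schedule(value)
    in_window = now.weekday() in active_days and start <= now.time() < stop
    return "running" if in_window else "stopped"


def action_for(value: str, now: datetime, current_state: str):
    """Return 'start', 'stop', or None (no change needed) for one resource."""
    want = desired_state(value, now)
    if want == "running" and current_state == "stopped":
        return "start"
    if want == "stopped" and current_state == "running":
        return "stop"
    return None


# 2024-01-08 was a Monday; 2024-01-13 was a Saturday.
print(action_for("07:00-18:00 Mon-Fri", datetime(2024, 1, 8, 9, 30), "stopped"))   # start
print(action_for("07:00-18:00 Mon-Fri", datetime(2024, 1, 8, 18, 0), "running"))   # stop
print(action_for("07:00-18:00 Mon-Fri", datetime(2024, 1, 13, 10, 0), "stopped"))  # None
```

Because the check runs every 15 minutes, a resource can start or stop up to 15 minutes after its window boundary. The sketch assumes `now` is already in the schedule's time zone and does not handle overnight windows or day ranges that wrap past Sunday.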
STS team members designed and implemented controls to prevent compute and database resources from being created without proper cost tracking tags. Unattributed cloud computing costs fell to less than $60 (under 1% of total cloud computing costs). The serverless solution could respond dynamically as new cost centers were added, extending cost controls to them. Once costs could be accurately tracked back to individual cost centers, development teams used the automated start and stop tools to manage resource usage. Teams that adopted the auto-start and auto-stop capabilities saved up to 67.26% of their Amazon Elastic Compute Cloud (EC2) compute usage costs and Amazon Relational Database Service (RDS) database instance costs.
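The 67.26% figure is consistent with simple hour counting. The source does not state which schedules teams used, but one schedule that reproduces the figure exactly is 11 hours per weekday: 55 running hours out of 168 hours in a week gives savings of 1 - 55/168 ≈ 67.26%. A minimal sketch of that arithmetic, assuming instance costs scale linearly with running hours:

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours for an always-on resource


def scheduled_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of hourly instance cost saved versus running 24/7."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


print(f"{scheduled_savings(11, 5):.2%}")  # 67.26%
```

This applies only to charges billed per running hour; storage attached to stopped EC2 instances and RDS databases continues to be billed.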