The customer has 100+ AWS accounts for its portfolio of applications. Each account was configured with at least one VPC, two VPN connections with two tunnels each, multiple subnets, Virtual and Customer Gateways, a custom route table, and gateway endpoints. This architecture supports a secure connection back to the client’s on-premises network. Additional VPC peering was implemented on a case-by-case basis.
When provisioning a new account, new VPC, or additional CIDR block association, engineers would consult a shared spreadsheet and manually calculate the new CIDR range and update the spreadsheet. The process was lengthy, error-prone and a perennial pain point for application teams. To support a more efficient and accurate provisioning process, an automated CIDR management solution was needed.
STS engineers determined that the customer's pain point was the spreadsheet itself, which required manual updating and was not always current. However, the information it contained (CIDR ranges for networks and gateways) could all be retrieved through the AWS API. A solution drawing on the canonical record, that is, how things are rather than how they were last recorded as being, would eliminate much of the confusion.
STS engineers identified tools that could manage these configurations. To automate the solution without creating additional operational overhead, a serverless approach was called for.
STS proposed a DynamoDB table to track both parent CIDR blocks and CIDRs in use, along with a serverless, Lambda-based solution that references the table and calculates new CIDR ranges. This was designed to be incorporated into the larger provisioning process, which was automated with Step Functions.
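The CIDR calculation at the heart of this design can be illustrated with a minimal sketch. It is not the customer's implementation: the function name `next_free_cidr` and the first-fit strategy are assumptions for illustration, and the parent block and in-use list are passed in as plain strings where the real solution would read them from the DynamoDB table.

```python
import ipaddress
from typing import Iterable, Optional


def next_free_cidr(parent: str, in_use: Iterable[str], prefix_len: int) -> Optional[str]:
    """Return the first /prefix_len block inside `parent` that overlaps
    none of the blocks in `in_use`, or None if the parent is exhausted.

    Hypothetical helper: in the described solution the inputs would come
    from the DynamoDB table rather than function arguments.
    """
    parent_net = ipaddress.ip_network(parent)
    used = [ipaddress.ip_network(cidr) for cidr in in_use]

    if prefix_len < parent_net.prefixlen:
        raise ValueError("requested block is larger than the parent block")

    # Walk candidate blocks in address order and take the first one
    # that does not collide with any recorded allocation (first fit).
    for candidate in parent_net.subnets(new_prefix=prefix_len):
        if not any(candidate.overlaps(block) for block in used):
            return str(candidate)
    return None


if __name__ == "__main__":
    allocated = ["10.0.0.0/24", "10.0.1.0/24"]
    print(next_free_cidr("10.0.0.0/16", allocated, 24))
```

First fit keeps allocations packed toward the start of the parent block. A real allocator would also need to make the lookup and the write atomic so that two concurrent provisioning runs cannot claim the same range.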
As network creation events were detected in the Enterprise monitoring service Event Bus, the Lambda function would trigger to add the new CIDR range and other configuration information to the DynamoDB tables. For the initial population of the database, and for periodic quality assurance, the Lambda function could be triggered manually to audit all networks in all the Enterprise’s AWS accounts.
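As a rough sketch of the event-driven update, the function below turns a network creation event into a record that could be written to the table. The event shape is an assumption modeled on CloudTrail-style API call events for `CreateVpc`; the function name and the record's attribute names are hypothetical, and the actual DynamoDB write is omitted.

```python
import ipaddress
from typing import Any, Dict, Optional


def vpc_event_to_item(event: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Extract a table record from a VPC creation event.

    Assumed event shape (not confirmed by the case study):
    event["detail"]["eventName"] == "CreateVpc" and the new VPC is
    described under event["detail"]["responseElements"]["vpc"].
    Returns None for events that do not describe a new VPC.
    """
    detail = event.get("detail") or {}
    if detail.get("eventName") != "CreateVpc":
        return None

    vpc = (detail.get("responseElements") or {}).get("vpc") or {}
    cidr = vpc.get("cidrBlock")
    vpc_id = vpc.get("vpcId")
    if not cidr or not vpc_id:
        return None

    # Normalize and validate the CIDR before recording it.
    network = ipaddress.ip_network(cidr)

    # Attribute names below are hypothetical, chosen for illustration.
    return {
        "cidr": str(network),
        "vpc_id": vpc_id,
        "account_id": event.get("account", ""),
        "region": event.get("region", ""),
    }
```

A Lambda handler would call this for each incoming event and write any non-empty result to the table. The manual audit path could produce the same record shape from a listing of existing VPCs in each account.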
With the Network State Management Engine, the customer was able to deploy new AWS accounts and VPCs much more rapidly, going from a several-month process to an average of five days from requesting new networking to completion of the implementation. Moving to a fully automated process based on accurate, up-to-date information eliminated CIDR range conflicts, which had caused outages and required time-consuming rollbacks.
This solution also embraced AWS serverless and managed technologies, which increased performance and uptime, ensured reliability, maintained good security practices and optimized cost for the customer.