
Traffic Prediction on AWS Backbone Network for Risk Mitigation with GraphStorm

AWS has taken a significant step in managing its global network, which underpins secure and reliable service delivery across 34 regions and more than 600 Amazon CloudFront points of presence. This infrastructure, which also spans 41 Local Zones and 29 Wavelength Zones, provides high-performance connectivity and ultra-low latency across 245 countries and territories. The scale and complexity of this network demand constant effort in planning, maintenance, and real-time operations.

Managing such an extensive network presents significant challenges, especially when it comes to anticipating how a change in one component will affect the rest. Three questions are central: Can the network handle the existing traffic? How long will it take before congestion appears? Where are problems most likely to occur? Answering them is vital to ensuring optimal performance and constant availability of services.

To address these challenges, the AWS team has stepped up its efforts to improve security mechanisms and risk analysis processes. Through simulations and thorough testing, it seeks to ensure the network’s resilience under a range of scenarios, but the system’s complexity still poses risks. Simulations, while useful, are limited in real-time operations by their cost and computation time.

In this context, AWS is turning to data-driven strategies that scale without a proportional increase in computation time. A recent development is the application of GraphStorm, a graph machine learning framework, which has produced promising results in predicting traffic on complex networks. This approach performs well in routing and load distribution tasks because it captures the network’s structural information.
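As a rough illustration of what "capturing structural information" can mean, the sketch below performs one round of neighbor aggregation, the basic step behind message passing in graph neural networks, here without any learned weights. It does not use GraphStorm's API and is not AWS's model; the three-site topology, the traffic values, and the function name are hypothetical.

```python
def aggregate_neighbors(features, adjacency):
    """Replace each node's value with the mean of its own value and its neighbors' values."""
    updated = {}
    for node, value in features.items():
        neighborhood = [value] + [features[n] for n in adjacency.get(node, [])]
        updated[node] = sum(neighborhood) / len(neighborhood)
    return updated


# Hypothetical three-site chain: A - B - C (not a real AWS topology).
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

# Hypothetical traffic readings per site, in arbitrary units.
traffic = {"A": 10.0, "B": 40.0, "C": 70.0}

# After one round, each site's value reflects its position in the graph.
smoothed = aggregate_neighbors(traffic, adjacency)
print(smoothed)
```

A trained graph model would apply learned transformations at each such step and stack several rounds, so that a prediction for one segment can draw on conditions several hops away.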

In a two-week test on 85 segments of the backbone network, the model produced notably accurate predictions, with a 13% error margin at the 90th percentile. Beyond advancing operational security, this supports daily operations by providing information about traffic patterns, allowing for more effective congestion risk mitigation.
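The article does not say how the error was computed. One common reading of "error at the 90th percentile" is that 90% of predictions have an absolute percentage error at or below the stated figure. The sketch below computes that statistic under this assumption, using a nearest-rank percentile; the traffic numbers and function names are hypothetical and are not AWS data.

```python
import math


def absolute_percentage_errors(actual, predicted):
    """Per-sample absolute percentage error; zero-traffic samples are skipped."""
    return [abs(p - a) * 100.0 / abs(a) for a, p in zip(actual, predicted) if a != 0]


def percentile(values, q):
    """Nearest-rank percentile: the smallest value with at least q% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical observed and predicted traffic for ten samples (arbitrary units).
actual = [100, 200, 150, 400, 250, 300, 120, 180, 220, 500]
predicted = [98, 210, 150, 380, 260, 330, 114, 171, 242, 450]

errors = absolute_percentage_errors(actual, predicted)
p90 = percentile(errors, 90)
print(f"90th-percentile absolute percentage error: {p90:.1f}%")
```

Reporting a high percentile rather than the mean says something about the worst-behaved predictions, which matters more than the average case when the goal is to avoid congestion.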

To keep improving operational security, AWS has designed a system architecture that integrates GraphStorm with several of its services. This enables scalable and efficient model training, frequent model updates, and seamless integration with existing workflows. AWS aims to balance meeting its customers’ needs with the secure operation of its infrastructure, and has committed to continuing to report on its progress with this new strategy.

via: MiMub in Spanish
