Monks, the digital operating brand of S4Capital plc, has built a global reputation for its innovation and specialization in marketing and technology services. This time, the company has taken on a project that advances real-time image generation using the latest AWS offerings for machine learning acceleration.
Monks took on the challenge of improving real-time image generation, a process that initially posed significant scalability and cost-management problems. Traditional compute resources were expensive and could not meet the required low latency, which prompted Monks to explore advanced AWS options for high-performance computing and cost-effective scaling.
The solution came with the adoption of AWS Inferentia2 chips together with Amazon SageMaker asynchronous inference endpoints. These technologies promised not only up to four times faster processing, but also lower costs through automatic, managed scaling.
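As a rough illustration of this setup, the sketch below deploys a model to a SageMaker asynchronous inference endpoint backed by an Inferentia2 (inf2) instance using the SageMaker Python SDK. The container image, model artifact, IAM role, bucket, endpoint name, and SNS topic ARNs are placeholders, not Monks' actual configuration.

```python
# Minimal sketch: deploy a model to a SageMaker asynchronous inference
# endpoint on an Inferentia2 (ml.inf2) instance. All names and ARNs below
# are illustrative placeholders.
import sagemaker
from sagemaker.model import Model
from sagemaker.async_inference import AsyncInferenceConfig

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

model = Model(
    image_uri="<neuron-compatible-inference-image>",   # placeholder container
    model_data="s3://my-bucket/model/model.tar.gz",    # placeholder artifact
    role=role,
    sagemaker_session=session,
)

async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-results/",       # results are written to S3
    max_concurrent_invocations_per_instance=4,
    notification_config={                              # completion notices via SNS
        "SuccessTopic": "arn:aws:sns:us-east-1:123456789012:success-topic",
        "ErrorTopic": "arn:aws:sns:us-east-1:123456789012:error-topic",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",                    # Inferentia2-backed instance
    async_inference_config=async_config,
    endpoint_name="image-gen-async-inf2",              # placeholder name
)
```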
The workflow involved creating inference endpoints, handling incoming requests, processing and storing results in Amazon S3, and sending completion notifications through Amazon SNS. The ability to scale instances automatically with demand, and to reduce the instance count to zero during idle periods, contributed significantly to the cost reduction.
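The request side of that workflow can be sketched as follows, assuming a JSON payload staged in S3 and the placeholder bucket and endpoint names from above: the asynchronous endpoint is invoked with the payload's S3 location, and the response points at the S3 key where the generated image will land, with SNS notifying on completion.

```python
# Hedged sketch of the request flow for the asynchronous endpoint.
# Bucket, key, and endpoint names are illustrative.
import json
import boto3

s3 = boto3.client("s3")
runtime = boto3.client("sagemaker-runtime")

bucket = "my-bucket"                                   # placeholder
input_key = "async-requests/request-001.json"

# Stage the generation request (e.g. a text prompt) as the input object in S3.
s3.put_object(
    Bucket=bucket,
    Key=input_key,
    Body=json.dumps({"prompt": "a product shot on a studio background"}),
)

response = runtime.invoke_endpoint_async(
    EndpointName="image-gen-async-inf2",               # placeholder endpoint
    InputLocation=f"s3://{bucket}/{input_key}",
    ContentType="application/json",
)

# The result appears at this S3 location once inference finishes;
# Amazon SNS publishes a success or error notification at that point.
print(response["OutputLocation"])
```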
Implementing SageMaker asynchronous inference endpoints allowed Monks to manage varying traffic loads efficiently and optimize resource usage. Monks was able to process an average of 27,796 images per hour per instance, a substantial performance improvement alongside a 60% reduction in cost per image.
Custom scaling policies driven by Amazon CloudWatch metrics were crucial. Custom metrics allowed Monks to adjust compute capacity in real time, balancing performance and cost. These metrics included inference capacity, the number of inference requests, and the utilization rate.
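The article does not spell out the exact policies used, but a common pattern for asynchronous endpoints, sketched below, is to register the endpoint variant with Application Auto Scaling (allowing a minimum of zero instances) and attach a target-tracking policy keyed to the per-instance backlog metric that SageMaker publishes to CloudWatch. Target values, capacities, and names here are assumptions for illustration.

```python
# Sketch of queue-driven auto scaling for the asynchronous endpoint,
# assuming the standard Application Auto Scaling integration.
import boto3

autoscaling = boto3.client("application-autoscaling")

endpoint_name = "image-gen-async-inf2"                 # placeholder
resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"

# Allow the variant to scale between 0 and 10 instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,                                     # scale to zero when idle
    MaxCapacity=10,                                    # illustrative ceiling
)

# Track the queued-requests-per-instance metric from CloudWatch.
autoscaling.put_scaling_policy(
    PolicyName="backlog-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,                            # desired backlog per instance
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```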
Deploying on AWS Inferentia2 not only significantly improved inference performance, but also increased efficiency and reduced costs. The optimized model delivered images in 9.7 seconds, meeting the campaign's strict latency requirements.
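The article does not describe how the model was optimized for Inferentia2; the sketch below only illustrates the general AWS Neuron workflow, in which a PyTorch module is traced ahead of time with torch_neuronx so it can run on inf2 hardware. The module, input shape, and file names are placeholders standing in for the actual image-generation model.

```python
# Illustrative sketch of ahead-of-time compilation for Inferentia2 with
# torch_neuronx. The module below is a stand-in, not the deployed model.
import torch
import torch_neuronx

class TinyGenerator(torch.nn.Module):
    """Placeholder for the image-generation model actually deployed."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(4, 8, kernel_size=3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(8, 3, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

model = TinyGenerator().eval()
example_input = torch.randn(1, 4, 64, 64)              # placeholder input shape

# Trace the module for Neuron; the saved artifact would then be packaged
# into the model.tar.gz served by the SageMaker endpoint.
neuron_model = torch_neuronx.trace(model, example_input)
torch.jit.save(neuron_model, "model_neuron.pt")
```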
In conclusion, implementing SageMaker asynchronous inference endpoints on AWS Inferentia2 chips allowed Monks not only to handle varying traffic loads efficiently, but also to cut costs drastically and improve performance. The approach offers a practical blueprint for other generative artificial intelligence applications, showing that the combination of these technologies is a robust and cost-effective solution for compute-intensive workloads. Monks continues to position itself as a comprehensive digital partner, integrating a wide range of solutions and empowering businesses with more efficient content production, scalable experiences, and AI-driven insights.
via: MiMub in Spanish