Optimizing Salesforce Model Endpoints with Amazon SageMaker Inference Components


Salesforce and Amazon Web Services (AWS) have announced a collaboration to optimize the deployment of artificial intelligence models, with a particular emphasis on large language models (LLMs). Salesforce's Model Serving team develops and operates the services that integrate machine learning models into business-critical applications, and is responsible for building a robust serving infrastructure to support them.

One of the major challenges the team faces is deploying these models efficiently while maintaining optimal performance and keeping costs under control. The task is complicated by the wide variety of model sizes, which range from a few gigabytes up to around 30 GB, and by differing performance requirements across models.

The team identified two key challenges. First, larger models often underutilize the multi-GPU instances they run on, leaving expensive accelerators idle. Second, mid-sized models need low-latency serving, and meeting that requirement on dedicated instances leads to over-provisioning and higher costs.

To address these issues, Salesforce adopted inference components in Amazon SageMaker, which allow multiple foundation models to be deployed on a single SageMaker endpoint. This approach gives fine-grained control over the number of accelerators and the amount of memory allocated to each model, improving resource utilization and reducing the associated costs.
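As an illustration, a deployment along these lines could be created with the boto3 SageMaker client's create_inference_component call. This is a minimal sketch: the endpoint, model, and component names are hypothetical, and the resource figures are placeholders rather than Salesforce's actual configuration.

```python
import boto3

sm = boto3.client("sagemaker")

# Pack one model onto an existing multi-GPU endpoint as an inference component,
# reserving only the accelerators and memory this model needs.
# All names and resource figures below are illustrative placeholders.
sm.create_inference_component(
    InferenceComponentName="summarization-llm-ic",   # hypothetical component name
    EndpointName="shared-llm-endpoint",              # existing shared endpoint
    VariantName="AllTraffic",
    Specification={
        "ModelName": "summarization-llm",            # a previously created SageMaker model
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1, # GPUs reserved for this model
            "MinMemoryRequiredInMb": 8192,           # memory reserved for this model
        },
    },
    RuntimeConfig={"CopyCount": 1},                  # initial number of model copies
)
```

Because each component declares its own compute requirements, several such components can share the same endpoint's GPUs instead of each model occupying a dedicated instance.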

Beyond optimizing GPU usage, inference components let each model scale independently according to the needs of the application it serves. This addresses the immediate deployment issues and establishes a flexible foundation to support the evolution of Salesforce's artificial intelligence initiatives.
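Independent scaling of each model can be expressed by registering the inference component's copy count with Application Auto Scaling. The sketch below assumes the hypothetical component from the previous example; the capacity limits and target value are illustrative, not Salesforce's settings.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Each inference component scales on its own copy count, independently of the
# other models sharing the endpoint. Values below are illustrative.
resource_id = "inference-component/summarization-llm-ic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="summarization-llm-ic-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # Scale copies to keep invocations per copy near the target value.
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerInferenceComponentInvocationsPerCopy",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```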

With these changes, the company has significantly reduced its infrastructure costs, reporting savings of up to 80% in deployment expenses. The optimization also benefits smaller models, which can now run on high-performance GPUs for better throughput and lower latency without incurring excessive cost.

Looking ahead, Salesforce plans to take advantage of the rolling-update capabilities of inference components to keep its models current with less operational overhead and to fold future innovations into its artificial intelligence platform more easily. With these steps, Salesforce positions itself for continued growth of its AI offerings while maintaining high standards of efficiency and cost-effectiveness.
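A model refresh of that kind could be applied with update_inference_component, which swaps the component's underlying model without tearing down the shared endpoint. Again a hedged sketch with hypothetical names carried over from the earlier examples.

```python
import boto3

sm = boto3.client("sagemaker")

# Point the existing inference component at a newly registered model version;
# SageMaker rolls out the new copies while the shared endpoint stays in service.
# Names are illustrative placeholders.
sm.update_inference_component(
    InferenceComponentName="summarization-llm-ic",
    Specification={
        "ModelName": "summarization-llm-v2",         # new model version
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 8192,
        },
    },
)
```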

Source: MiMub in Spanish
