Meta’s Llama 3.2 models are now available on Amazon SageMaker JumpStart.

Today we are excited to announce the availability of Llama 3.2 models on Amazon SageMaker JumpStart. Llama 3.2 offers multimodal vision and lightweight models that represent Meta's latest advancement in large language models (LLMs), providing enhanced capabilities and broader applicability across various use cases. With a focus on responsible innovation and system-level safety, these new models demonstrate state-of-the-art performance across a wide range of industry benchmarks and introduce features that help you build a new generation of artificial intelligence (AI) experiences. SageMaker JumpStart is a machine learning (ML) hub that provides access to ML algorithms, models, and solutions so you can quickly get started with ML.

In this post, we show how you can discover and deploy the Llama 3.2 11B Vision model using SageMaker JumpStart. We also share the supported instance types and context lengths for all Llama 3.2 models available on SageMaker JumpStart. Although not covered in this post, the lightweight models can also be fine-tuned using SageMaker JumpStart.

The Llama 3.2 models are initially available on SageMaker JumpStart in the US East (Ohio) AWS Region. Note that Meta restricts the use of the multimodal models for users located in the European Union. Refer to Meta's Community License Agreement for details.

Llama 3.2 represents Meta's latest advancement in LLMs. The Llama 3.2 models are offered in various sizes, from small lightweight text-only models to medium-sized multimodal models. The larger Llama 3.2 models come in two parameter sizes, 11B and 90B, with a context length of 128,000 tokens, and are capable of sophisticated reasoning tasks, including multimodal support for high-resolution images. The lightweight text-only models come in two parameter sizes, 1B and 3B, also with a context length of 128,000 tokens, and are suitable for edge devices. Additionally, there is a new Llama Guard 3 11B Vision model, which is designed to support responsible innovation and system-level safety.

Llama 3.2 is the first Llama model to support vision tasks, with a new model architecture that integrates image encoder representations into the language model. With a focus on responsible innovation and system-level security, the Llama 3.2 models help you build and deploy cutting-edge generative AI models to drive new innovations, such as reasoning with images, and are also more accessible for edge applications. The new models are also designed to be more efficient for AI workloads, with reduced latency and enhanced performance, making them suitable for a wide range of applications.

SageMaker JumpStart provides access to a broad selection of publicly available foundation models (FMs). These pretrained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.

With SageMaker JumpStart, you can deploy models in a secure environment. Models can be provisioned on dedicated SageMaker inference instances, including instances powered by AWS Trainium and AWS Inferentia, and are isolated within your Virtual Private Cloud (VPC). This reinforces data security and compliance, as models operate under your own VPC controls, rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune it using the extensive capabilities of Amazon SageMaker, including SageMaker Inference for deploying models and container logs for improved observability. With SageMaker, you can streamline the entire model deployment process.
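As a brief illustration (not from the original post), the following sketch shows how a JumpStart model could be deployed into your own VPC using the SageMaker Python SDK; the subnet and security group IDs are placeholders you would replace with your own:

from sagemaker.jumpstart.model import JumpStartModel

# Sketch: attach the model to your own VPC (placeholder subnet and security group IDs)
model = JumpStartModel(
    model_id="meta-vlm-llama-3-2-11b-vision",
    vpc_config={
        "Subnets": ["subnet-0123456789abcdef0"],       # placeholder subnet ID
        "SecurityGroupIds": ["sg-0123456789abcdef0"],  # placeholder security group ID
    },
)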

To test the Llama 3.2 models on SageMaker JumpStart, you need the following prerequisites:

- An AWS account that will contain all your AWS resources.
- An AWS Identity and Access Management (IAM) role to access SageMaker.
- Access to SageMaker Studio, a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code.

SageMaker JumpStart provides FMs through two main interfaces: SageMaker Studio and the SageMaker Python SDK. This offers multiple options for discovering and using hundreds of models for your specific use case.

SageMaker Studio is a comprehensive integrated development environment (IDE) that provides a unified web interface for all aspects of the ML development lifecycle. From preparing data to building, training, and deploying models, SageMaker Studio provides purpose-built tools to simplify the entire process. In SageMaker Studio, you can access SageMaker JumpStart to discover and explore the extensive catalog of FMs available for deployment to inference endpoints.

To access SageMaker JumpStart in SageMaker Studio, choose JumpStart from the navigation panel or from the homepage.

Alternatively, you can use the SageMaker Python SDK to access and use models from SageMaker JumpStart programmatically. This approach allows for greater flexibility and integration with existing AI/ML workflows and pipelines. By providing multiple entry points, SageMaker JumpStart helps you seamlessly incorporate pretrained models into your AI/ML development efforts, regardless of your preferred interface or workflow.
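For example, here is a short sketch (assuming the sagemaker Python package is installed and AWS credentials are configured) of listing the Meta models in the JumpStart catalog programmatically:

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# List JumpStart model IDs whose provider framework is "meta"
meta_models = list_jumpstart_models(filter="framework == meta")
print(meta_models)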

On the SageMaker JumpStart homepage, you can find all the publicly available pretrained models offered by SageMaker. Choose the Meta model provider tab to discover all the Meta models available on SageMaker.

If you are using SageMaker Studio Classic and do not see the Llama 3.2 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, see Shut down and Update Studio Classic Apps.

You can choose the model card to view details about the model, such as the license, the data used for training, and how to use it. You can also find two buttons, Deploy and Open Notebook, to help you use the model.

Choosing either of these buttons opens a pop-up window that displays the End User License Agreement (EULA) and Acceptable Use Policy for you to accept.

After accepting, you can proceed to the next step to use the model.

Choosing Deploy and accepting the terms will start the deployment of the model. Alternatively, you can deploy through the example notebook by choosing Open Notebook. The notebook provides a step-by-step guide on how to deploy the model for inference and clean up resources.

To deploy using a notebook, you start by selecting an appropriate model, specified by its model_id. You can deploy any of the selected models on SageMaker.

You can deploy a Llama 3.2 11B Vision model using SageMaker JumpStart with the following Python SDK code:

from sagemaker.jumpstart.model import JumpStartModel

accept_eula = True  # must be True to acknowledge the model's EULA
model = JumpStartModel(model_id="meta-vlm-llama-3-2-11b-vision")
predictor = model.deploy(accept_eula=accept_eula)

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC settings. You can change these settings by specifying non-default values in JumpStartModel, as shown in the sketch that follows. To deploy the model successfully, you must set accept_eula=True as an argument of the deploy method; doing so confirms that you accept the model's EULA.
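For example, the following hedged sketch overrides the default instance type (the instance type shown is an assumption; verify the supported instance types for this model in your Region):

# Sketch: overriding the default instance type (assumed value; verify support)
model = JumpStartModel(
    model_id="meta-vlm-llama-3-2-11b-vision",
    instance_type="ml.p4d.24xlarge",
)
predictor = model.deploy(accept_eula=True)

After deployment, you can run inference against the deployed endpoint through the SageMaker predictor: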

# Example multi-turn conversation in the Messages API format
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "How are you doing today"},
        {"role": "assistant", "content": "Good, what can I help you with today?"},
        {"role": "user", "content": "Give me 5 steps to become better at tennis"}
    ],
    "temperature": 0.6,
    "top_p": 0.9,
    "max_tokens": 512,
    "logprobs": False
}
response = predictor.predict(payload)
response_message = response['choices'][0]['message']['content']
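Because the 11B Vision model is multimodal, you can also include an image in the request. The following is a hedged sketch of one common Messages API shape for this (the image file name is a placeholder, and the exact schema can vary; the model's example notebook shows the authoritative format):

import base64

# Sketch: a multimodal request; "my_image.png" is a placeholder local file
with open("my_image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}}
            ]
        }
    ],
    "max_tokens": 512
}
response = predictor.predict(payload)
print(response['choices'][0]['message']['content'])

When you're finished experimenting, delete the endpoint (for example, with predictor.delete_model() and predictor.delete_endpoint()) to avoid ongoing charges.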

The Llama 3.2 models have been evaluated on over 150 benchmark datasets, demonstrating competitive performance with leading FMs.

In this post, we explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and deploy a wide range of pretrained FMs for inference, including the most advanced and capable Meta models to date. Start today with SageMaker JumpStart and the Llama 3.2 models.

For more information on SageMaker JumpStart, see Train, deploy, and evaluate pretrained models with SageMaker JumpStart and Getting started with Amazon SageMaker JumpStart.

