Pixtral 12B Now Available on Amazon SageMaker JumpStart.

Today, the arrival of Pixtral 12B, an innovative Visual Language Model (VLM), has been announced in the Amazon SageMaker JumpStart catalog. Developed by Mistral AI, this model is designed to handle textual and multimodal tasks, offering practical and exceptional applications in the real world. Its standout capabilities include understanding graphics and figures, answering questions in documents, multimodal reasoning, and tracking complex instructions.

Pixtral 12B stands out for its ability to process images in their original resolution and native aspect ratio, without compromising performance in textual tasks. Its advanced architecture incorporates a vision encoder with 400 million parameters, complemented by a multimodal transformer decoder with 12 billion parameters, ensuring fast and precise inferences.

The model’s availability under the Apache 2.0 commercial license allows companies and startups to access a powerful tool for developing complex multimodal applications. Through its integration with SageMaker JumpStart, users can deploy and access machine learning models in a secure environment with extensive customization options, thus adapting to specific needs.

SageMaker JumpStart, a crucial part of the AWS ecosystem, provides developers with access to high-performance pretrained models, with the ability to deploy them on dedicated inference instances, including those powered by AWS Trainium and Inferentia. Although fine-tuning of Pixtral 12B is not yet enabled, its functionalities already allow tasks such as Optical Character Recognition (OCR), graph analysis, and image-to-code conversion, through intuitive interfaces and the use of SageMaker’s Python SDK.

This release solidifies Mistral AI’s position in the visual language model development sector, while Amazon SageMaker JumpStart continues to affirm its commitment to facilitating access to cutting-edge model architectures. This not only optimizes the deployment of machine learning models for data science experts and ML engineers, but also represents a significant step towards integrating multimodal models into business processes, paving the way for future innovations in artificial intelligence.

via: MiMub in Spanish

Scroll to Top
×