Training Llama 3.3 Swallow: A Sovereign Japanese LLM on Amazon SageMaker HyperPod

In a significant advance for the field of artificial intelligence, the Institute of Science Tokyo has completed the development of Llama 3.3 Swallow, a 70-billion-parameter language model with advanced capabilities for processing Japanese. The project, led by Kazuki Fujii, was carried out on Amazon SageMaker HyperPod, a managed infrastructure for large-scale distributed training, and the resulting model surpasses well-known models such as GPT-4o-mini on Japanese language tasks.

Llama 3.3 Swallow is based on the architecture of Meta Llama 3.3 but incorporates specific improvements that make it better suited to processing Japanese. The initiative is a collaboration between the Okazaki Laboratory and the Yokota Laboratory at the Institute of Science Tokyo, together with the National Institute of Advanced Industrial Science and Technology (AIST). The model is available in two variants, a base model and an instruction-tuned model, on the Hugging Face platform, making it accessible to researchers and developers interested in exploring its capabilities.
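
The article does not include usage details, but a checkpoint published on Hugging Face is typically loaded through the transformers library. The sketch below is illustrative only: the repository id is an assumption and should be verified on the hub, and a 70B model requires multiple GPUs or offloading in practice.

```python
# Minimal usage sketch; the repository id is assumed, not confirmed by the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"  # assumed repo id; verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 70B weights: multiple GPUs or offloading are needed in practice
    device_map="auto",
)

# Simple Japanese prompt through the chat template of the instruction-tuned variant
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```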

Training was performed as continual pre-training on the Swallow Corpus Version 2, a web-derived dataset focused on educational Japanese content, which helps ensure high data quality. The job ran on 32 Amazon EC2 instances equipped with powerful GPUs and trained continuously for more than 16 days.
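
The article does not describe the training code itself. As a rough illustration of what a multi-node continual pre-training job looks like, the sketch below uses PyTorch's distributed tooling with a tiny stand-in model and random data; it is not the actual Swallow training stack.

```python
# Minimal sketch of one multi-node distributed training step with PyTorch DDP.
# The model and data are stand-ins; the real job would iterate over the Swallow corpus.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main() -> None:
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process on every node
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for a 70B transformer; a small linear layer keeps the sketch runnable
    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    for step in range(10):  # continual pre-training would run for many thousands of steps
        batch = torch.randn(8, 4096, device=f"cuda:{local_rank}")
        loss = model(batch).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if dist.get_rank() == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

On each node this would be launched roughly as `torchrun --nnodes=32 --nproc-per-node=<gpus_per_node> train.py`, where 32 corresponds to the instance count mentioned above and the per-node GPU count depends on the instance type.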

The results show that the base model outperforms several competitive alternatives, excelling at Japanese linguistic tasks. In particular, the instruction-tuned variant demonstrates strong performance on the Japanese MT-Bench, an important benchmark for evaluating practical applications in the language.

The model's availability on Hugging Face is subject to the Meta Llama 3.3 license and the Gemma terms of use, while still promoting innovation in AI applications centered on the Japanese language. The training infrastructure was designed to be scalable and efficient, combining compute, networking, storage, and monitoring components to enable faster training with fewer interruptions.
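
The article does not give the cluster configuration. For a rough sense of what such infrastructure looks like, a SageMaker HyperPod cluster can be provisioned through the SageMaker `create_cluster` API; all names, counts, and the instance type below are illustrative assumptions rather than the project's actual setup.

```python
# Rough sketch: provisioning a SageMaker HyperPod cluster with boto3.
# Cluster name, instance type, count, role ARN, and S3 paths are assumptions.
import boto3

sagemaker = boto3.client("sagemaker", region_name="us-west-2")

response = sagemaker.create_cluster(
    ClusterName="swallow-training",            # hypothetical name
    InstanceGroups=[
        {
            "InstanceGroupName": "worker-group",
            "InstanceType": "ml.p5.48xlarge",   # assumed GPU instance type, not confirmed
            "InstanceCount": 32,                # matches the 32 instances mentioned above
            "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodExecutionRole",
            "LifeCycleConfig": {
                # Lifecycle scripts typically install the scheduler, mount shared storage,
                # and set up monitoring agents on every node as it joins the cluster
                "SourceS3Uri": "s3://example-bucket/hyperpod-lifecycle/",
                "OnCreate": "on_create.sh",
            },
            "ThreadsPerCore": 1,
        }
    ],
)
print(response["ClusterArn"])
```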

In addition, the team implemented a systematic approach to resource optimization, along with a comprehensive monitoring system that detects processing issues in real time. These components are planned for release as open-source projects, offering valuable resources to the AI research community.
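
The monitoring stack is not specified in the article. One common pattern for real-time issue detection on AWS is to publish training-health metrics to Amazon CloudWatch and configure alarms on them; the sketch below assumes a hypothetical namespace and metric names.

```python
# Illustrative only: publishes per-node training health metrics to CloudWatch
# so throughput drops or stalls can trigger alarms in near real time.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")


def publish_training_metrics(node_id: str, tokens_per_second: float, step_time_s: float) -> None:
    """Push throughput and step-latency metrics under a hypothetical namespace."""
    cloudwatch.put_metric_data(
        Namespace="LLMTraining/Swallow",  # hypothetical namespace
        MetricData=[
            {
                "MetricName": "TokensPerSecond",
                "Dimensions": [{"Name": "NodeId", "Value": node_id}],
                "Value": tokens_per_second,
                "Unit": "Count/Second",
            },
            {
                "MetricName": "StepTime",
                "Dimensions": [{"Name": "NodeId", "Value": node_id}],
                "Value": step_time_s,
                "Unit": "Seconds",
            },
        ],
    )


if __name__ == "__main__":
    # In a real job this would be called from the training loop every N steps.
    publish_training_metrics(node_id="node-0", tokens_per_second=12_000.0, step_time_s=8.5)
```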

With the success of Llama 3.3 Swallow, the team at the Institute of Science Tokyo aims to further enhance the model's capabilities and explore new applications across various areas of technology and communication.

Source: MiMub (original article in Spanish)
