Amazon Web Services (AWS) is committed to supporting the development of cutting-edge generative artificial intelligence (AI) technologies by companies and organizations worldwide. As part of this commitment, AWS Japan announced the Large Language Model Development Support Program (LLM Program), through which it has worked alongside some of Japan’s most innovative teams. From startups to large global companies, these pioneers are harnessing large language models (LLMs) and foundation models to boost productivity, create differentiated customer experiences, and drive progress across a variety of industries using AWS generative AI infrastructure. Notably, 12 of the 15 organizations that successfully participated in the program used the compute capabilities of AWS Trainium to train their models and are now exploring AWS Inferentia for inference. Earlier this year, at the conclusion of the program, a press conference was held at which several of these companies presented their results and experiences. This article summarizes those results and looks at how participating organizations used the LLM Program to accelerate their generative AI initiatives.
Since its inception, the LLM Program has welcomed 15 diverse companies and organizations, each with a unique vision of how to use LLMs to advance its industry. The program provided comprehensive support: guidance on securing high-performance computing infrastructure, technical assistance and troubleshooting for distributed training, cloud credits, and support for market deployment. It also facilitated collaborative knowledge exchange sessions, where leading LLM engineers gathered to discuss the technical complexities and business considerations of their work. This holistic approach enabled participating organizations to rapidly advance their generative AI capabilities and bring transformative solutions to market.
Ricoh, one of the participating companies, recognized that the development of Japanese LLMs lagged behind that of English and multilingual LLMs. To address this, the company’s Digital Technology Development Center developed a bilingual Japanese-English LLM using a carefully designed curriculum learning strategy. Takeshi Suzuki, Deputy Director of the Digital Technology Development Center, explains that while new architectures for foundation models and LLMs are emerging quickly, the team focused on refining its training methodology to create a competitive advantage, rather than exclusively pursuing architectural novelty.
Stockmark, another program participant, aimed to build highly reliable LLMs for industrial applications and decided to pretrain a Japanese LLM to tackle hallucination, that is, the generation of inaccurate content. To do so, it used a large amount of Japanese text, including public data and proprietary data from commercial domains.
The NTT group, in collaboration with Intel and Sony, is developing tsuzumi, a lightweight, high-performance LLM that improves the quality and quantity of its Japanese training data rather than increasing parameter size. The model has demonstrated high proficiency in Japanese, and multimodal capabilities are in development.
The program also fostered the development of domain-specific, multimodal, and linguistically diverse models. For example, KARAKURI developed an LLM for customer service chatbots, while Watashiha created a humor-focused model called OGIRI. Preferred Networks developed a general-purpose vision model that can integrate and process both textual and visual information.
In conclusion, AWS’s LLM Program in Japan has proven to be a success, with participating organizations making significant strides in their generative AI capabilities and finding new applications for these technologies in the real world. This highlights AWS’s commitment to fostering innovation and progress in the field of artificial intelligence, both in Japan and globally.
via: MiMub in Spanish