Microsoft Enables Multimodal Features for Phi Silica

Microsoft has introduced multimodal functionality for its small language model, Phi Silica. The update aims to improve accessibility and productivity on Copilot+ devices with Snapdragon processors, and later on models from Intel and AMD. It adds visual comprehension: the model can now interpret images as well as process text, generating descriptions that assistive technologies such as screen readers can use.

The update avoids the need for a dedicated vision model, which saves disk space and memory. Instead, it builds on existing components and adds only a projector model with 80 million parameters, so the system works without compromising the performance of other established models.

The new functionality can generate image descriptions at varying levels of detail, which is particularly useful for people with visual impairments. It does not rely solely on cloud models; it also uses local capabilities, which makes descriptions faster and more accessible. In real-world tests, an optimized Phi Silica model delivers short descriptions in about four seconds and more detailed ones in around seven seconds.

To assess the quality of the generated descriptions, Microsoft compares the new approach against other benchmark models, such as Florence. The results show that descriptions generated by Phi Silica are more accurate and comprehensive, which makes them more useful to the people who depend on these tools.

As the functionality rolls out, support for more languages is expected to follow, further improving accessibility. With this step, Microsoft reaffirms its commitment to making technology more inclusive and accessible for everyone, especially people who face barriers in using digital technologies.

Source: MiMub in Spanish
