Sentence transformers are changing how natural language processing tasks are handled within artificial intelligence. These deep learning models convert sentences into dense vectors (embeddings) that capture their semantic meaning, which makes tasks such as text classification, semantic search, and information retrieval easier.
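To make the idea of comparing sentences through their vectors concrete, here is a minimal sketch of cosine-similarity ranking, a comparison commonly used for semantic search over embeddings. The three-dimensional vectors and the product titles are made up for illustration; a real sentence transformer would produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, catalog):
    """Return catalog titles sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), title)
              for title, vec in catalog.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored]

# Toy embeddings (hypothetical values, not produced by any real model).
catalog = {
    "wireless headphones": [0.9, 0.1, 0.0],
    "bluetooth earbuds": [0.8, 0.2, 0.1],
    "garden hose": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # stands in for the embedding of a headset query
ranking = rank_by_similarity(query, catalog)
```

In this toy data the two audio products rank ahead of the unrelated one, because their vectors point in nearly the same direction as the query.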
A study has examined how Amazon uses these transformers to improve product classification in its extensive catalog. The analysis compared the performance of two sentence transformers in categorizing Amazon products: the publicly available Paraphrase-MiniLM-L6-v2 and M5_ASIN_SMALL_V2.0, a BERT-based model developed by Amazon. The latter has been fine-tuned on data specific to Amazon’s catalog, such as titles, bullet points, and descriptions.
The study’s initial hypothesis was that the M5 model would perform better because it had been trained on the company’s own data. To test this, both sentence transformers were fine-tuned on a 2020 Amazon dataset that included detailed product descriptions, categories, and technical specifications.
Careful preprocessing was key to improving classification accuracy: texts were normalized, the main product categories were defined, and the relevant fields were selected. An XGBoost classifier was then used to evaluate how well the fine-tuned models classified products into their respective categories.
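The preprocessing steps described above could look roughly like the following sketch. It is not the study's actual pipeline: the field names (`title`, `description`, `specifications`, `category`), the `|` category separator, and the normalization rules are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Hypothetical field names; the study's actual schema is not given in the text.
TEXT_FIELDS = ("title", "description", "specifications")

def normalize_text(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def main_category(category_path, separator="|"):
    """Keep only the top-level category of a path like 'A | B | C'."""
    return category_path.split(separator)[0].strip()

def build_example(record):
    """Turn a raw product record into a (text, label) pair."""
    # Missing or empty fields are treated as empty strings.
    parts = [record.get(field) or "" for field in TEXT_FIELDS]
    text = normalize_text(" ".join(parts))
    return text, main_category(record["category"])

# Made-up product record for illustration only.
record = {
    "title": "Café Grinder, 150W",
    "description": "Stainless-steel   blades.",
    "specifications": None,
    "category": "Home & Kitchen | Kitchen & Dining | Coffee",
}
text, label = build_example(record)
```

The resulting text would then be passed to a sentence transformer to obtain an embedding, and the label would serve as the target for a classifier such as XGBoost.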
The results were clear. Before fine-tuning, the Paraphrase-MiniLM-L6-v2 transformer achieved a classification accuracy of 78%; after fine-tuning, it reached 94%. The M5_ASIN_SMALL_V2.0 model started with accuracy similar to that of Paraphrase-MiniLM-L6-v2, and after fine-tuning its accuracy rose to 98%.
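One way to read these figures is in terms of error rate rather than accuracy. The short calculation below uses only the percentages reported above; the function names are illustrative, and the pre-tuning accuracy of the M5 model is left out because the text describes it only as "similar".

```python
def error_rate(accuracy):
    """Fraction of products classified incorrectly."""
    return 1.0 - accuracy

def relative_error_reduction(acc_before, acc_after):
    """Share of the original errors that disappear when accuracy improves."""
    before = error_rate(acc_before)
    return (before - error_rate(acc_after)) / before

# Accuracies reported in the article.
minilm_base, minilm_tuned, m5_tuned = 0.78, 0.94, 0.98

# Fine-tuning MiniLM: 78% -> 94% accuracy.
minilm_gain_points = (minilm_tuned - minilm_base) * 100
minilm_error_cut = relative_error_reduction(minilm_base, minilm_tuned)

# Fine-tuned M5 versus fine-tuned MiniLM: 94% -> 98% accuracy.
m5_vs_minilm_error_cut = relative_error_reduction(minilm_tuned, m5_tuned)
```

Under this reading, fine-tuning the public model adds 16 percentage points and removes roughly 73% of its errors, and the fine-tuned M5 model makes about two thirds fewer errors than the fine-tuned public model (2% versus 6%).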
These results show that fine-tuning sentence transformers on Amazon-specific data substantially improves the accuracy of product category classification. Beyond improving categorization in e-commerce, the approach opens up new opportunities for more precise artificial intelligence solutions in the sector.
via: MiMub in Spanish