Panasonic R&D Company of America and Panasonic Holdings, together with researchers from leading universities, have introduced "SparseVLM," a method for improving the speed and efficiency of Vision-Language Models (VLMs). These AI models understand both visual inputs, such as images or videos, and text prompts. Processing high-resolution visuals, however, increases computational load and slows response times.
SparseVLM addresses this by retaining only the visual information relevant to the user's text input and discarding unnecessary visual tokens. Unlike previous approaches that pruned visual tokens without reference to the prompt, SparseVLM links prompt keywords to the corresponding visual data, reducing the amount of information the model must process.
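The idea of prompt-aware token pruning can be illustrated with a simplified sketch. The code below is not SparseVLM's actual algorithm; it assumes a basic scheme in which each visual token is scored by the cross-attention it receives from the text tokens, and only the top-scoring fraction is kept. The function name `prune_visual_tokens` and the `keep_ratio` parameter are illustrative, not from the paper.

```python
import numpy as np

def prune_visual_tokens(visual_tokens, text_tokens, keep_ratio=0.25):
    """Keep only the visual tokens most relevant to the text prompt.

    visual_tokens: (Nv, d) array of visual token embeddings
    text_tokens:   (Nt, d) array of text token embeddings
    Returns the indices of retained tokens and the pruned token array.
    """
    d = visual_tokens.shape[1]
    # Cross-attention logits: each text token attends over visual tokens.
    logits = text_tokens @ visual_tokens.T / np.sqrt(d)
    # Softmax over the visual-token axis.
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # Relevance of a visual token = mean attention it receives from the prompt.
    relevance = weights.mean(axis=0)
    k = max(1, int(keep_ratio * len(visual_tokens)))
    keep = np.sort(np.argsort(relevance)[-k:])
    return keep, visual_tokens[keep]

rng = np.random.default_rng(0)
vis = rng.normal(size=(16, 8))   # e.g. 16 visual patch tokens
txt = rng.normal(size=(4, 8))    # e.g. 4 prompt tokens
keep, pruned = prune_visual_tokens(vis, txt, keep_ratio=0.25)
print(len(pruned))  # 4 tokens retained out of 16
```

In this toy setup, the downstream model would then attend over 4 visual tokens instead of 16, which is the source of the speedup: fewer tokens means less computation per layer.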
The research will be presented at the 42nd International Conference on Machine Learning (ICML 2025) in Vancouver. Panasonic continues to work with universities to develop practical, high-performance generative AI systems for real-world use.