Feature Story
GitHub - unum-cloud/uform: Multi-Modal AI inference library for Multi-Lingual Text, Image, and Video Search, Recommendations, and other Vision-Language tasks, up to 5x faster than OpenAI CLIP
Aug 18, 2023 · github.com

UForm provides a range of models with different architectures and languages; the multilingual models were trained on a language-balanced dataset. The library also provides tools to calculate the semantic compatibility between an image and a text, namely Cosine Similarity and Matching Score. Cosine Similarity is computationally cheap and suitable for retrieval over large collections, while Matching Score captures fine-grained features and is suitable for re-ranking.
Key takeaways
- UForm is a Multi-Modal inference library designed to encode Multi-Lingual Texts, Images, and soon, Audio, Video, and Documents, into a shared vector space.
- It offers three types of multi-modal encoding: late-fusion models, early-fusion models, and mid-fusion models, each with different capabilities and use cases.
- The UForm library is efficient and can be run on various platforms, from large servers to mobile phones, and is available on HuggingFace.
- It also provides tools to calculate semantic compatibility between an image and a text, namely Cosine Similarity and Matching Score.
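To illustrate the cheap retrieval step, here is a minimal NumPy sketch of cosine-similarity search over precomputed embeddings. This is not UForm's API; the vectors are toy stand-ins for the text and image embeddings the library would produce, and the top-k selection mirrors the "retrieve with Cosine Similarity, then re-rank with Matching Score" pattern described above.

```python
import numpy as np

def cosine_similarity(query, corpus):
    """Cosine similarity between one query vector and each row of a corpus matrix."""
    query = query / np.linalg.norm(query)
    corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return corpus @ query  # one score per corpus row

# Toy embeddings standing in for UForm text/image vectors (hypothetical data).
text_vec = np.array([1.0, 0.0, 1.0])
image_vecs = np.array([
    [1.0, 0.0, 1.0],   # same direction as the query -> similarity 1.0
    [0.0, 1.0, 0.0],   # orthogonal to the query    -> similarity 0.0
    [1.0, 0.0, 0.0],   # partial overlap
])

scores = cosine_similarity(text_vec, image_vecs)
top_k = np.argsort(-scores)[:2]  # indices of the 2 best candidates for re-ranking
```

In a real pipeline, only the `top_k` candidates would then be passed to the more expensive Matching Score model for fine-grained re-ranking.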