Tensor search is a new method of information retrieval. It is based on deep-learning models that learn the content and meaning of data. This translates to a better understanding of queries and documents and greatly improves relevance. Tensor search also works for more than text: it can search other forms of media like images, video and audio directly, without using text as an intermediate representation.
Envision an online store that needs to build a search function so customers can find items on its website. Now picture a customer searching for a “long sleeved top”. Although a simplification, the query effectively gets broken down into the terms “long”, “sleeved” and “top”. So what gets shown to that customer? At a high level, it is all the items that have the terms “long”, “sleeved” and “top” coded into their metadata. Creating and coding this metadata can be painstaking, manual work, but it is necessary to make items searchable. Now, what if the customer searched for “sleesve” or “long-sleeve” instead? And by “top”, did they mean a sweater, jumper, t-shirt or something else? The search function would need rules, often hand-crafted, for how to handle these nuances. This gets complex quickly, and these systems of exact matching and symbolic rules are never really complete; they need to be rewritten as language evolves. Finally, absent or misconfigured rules can have a jarring effect on end users: it is not simply that the relevance of results degrades, but that no results may be returned at all.
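To make the brittleness concrete, here is a toy sketch of exact term matching. The catalogue, tokeniser and item names are hypothetical simplifications, not how any real search engine is implemented:

```python
# Toy illustration of exact term matching and why it is brittle.
# CATALOGUE and its items are made up for this example.
CATALOGUE = {
    "item-1": "long sleeved top",
    "item-2": "short sleeved t-shirt",
    "item-3": "long sleeved sweater",
}

def keyword_search(query: str) -> list[str]:
    """Return items whose metadata contains every query term exactly."""
    terms = set(query.lower().split())
    return [
        item for item, text in CATALOGUE.items()
        if terms <= set(text.lower().split())
    ]

print(keyword_search("long sleeved top"))  # ['item-1']
print(keyword_search("long-sleeve top"))   # [] -- no exact term match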
Pre-trained deep-learning models learn these rules from data instead of requiring them to be manually curated, which reduces manual work and improves relevance. Additionally, the models are not confined to text: they can work over any media type, like images, videos and audio. Take the previous example, where we have results for “long sleeved top”. Imagine now that a customer has found a result of interest and wants to find visually similar items with some modification: “visually similar to this long sleeved top but with a floral pattern”.
As seen in the animation above, tensor search permits this intuitive search paradigm. An initial search yields relevant results using images alone. The search is then easily refined using a combination of natural language and images, all without the need for metadata. A subtle consequence is that users can now navigate a catalogue of inventory fluently by writing naturally and selecting things they like. There is no reliance on the website owner to maintain hand-crafted and hard-to-manage ontologies for their inventory; these concepts are also learnt by the models that power tensor search.
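One common way to implement “similar to this image, but with a floral pattern” is to embed both the selected image and the text modifier, then blend the two embeddings into a single query vector. The sketch below uses made-up 3-dimensional vectors in place of real model outputs, and the blending scheme and `text_weight` knob are illustrative assumptions, not Marqo's exact internals:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def combine(image_vec, text_vec, text_weight=0.5):
    """Blend an image embedding with a text embedding (toy sketch)."""
    v = [(1 - text_weight) * a + text_weight * b
         for a, b in zip(normalize(image_vec), normalize(text_vec))]
    return normalize(v)

# Hypothetical embeddings standing in for real model outputs.
selected_image = [1.0, 0.0, 0.0]   # "this long sleeved top"
floral_text    = [0.0, 1.0, 0.0]   # "with a floral pattern"
query = combine(selected_image, floral_text)

plain_top  = [0.9, 0.1, 0.0]
floral_top = [0.7, 0.7, 0.0]
# The blended query now prefers the floral variant.
print(cosine(query, floral_top) > cosine(query, plain_top))  # True
```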
As seen in the example above, with tensor search users aren’t limited to the keyword standard for search. They can search with questions, related terms or with images, audio or videos directly (or any combination thereof). Users can search the way they think.
Tensors are generalisations of vectors and matrices and can represent high-dimensional data. For example, a single entry in a database could consist of characters, words, sentences, paragraphs, pages or books. Each of these scales of data requires its own representation, and that representation needs to be flexible enough to accommodate all the variations. Our approach is to represent these data and their associated queries not by a single vector, but by collections of vectors: tensors.
Tensors provide a rich representation that can be scaled up or down in concert with the data. They generalise vector-based representations and provide an effective API for searching across different modalities. For example, in Marqo, tensors are built so that components of the tensor can be associated with specific parts of a document, image or video. Not only can this improve search relevance, it can also provide other key information like localisation and explainability (“why did this result get returned?”).
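One way to picture components of a tensor being tied to parts of a document: represent the document as a collection of per-chunk vectors, score the query against each chunk, and keep both the best score (relevance) and which chunk produced it (localisation and explainability). The chunks and embeddings below are toy stand-ins, and this scoring scheme is an illustrative sketch rather than Marqo's exact implementation:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# A document represented as a collection of per-chunk vectors (a "tensor").
# Chunk names and 2-d embeddings are made up for this example.
document = [
    ("intro paragraph",   [0.1, 0.9]),
    ("care instructions", [0.8, 0.2]),
]

def score(query_vec, doc):
    """Score a document and report which chunk matched best."""
    per_chunk = [(dot(query_vec, vec), chunk) for chunk, vec in doc]
    return max(per_chunk)  # (best score, best-matching chunk)

best_score, best_chunk = score([1.0, 0.0], document)
print(best_chunk)  # 'care instructions' aligns best with this query
```

Returning `best_chunk` alongside the score is what lets a system answer “why did this result get returned?”.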
Take the following example of searching an encyclopaedia: it has entries for many different things, and each entry can contain any or all of short text, documents, images or videos. When a user queries this database, the search should work across all these forms of data and return the most relevant results. This is illustrated in the figure below, where the query is compared not only against entire entries, but against each field, and even against specific locations within a field.
Operations can be performed over these tensors to return different results depending on the use case: for example, “give me the most relevant sentence” from some text versus “give me the most relevant passage”. Tensors also provide a common interface for searching between modalities; for example, using natural language to search images is enabled by these representations of the data, as has been made famous by CLIP. Since speed is critical, pre-filtering and matching queries to a corpus is facilitated by efficient and robust approximate nearest-neighbour algorithms like Hierarchical Navigable Small World (HNSW) graphs.
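At its core, matching a query to a corpus means finding the nearest vectors by some similarity measure. The sketch below does this exactly, by brute force over a toy corpus with made-up embeddings; HNSW approximates the same ranking in sub-linear time, which is what makes it viable at scale:

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def knn(query, corpus, k=2):
    """Exact k-nearest-neighbour search by cosine similarity.
    HNSW approximates this ranking without scanning every vector."""
    ranked = sorted(corpus.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical 2-d document embeddings.
corpus = {
    "doc-a": [0.9, 0.1],
    "doc-b": [0.1, 0.9],
    "doc-c": [0.7, 0.3],
}
print(knn([1.0, 0.0], corpus))  # ['doc-a', 'doc-c']
```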
Although we have spent considerable time espousing the benefits of tensor search, traditional search methods still serve important functions, such as finding exact matches in text. Combining these methods with tensor search gives us the best of both worlds through hybrid search. Moreover, hybrid search can further improve relevance and robustness in the same way ensembling does in machine learning. For these reasons, such functionalities remain part of Marqo.
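A common way to realise hybrid search is a weighted blend of a lexical (exact-match) score and a semantic (vector-similarity) score. The scores, documents and `alpha` weight below are all hypothetical, a sketch of the ensembling idea rather than Marqo's specific scoring formula:

```python
def hybrid_score(lexical, semantic, alpha=0.5):
    """Blend a lexical score with a semantic score.
    alpha is a hypothetical tuning knob, not a fixed Marqo parameter."""
    return alpha * lexical + (1 - alpha) * semantic

# Toy scores for two documents against one query: one document matches
# the query terms exactly, the other is a close paraphrase.
results = {
    "exact-match-doc": hybrid_score(lexical=1.0, semantic=0.4),
    "paraphrase-doc":  hybrid_score(lexical=0.0, semantic=0.9),
}
ranked = sorted(results, key=results.get, reverse=True)
print(ranked)  # ['exact-match-doc', 'paraphrase-doc']
```

Tuning `alpha` trades off exactness against semantic recall, which is one reason keeping both scoring paths around is useful.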
Features of Marqo include:
Sound interesting? We are always looking for people to join us! Head to our jobs page with your GitHub/experience/links to past projects.