Marqo is an open-source tensor-based search engine that supports multi-modal search. In this article, we will show how to set up your own text-to-image search engine using Marqo. The full code is available on Marqo's GitHub.
We select 5 images from the COCO dataset as examples.
First, we need to run Marqo in Docker using the following command. This test was done on an x64 Linux machine; Mac users with M-series chips should check here.
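A sketch of the setup, using the run command documented for Marqo at the time of writing (the image tag and port may differ for newer releases):

```shell
# Pull and run the Marqo container, exposing the API on port 8882.
docker rm -f marqo
docker run --name marqo -it --privileged -p 8882:8882 \
    --add-host host.docker.internal:host-gateway marqoai/marqo:latest
```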
Now, we can create a new environment and install the Marqo client by:
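A sketch of the environment setup, assuming conda; any virtual environment manager works, and the environment name is our choice:

```shell
# Create and activate a fresh environment, then install the Marqo client.
conda create -n marqo-env python=3.8 -y
conda activate marqo-env
pip install marqo
```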
Open Python and check that the installation succeeded. At the time of writing, we are using Marqo version 0.0.10.
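A minimal check, assuming the Marqo container from the setup step is running on port 8882 (the default):

```python
import marqo

# If the import and the client construction raise no error, the client
# library is installed correctly.
mq = marqo.Client(url="http://localhost:8882")
print(mq)
```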
Now, you can download our example images from GitHub. You should have the following directory structure:
Done, you have finished all the setup. Let's do the real search!
First, we need to create a Marqo index, which gives you access to all the necessary operations, e.g., indexing and searching. You can choose different models and parameters here.
In this case, we use a very basic setting: specifying the model, enabling image search, and leaving the others as defaults.
Note: to accomplish this multi-modal search task, we MUST set "treat_urls_and_pointers_as_images": True to enable the multi-modal search feature. As for the model, we need to select one from the CLIP family ("ViT-L/14" in this case).
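A minimal sketch of the index creation, assuming the client API at version 0.0.10; "my-multimodal-index" is a name chosen for this example:

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# The two settings below are the ones required for text-to-image search:
# treat URL fields as images, and use a CLIP-family model.
mq.create_index(
    "my-multimodal-index",
    treat_urls_and_pointers_as_images=True,
    model="ViT-L/14",
)
```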
Now, we need to add the images to the created index, which is a little tricky. Marqo runs inside Docker, so it cannot access the local images directly.
One solution is to upload all the images to GitHub and access them through URLs. That is fine here, as we only have 5 images. However, if we think big: are you really going to upload and download 1 million images for a larger dataset? The answer is probably no, so here is a better solution.
We can serve the local images from another Docker container, so that Marqo can reach them easily:
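One way to do this is a static file server in a second container; the directory name, port, and nginx image here are assumptions, and any static file server works:

```shell
# Serve the local ./images directory over HTTP on port 8222.
docker run --rm -d --name image-server -p 8222:80 \
    -v "$(pwd)/images:/usr/share/nginx/html:ro" nginx
```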
With this step, all the local images are on a server that Marqo can access with HTTP requests; we just need to tell Marqo where each image is:
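For example, with filenames and a port that are assumptions matching the file server above (host.docker.internal lets the Marqo container reach the Docker host):

```python
# Build the URL for each image as seen from inside the Marqo container.
filenames = [f"image_{i}.jpg" for i in range(5)]
docker_url = "http://host.docker.internal:8222/"
image_locations = [docker_url + name for name in filenames]
print(image_locations[0])
# → http://host.docker.internal:8222/image_0.jpg
```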
Marqo requires the input (which we call documents) to be a list of dictionaries; we can convert the image URLs into the required format:
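A sketch of the conversion; "image_location" is a field name chosen for this example, and "_id" is optional but makes hits easy to trace back to files:

```python
# Marqo expects a list of dictionaries, one per image.
image_locations = [
    f"http://host.docker.internal:8222/image_{i}.jpg" for i in range(5)
]
documents = [
    {"image_location": url, "_id": str(i)}
    for i, url in enumerate(image_locations)
]
print(documents[0])
# → {'image_location': 'http://host.docker.internal:8222/image_0.jpg', '_id': '0'}
```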
Add the documents to the previously created index using the function add_documents().
Yes, it is just this simple: one line of code. You can check the output for the indexing time. If you have a CUDA GPU available and want to speed up indexing, follow this guide to enable CUDA on Marqo and set device="cuda" in the add_documents() call.
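A sketch of the indexing call, assuming the 0.0.x client (newer client versions also require a tensor_fields argument) and the index and field names used above:

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

documents = [
    {"image_location": f"http://host.docker.internal:8222/image_{i}.jpg",
     "_id": str(i)}
    for i in range(5)
]

# Index the documents; switch device to "cuda" if a GPU is available.
mq.index("my-multimodal-index").add_documents(documents, device="cpu")
```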
Done, all the images are in Marqo and now we can search.
Finally, let's search and see the returned results.
Let's say we want to find the image for “A rider on a horse jumping over the barrier”. Here is the code.
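A sketch of the search call, using the index name assumed above:

```python
import marqo
from pprint import pprint

mq = marqo.Client(url="http://localhost:8882")

results = mq.index("my-multimodal-index").search(
    "A rider on a horse jumping over the barrier"
)
pprint(results)  # results["hits"] is ranked by relevance
```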
Done! We get the result in just 0.36s, without the help of a GPU. So what do the results look like?
Hard to read? Don't worry, let's plot the top result and verify it with your own eyes:
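A minimal plotting sketch, assuming Pillow, requests, and matplotlib are installed; the URL here is a placeholder standing in for results["hits"][0]["image_location"] from the search above:

```python
from io import BytesIO

import matplotlib.pyplot as plt
import requests
from PIL import Image

# URL of the top hit, i.e. results["hits"][0]["image_location"].
top_hit_url = "http://host.docker.internal:8222/image_1.jpg"

# Fetch the image over HTTP and display it.
image = Image.open(BytesIO(requests.get(top_hit_url).content))
plt.imshow(image)
plt.axis("off")
plt.show()
```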
Isn't this the image you are looking for, “A rider on a horse jumping over the barrier”? Searching images with text is just that simple.
You must be thinking: these are just 5 images, so what happens with a larger dataset?
Why not try it yourself? You can easily change the parameters in the code, add more images to the directory, and test your search results.
You can also check out other advanced usage in our GitHub.
It is really easy to use Marqo for multi-modal search, e.g., image-to-text, text-to-image, and image-to-image, with the following steps:
1. Environment setup: conda, pip
2. Create index: create_index()
3. Add documents into the index: add_documents()
4. Search: index().search()