How to dramatically improve search speed in Marqo

January 27, 2023
Best practices for achieving lightning fast search speeds in Marqo.

We recently released Marqo, an open-source, tensor-based search engine. Marqo allows you to seamlessly integrate the latest machine learning models from HuggingFace and OpenAI directly into your search experience and achieve cutting-edge search latency at scale.

If you want to run Marqo in production, I recommend Marqo Cloud, which provides a fully managed service with a full suite of benefits such as multi-AZ, durability, access control and observability.

Whether you’re using open-source Marqo or Marqo Cloud, this article includes a number of recommendations to help you achieve the fastest search speeds. Let’s jump into it!

Decrease the number of searchable_attributes

The total number of searchable_attributes in your query will impact the speed of your query. By default, every attribute is treated as a searchable attribute.

In Marqo, tensor indexes are stored in memory as a collection of HNSW indexes. Each attribute of a document has its own HNSW index.

Depending on your hardware, having many searchable_attributes can cause the indexes for some attributes to be evicted from memory to make room for others at query time, which can significantly slow down search.

For the full search API documentation (including REST), see the Marqo search documentation.

The following query only searches the “title” attribute of the documents in the index.
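A minimal sketch of such a query with the Marqo Python client. It assumes index is an index handle obtained from the client; the helper name search_title_only, the URL and the index name in the comments are placeholders for illustration.

```python
def search_title_only(index, query):
    """Search only the "title" attribute instead of every attribute."""
    # `index` is assumed to be a Marqo index handle, obtained with e.g.
    #   import marqo
    #   index = marqo.Client(url="http://localhost:8882").index("my-index")
    # (URL and index name are placeholders).
    return index.search(q=query, searchable_attributes=["title"])
```

Restricting the search to one attribute means only that attribute's HNSW index has to be consulted.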

We also recommend that you mark any fields you don't want to apply tensor search to as non_tensor_fields at indexing time, to improve indexing throughput and minimise disk usage. You can still apply filtering and lexical search to non-tensor fields.
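As a rough sketch, non-tensor fields could be declared when adding documents. This assumes the Python client's add_documents accepts a non_tensor_fields argument, as in the client versions current when this article was written; the helper name and the example field names ("sku", "category") are hypothetical.

```python
def add_documents_lexical_only_fields(index, documents, non_tensor_fields):
    """Index documents, skipping tensor (vector) creation for some fields."""
    # `index` is assumed to be a Marqo index handle (see marqo.Client(...).index(...)).
    # Fields listed in `non_tensor_fields` get no embeddings, so they cannot be
    # tensor-searched, but they remain available for filtering and lexical search.
    return index.add_documents(documents, non_tensor_fields=non_tensor_fields)


# Hypothetical example data: only "title" needs tensor search here.
example_documents = [
    {"title": "Trail running shoes", "sku": "TR-001", "category": "footwear"},
]
example_non_tensor_fields = ["sku", "category"]
```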

Decrease the number of attributes_to_retrieve

By default, Marqo will retrieve all attributes of any documents that match the query.

Therefore, if some attributes contain long text, we recommend setting attributes_to_retrieve to exclude them unless you truly need them back, as the quantity of data transferred can have a significant impact on the time taken to complete the search.

Note that even if you don’t return an attribute, it can still be searched, and the relevant component will be returned in the highlights (which are returned by default). See the search documentation for details.

You can combine searchable_attributes and attributes_to_retrieve to return different attributes to the ones you are searching:
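For illustration, a sketch that searches one attribute and returns another. It assumes index is a Marqo index handle from the Python client; the helper name and the "description" field are hypothetical.

```python
def search_description_return_title(index, query):
    """Search a long text field but send back only the short "title" field."""
    # `index` is assumed to be a Marqo index handle, e.g.
    #   marqo.Client(url="http://localhost:8882").index("my-index")
    return index.search(
        q=query,
        searchable_attributes=["description"],  # hypothetical long-text field
        attributes_to_retrieve=["title"],       # keeps the response small
    )
```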

Co-locate your resources in the same region as Marqo

Marqo Cloud is developing multi-region capability, but at the time of writing our Cloud services are located only in Northern Virginia (us-east-1 on AWS).

Network latency is significantly decreased when you locate your resources either in us-east-1 itself, or in a Google or Azure data centre close by.

Avoid requesting more than limit=10 results; where possible, use pagination instead

Marqo performs significantly faster when you limit the number of results that you request at any one time. Pagination is available following Marqo 0.0.11.

First page:

Second page:
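The two pages above might be requested as in the following sketch, which assumes the Python client's search method takes limit and offset arguments. The helper name fetch_page and the page size constant are illustrative.

```python
PAGE_SIZE = 10  # keep each request at or below limit=10


def fetch_page(index, query, page):
    """Fetch one page of results; `page` is zero-based (0 is the first page)."""
    # `index` is assumed to be a Marqo index handle from the Python client.
    return index.search(q=query, limit=PAGE_SIZE, offset=page * PAGE_SIZE)


# First page:  fetch_page(index, "my query", 0)  -> limit=10, offset=0
# Second page: fetch_page(index, "my query", 1)  -> limit=10, offset=10
```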

Use rerankers with caution

Using larger rerankers can significantly improve results and provide benefits like improved image highlighting. However, the reranker must also compute over all returned results. Therefore the latency increase from a reranker is as follows:

total additional latency from reranker = (time taken for the reranker to perform a single inference operation) * len(searchable_attributes) * limit
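To get a feel for the formula above, here is a small worked example in Python. The per-inference time of 50 ms is an invented figure for illustration, not a measured one.

```python
def reranker_added_latency_ms(ms_per_inference, searchable_attributes, limit):
    """Estimate extra latency: one inference per attribute per returned result."""
    return ms_per_inference * len(searchable_attributes) * limit


# Invented figures: 50 ms per inference, 2 searchable attributes, limit=10.
estimate = reranker_added_latency_ms(50, ["title", "description"], 10)
print(estimate)  # 1000 (milliseconds)
```

Halving either the limit or the number of searchable attributes halves the estimate.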

I recommend consulting the reranking documentation before using a reranker, and measuring latency on your own data before integrating it into an application.

You can integrate a reranker at search time:
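A sketch of what that could look like, assuming the Python client's search method accepts a reranker argument. The reranker name is passed in rather than hard-coded, since valid names should be taken from the reranking documentation; the helper name is illustrative.

```python
def search_with_reranker(index, query, searchable_attributes, reranker_name, limit=10):
    """Run a search and rerank the returned results with the named reranker."""
    # `index` is assumed to be a Marqo index handle from the Python client.
    # A small limit and few attributes keep the added reranking latency down.
    return index.search(
        q=query,
        searchable_attributes=searchable_attributes,
        limit=limit,
        reranker=reranker_name,
    )
```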

Consider using smaller or more performant models

When choosing a model in Marqo, remember to consult our models reference. Search speed is impacted by the model used for inference.

You can set a specific model when creating the index as follows:
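A minimal sketch, assuming the Python client's create_index accepts a model argument as it did in client versions current at the time of writing. The helper name is illustrative, and the model name should be chosen from the models reference.

```python
def create_index_with_model(client, index_name, model_name):
    """Create an index that uses a specific model for inference."""
    # `client` is assumed to be a marqo.Client, e.g.
    #   import marqo
    #   client = marqo.Client(url="http://localhost:8882")  # placeholder URL
    # `model_name` should be a name listed in the models reference.
    return client.create_index(index_name, model=model_name)
```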

Use a GPU machine to run Marqo

Running Marqo on a GPU will provide the lowest latency in the majority of cases, particularly when using Marqo with images. Note that the command to start Marqo on a GPU machine is slightly different (detailed instructions here):

Once NVIDIA Docker is installed (as per the above link), you need to start Marqo with docker run, passing the --gpus all flag.

Then, if you want to take advantage of the GPU speed-up, you need to specify cuda as the device when searching. Note that for text, depending on the model, you may find the CPU faster at search time, in which case you can use the default device="cpu".
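As a sketch, the device could be chosen per search like this, assuming the Python client's search method accepts a device argument. The helper name and the use_gpu switch are illustrative.

```python
def search_on_device(index, query, use_gpu):
    """Search with inference on the GPU ("cuda") or the CPU ("cpu")."""
    # `index` is assumed to be a Marqo index handle from the Python client,
    # and Marqo is assumed to have been started with GPU access if use_gpu is True.
    device = "cuda" if use_gpu else "cpu"
    return index.search(q=query, device=device)
```

Timing the same query with both settings on your own data is a simple way to see which device is faster for your model.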

Summing up

Thanks for reading! If you’re interested in keeping up to date with Marqo, you can join our Slack group here, or join the mailing list on our website.

If you’re interested in using Marqo Cloud, you can sign up for our beta waitlist here.

Tom Hamer