Growth Metrics
April 14, 2026

Search Performance at Scale: How AI-Native Architecture Serves Millions of Products

Owen Elliott, Solutions Architect, Marqo

When a retailer's catalog contains fifty thousand products, search is a straightforward problem. When it contains five million, search becomes an engineering challenge. When it contains fifty million and serves thousands of concurrent queries per second, search becomes an architectural decision that determines whether the business grows or stalls.

Most search systems degrade as catalogs grow. Response times increase. Relevance drops. The gap between what a shopper wants and what the system returns widens with every product added to the catalog. This degradation is not a bug; it is a fundamental property of how most search architectures work.

This post explains why search performance degrades at scale, how modern retrieval architectures address these challenges, and how Marqo has built an AI-native product discovery platform that maintains both speed and accuracy across catalogs of any size. We will examine the specific architectural decisions that enable enterprise retailers to serve millions of products without compromising the shopper experience.

### The Scale Problem in Product Search

To understand why scale is hard, consider what happens when a shopper types a query into a search bar.

In a traditional keyword system, the query is tokenized into individual terms, and the system scans an inverted index to find products containing those terms. The inverted index is a data structure that maps each term to the list of products containing it. For a catalog of fifty thousand products, this index fits comfortably in memory and queries return in single-digit milliseconds.

But keyword matching has a ceiling. It cannot understand intent, cannot handle synonyms it has not been explicitly taught, and cannot reason about visual similarity or product relationships.
When a shopper searches for "outdoor dining set for small balcony," a keyword system matches on "outdoor," "dining," and "set," returning everything from eight-seat patio tables to camping cookware.

Modern search systems replace or augment keyword matching with dense retrieval. Instead of matching on terms, the system maps both the query and every product in the catalog to points in a high-dimensional space. Products closest to the query point are returned as results. This approach captures semantic meaning: products that are conceptually similar to the query end up near it in the space, even if they share no keywords.

Dense retrieval can be dramatically more relevant than keyword matching. But it introduces a new problem: finding the nearest points in a high-dimensional space with millions of products is computationally expensive. A brute-force search, comparing the query to every product, is too slow for real-time applications. A catalog of ten million products with 768-dimensional representations requires roughly 7.7 billion multiply-add operations per query (10,000,000 products × 768 dimensions).

This is where approximate nearest neighbor (ANN) algorithms become critical.

### Approximate Nearest Neighbor Search and the Accuracy Tradeoff

ANN algorithms solve the scale problem by trading a small amount of accuracy for a large reduction in computation. Instead of comparing the query to every product, they use data structures that quickly narrow the search space to a small subset of candidates likely to contain the true nearest neighbors.

The most widely used ANN algorithm in production systems is HNSW (Hierarchical Navigable Small World). HNSW builds a multi-layer graph structure over the product representations. The top layer is sparse, containing only a few widely spaced points that provide coarse navigation. Each successive layer adds more points, providing finer granularity.
A query enters at the top layer, navigates to the approximate region of interest, and then descends through layers to find increasingly precise results.

The key property of HNSW is that it can achieve high recall (finding the true nearest neighbors) while examining only a tiny fraction of the total catalog. For a catalog of ten million products, HNSW typically examines fewer than a thousand candidates to return results with 95% or better recall relative to brute-force search.

But "95% or better" hides important nuance. Recall, the percentage of true nearest neighbors that actually appear in the returned results, varies depending on several factors:

Graph construction parameters. HNSW has two primary construction parameters: M (the maximum number of connections per node) and efConstruction (the size of the dynamic candidate list during index building). Higher values produce more connected graphs with better recall but consume more memory and take longer to build.

Query-time search parameters. The efSearch parameter controls how many candidates are examined during query time. Higher values improve recall but increase latency. This is the primary knob for tuning the speed-accuracy tradeoff in production.

Data distribution. HNSW performs differently depending on the statistical properties of the data. Clustered data (where similar products form tight groups) tends to produce higher recall than uniformly distributed data. Product catalogs are typically clustered by category, which works in HNSW's favor.

Dimensionality. Higher-dimensional representations are harder to index efficiently. The "curse of dimensionality" means that distance metrics become less discriminative as dimensions increase, making it harder for ANN algorithms to distinguish between nearby and distant points.

Catalog size. As the catalog grows, the graph becomes larger, and the probability of the greedy search getting stuck in a local minimum increases.
Larger catalogs generally require higher efSearch values to maintain the same recall level.

### Why Recall Matters More Than Latency in Ecommerce

Most technical discussions of search performance focus on latency: how fast the system returns results. Latency matters, of course. A search system that takes two seconds to respond will frustrate shoppers and reduce conversion. But in ecommerce, recall is the more important metric, and it is the one most systems sacrifice first when scaling up.

Consider two search systems. System A returns results in 20 milliseconds with 90% recall. System B returns results in 40 milliseconds with 99% recall. Both are fast enough that the shopper perceives instant results. But System A misses 10% of the most relevant products on every query. Over millions of queries per day, that missing 10% represents enormous lost revenue.

The products missed by low recall are not random. They tend to be products in sparser regions of the representation space: niche items, new arrivals with limited behavioral signals, long-tail products that do not cluster neatly with popular items. These are precisely the products where discovery has the highest marginal value. A shopper searching for a popular item will probably find it even with low recall. A shopper searching for a specific niche item depends entirely on the system's ability to retrieve it from a catalog of millions.

This is why Marqo's architecture prioritizes recall alongside latency, never sacrificing one for the other. The system is engineered to maintain high recall even as catalogs scale to tens of millions of products.

### The Architecture Behind Marqo's Scale Performance

Marqo is an AI-native product discovery platform built from the ground up to handle enterprise-scale catalogs. The architecture addresses the scale challenge at every layer:

Product-native representations. The quality of search results depends first on the quality of the product representations.
Marqo trains dedicated models that understand every product in the catalog: what it looks like, what it pairs with, what it substitutes, and what drives margin. These representations are more discriminative than those produced by general-purpose models, which means the ANN algorithm has better signal to work with. Better representations reduce the recall penalty of approximate search because the "neighborhoods" in the representation space correspond more accurately to genuine product similarity.

Adaptive index construction. Rather than using fixed parameters for the entire catalog, Marqo's indexing system adapts construction parameters based on the statistical properties of each category and cluster within the catalog. Dense categories with many similar products receive more connections to maintain discrimination. Sparse categories with highly distinct products can operate with fewer connections without sacrificing recall. This adaptive approach produces indexes that are both more compact and more accurate than fixed-parameter alternatives.

Multi-stage retrieval. Marqo does not rely on a single retrieval pass. The system uses a multi-stage pipeline where an initial fast retrieval pass casts a wide net, and subsequent passes re-rank candidates using richer, more computationally expensive signals. This architecture means the ANN layer can be tuned for speed without sacrificing final result quality: the first pass only has to place relevant products somewhere in a wide candidate set, and the re-ranking stages then correct the ordering. Re-ranking cannot recover a product the first pass never retrieved, which is why the candidate set is kept deliberately wide.

Real-time index updates. Enterprise catalogs change constantly. Products go in and out of stock, new items arrive, prices change, seasonal relevance shifts. Many search systems require periodic full re-indexing to incorporate changes, creating a lag between catalog reality and search results. Marqo supports real-time index updates, ensuring that search results always reflect the current state of the catalog.

Horizontal scaling.
As catalogs grow beyond what a single node can serve efficiently, Marqo distributes the index across multiple nodes with intelligent sharding. The sharding strategy is catalog-aware: it considers category boundaries and product relationships to minimize cross-shard queries while maintaining balanced load distribution.

### How Commerce Superintelligence Maintains Quality at Scale

Architecture alone is necessary but not sufficient. The harder problem is maintaining relevance quality as scale increases. A system can serve results quickly from a large index, but if those results are not commercially relevant, speed is wasted.

Commerce Superintelligence addresses this by unifying retrieval with commercial intelligence. The system does not just find products that are semantically similar to the query. It finds products that are relevant, available, commercially valuable, and contextually appropriate for the specific shopper.

This combines product intelligence with behavioral data in real time. The retrieval system considers:

- Semantic relevance: Does the product match what the shopper is looking for?
- Commercial performance: Does the product convert well? What is its margin contribution?
- Inventory status: Is the product in stock? In the shopper's size? Available for the shopper's shipping region?
- Behavioral signals: What do shoppers with similar intent patterns typically buy?
- Contextual factors: Time of year, trending categories, promotional priorities.

Integrating these signals into the retrieval pipeline, rather than applying them as post-processing filters, is what allows Marqo to maintain both relevance and commercial performance at scale. Every query benefits from the full intelligence of the system, not just the semantic component.

### Enterprise Results at Scale

The proof of any architecture is its performance in production.
Marqo's scale architecture has been validated across some of the largest ecommerce catalogs in the world.

Kogan, one of Australia's largest online retailers with millions of products spanning dozens of categories, generated $10.1M in incremental revenue after implementing Marqo. The scale challenge at Kogan is significant: the catalog spans electronics, homewares, fashion, outdoor, and dozens of other categories, each with distinct search patterns and product relationships. Marqo's adaptive indexing and multi-stage retrieval maintain high recall and relevance across this diverse catalog.

Fashion Nova, with a massive and rapidly changing fashion catalog, demonstrates Marqo's ability to handle high catalog velocity. New styles arrive daily, trends shift weekly, and seasonal transitions require the entire catalog's relevance signals to update. Marqo's real-time index updates and continuous model adaptation keep search results current, contributing to $130M in attributed revenue impact.

KICKS CREW, the global sneaker and streetwear marketplace, saw a 17.7% uplift in revenue per visitor. The sneaker market presents a unique scale challenge: thousands of colorways, regional naming differences, collaboration and limited-edition complexity, and a customer base with extremely specific product knowledge. Marqo's product-native representations understand the subtle distinctions between colorways, silhouettes, and collaboration significance that general-purpose models miss entirely.

These results are not achieved by trading off speed for quality or quality for speed. They come from an architecture that maintains both at enterprise scale. Marqo delivers results in 14 days, not months, because the core infrastructure is already built for scale.
Retailers connect their catalog, Marqo adapts the models, and production-quality search is live within two weeks.

### Benchmarking Search Performance: What to Measure

If you are evaluating search systems for a large catalog, here are the metrics that matter:

Recall at target latency. Do not measure recall and latency independently. Measure recall at your target latency SLA (typically p99 under 100ms for ecommerce). Any system can achieve high recall with unlimited time. The question is what recall level the system achieves within your latency budget.

Recall by catalog segment. Aggregate recall numbers hide important variation. Measure recall separately for popular products, long-tail products, new arrivals, and products in sparse categories. Systems that perform well on popular items but poorly on long-tail products are leaving revenue on the table.

Recall under load. Measure recall during peak traffic, not just in synthetic benchmarks. Many systems degrade under concurrent load as query-time parameters are automatically reduced to maintain latency SLAs. Understand how your system behaves at Black Friday scale, not just Tuesday afternoon scale.

Index update latency. How quickly do catalog changes appear in search results? If a product goes out of stock, how long before it stops appearing in results? If a new product is added, how long before it is discoverable? For enterprise retailers, this latency directly impacts customer experience and revenue.

Relevance metrics tied to commerce. Click-through rate, add-to-cart rate, conversion rate, and revenue per query are the metrics that matter. Pure information retrieval metrics like NDCG and MAP are useful for development but do not capture commercial performance.
Marqo measures success in revenue impact, not abstract accuracy scores.

### The Convergence of Performance and Intelligence

The evolution of ecommerce search is moving toward a convergence where performance and intelligence are inseparable. Early search systems were fast but dumb: keyword matching at scale. The next generation was smart but slow: sophisticated models that could not serve results in real time. The current generation, represented by Marqo's Commerce Superintelligence, is both fast and smart: product-native intelligence served at enterprise scale with sub-100ms latency.

This convergence is what enables experiences like Sibbi. Sibbi is the conversational interface of Marqo's Commerce Superintelligence, an autonomous agent that guides shoppers from discovery through post-purchase using deep product understanding. Conversational commerce demands even more from the retrieval layer than traditional search: queries are longer, more nuanced, and require understanding of conversational context across multiple turns. Only an architecture built for both performance and intelligence can support this experience at scale.

### The Cost of Staying on Legacy Architecture

Every month a retailer operates on legacy search infrastructure, the gap between what shoppers expect and what the system delivers widens. Shoppers have been trained by experiences on platforms with massive AI investment to expect search that understands their intent, not just their keywords.

Retailers running keyword-based systems or thin wrappers around general-purpose models face compounding losses: lower conversion rates, higher bounce rates, reduced average order values, and increasing customer acquisition costs as shoppers who do not find what they want do not return.

The migration path does not have to be painful. Marqo is designed to integrate with existing ecommerce infrastructure, sitting alongside or replacing legacy search without requiring a platform migration.
The system ingests the existing catalog, adapts models to the retailer's specific data, and delivers production-quality results in 14 days, not months.

### FAQ

How does Marqo maintain search accuracy as my catalog grows to millions of products?
Marqo uses adaptive index construction that adjusts parameters based on catalog characteristics, multi-stage retrieval that combines fast initial search with precision re-ranking, and product-native representations that produce more discriminative signals than general-purpose models. These architectural choices ensure that recall remains high even as catalog size increases by orders of magnitude.

What latency can I expect at enterprise scale?
Marqo delivers sub-100ms p99 latency across catalogs of millions of products under production traffic loads. The system is horizontally scalable, so latency performance is maintained as traffic increases by adding capacity rather than degrading search quality.

How quickly are catalog changes reflected in search results?
Marqo supports real-time index updates. Product additions, removals, price changes, and inventory updates are reflected in search results within seconds, not hours or days. This ensures shoppers always see current, accurate results.

What makes Marqo different from other search vendors that claim AI capabilities?
Most search vendors add AI as a feature layer on top of legacy architecture. Marqo is an AI-native product discovery platform built from the ground up for product understanding at scale. The system combines product intelligence with behavioral data at the retrieval level, not as a post-processing step. This architectural difference is why Marqo consistently outperforms general-purpose alternatives by significant margins.

How long does implementation take?
Marqo delivers production results in 14 days, not months. The product-native foundation models are already trained.
Implementation involves connecting your catalog data and behavioral signals, adapting models to your specific products and customers, and deploying to production. There is no multi-month training or integration cycle.

### See What Scale Performance Looks Like

If your search degrades as your catalog grows, or if you are planning catalog expansion and worried about maintaining quality, Marqo can show you what enterprise-scale product discovery looks like.

Book a demo to see how Marqo's Commerce Superintelligence serves millions of products with the speed and accuracy your shoppers expect.
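
For readers who want to try the recall measurement described under "Benchmarking Search Performance," here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Marqo's implementation: the helper names (`make_random_vectors`, `top_k`, `recall_at_k`, `mean_recall`) and the `candidate_fraction` parameter are hypothetical, the data is random, and the "approximate" search is simulated by scoring only a random subset of the catalog rather than by a real HNSW index. In a real evaluation you would replace that subset step with calls to the index under test.

```python
import random


def make_random_vectors(n, dim, seed):
    """Generate n random vectors of length dim (stand-in for product representations)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, vectors, candidate_ids, k):
    """Exact top-k by dot product, restricted to the given candidate ids."""
    ranked = sorted(candidate_ids, key=lambda i: dot(query, vectors[i]), reverse=True)
    return ranked[:k]


def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k that also appears in the approximate result."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


def mean_recall(vectors, queries, k, candidate_fraction, seed=0):
    """Average recall@k over all queries.

    candidate_fraction is a hypothetical knob: the share of the catalog the
    simulated approximate search is allowed to score. It only mimics the
    effect of examining fewer candidates; it is not an HNSW parameter.
    """
    rng = random.Random(seed)
    all_ids = list(range(len(vectors)))
    n_candidates = max(k, int(len(vectors) * candidate_fraction))
    total = 0.0
    for query in queries:
        exact = top_k(query, vectors, all_ids, k)        # brute-force ground truth
        subset = rng.sample(all_ids, n_candidates)       # simulated approximate search
        approx = top_k(query, vectors, subset, k)
        total += recall_at_k(exact, approx)
    return total / len(queries)


if __name__ == "__main__":
    catalog = make_random_vectors(n=500, dim=16, seed=42)
    queries = make_random_vectors(n=20, dim=16, seed=7)
    for fraction in (0.1, 0.5, 1.0):
        recall = mean_recall(catalog, queries, k=10, candidate_fraction=fraction)
        print(f"candidate_fraction={fraction:.1f}  mean recall@10={recall:.2f}")
```

The same `recall_at_k` function can be applied per catalog segment (popular, long-tail, new arrivals) by grouping queries before averaging, and timing each query alongside it gives the recall-at-target-latency view rather than two independent numbers.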

Commerce Superintelligence

Enterprise retailers need search that stays fast and accurate across millions of products. Learn how AI-native architecture delivers both at scale.

Shape Your Growth With AI-Native Product Discovery

Transform product discovery with Marqo and get measurable ROI in 14 days, not months.

Kicks Crew
Mejuri
Redbubble
Kogan
Shutterstock
SwimOutlet
Poshmark