Learn how to implement cost-effective multi-tenant search using the Amazon OpenSearch Serverless next-generation architecture, with scale-to-zero compute and simplified routing through a per-account, regional endpoint.
Building multi-tenant search architectures requires balancing data isolation with operational cost and complexity. In this post, we provide code examples for an implementation of multi-tenant search using a collection-per-tenant model with Amazon OpenSearch Serverless per-account, regional endpoints. Collection-per-tenant provides data and workload isolation. The regional endpoint simplifies routing requests for indexing and searching data.
Amazon OpenSearch Serverless is a serverless deployment option for Amazon OpenSearch Service that simplifies infrastructure management, index tuning, and data lifecycle management. OpenSearch Serverless automatically provisions and scales resources to provide consistently fast data ingestion rates and millisecond query response times during changing usage patterns and application demand.
The multi-tenant search problem
In search workloads, a tenant is a logical unit of data and the queries against that data. An eCommerce site has product categories. Each category is a tenant. A blog-hosting platform has blogs. Each blog is a tenant. Tenants map to resources in different ways. In the siloed model, each tenant gets its own container: a domain, collection, or index. In the pooled model, tenants share a container. The hybrid model silos large tenants and pools smaller ones together. Regardless of model, you need a mapping between tenant identifiers and the containers that hold their data, so your application routes requests correctly.
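The tenant-to-container mapping can be as small as a lookup table. The following Python sketch illustrates the hybrid model under assumed names: the tenant IDs, the container names, and the container_for helper are hypothetical and not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    kind: str  # "domain", "collection", or "index"
    name: str


# Hypothetical hybrid mapping: large tenants are siloed in their own
# container, and every other tenant falls through to a shared one.
SILOED = {
    "electronics": Container("collection", "electronics"),
    "furniture": Container("collection", "furniture"),
}
POOLED = Container("collection", "shared-small-tenants")


def container_for(tenant_id: str) -> Container:
    """Return the container that holds a tenant's data."""
    return SILOED.get(tenant_id, POOLED)
```

In a pooled container, each query also needs a filter on a tenant field so that tenants do not see each other's documents.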
OpenSearch Serverless classic offered a collection-per-tenant strategy that simplified, but did not remove, the need for maintaining a tenant-container mapping. In addition, the cost structure of maintaining collection-per-tenant in classic was not ideal. Classic shared hardware across collections with the same AWS Key Management Service (AWS KMS) key. Tenants with different keys could not share hardware. The cost of the solution was the minimum monthly collection cost multiplied by the tenant count. Building for hundreds or thousands of tenants was cost-prohibitive. Collection groups improved this by allowing hardware sharing across AWS KMS keys, but compute costs were still driven by your indexed data, even during idle periods.
With the next-generation architecture, collection groups scale compute to zero. You pay for compute only when a tenant is actively indexing or searching (storage charges still apply). The addition of the regional endpoint further simplifies multi-tenant workloads by routing traffic to any collection through a single hostname. Together, scale-to-zero compute and the regional endpoint make the collection-per-tenant model both economically viable and operationally straightforward.
The OpenSearch Serverless per-account endpoint
OpenSearch Serverless next generation introduces a per-account, regional endpoint that serves all collections through a single hostname:
https://<account-id>.aoss.<region>.on.aws
The x-amz-aoss-collection-name or x-amz-aoss-collection-id header identifies the target collection on each request. This means one connection pool, one TLS session, and one endpoint to manage regardless of how many collections you have.
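As a minimal sketch of these two pieces, the following Python helpers build the endpoint URL from the template above and the routing header for a request. The function names are our own, and requiring exactly one of the two headers is a choice made for this sketch rather than a documented rule.

```python
from typing import Dict, Optional


def regional_endpoint(account_id: str, region: str) -> str:
    """Build the per-account, regional endpoint URL from the template."""
    return f"https://{account_id}.aoss.{region}.on.aws"


def routing_header(name: Optional[str] = None,
                   collection_id: Optional[str] = None) -> Dict[str, str]:
    """Return the header that selects the target collection.

    This sketch insists on exactly one of name or collection_id so that a
    request never carries two conflicting targets.
    """
    if (name is None) == (collection_id is None):
        raise ValueError("pass exactly one of name or collection_id")
    if name is not None:
        return {"x-amz-aoss-collection-name": name}
    return {"x-amz-aoss-collection-id": collection_id}
```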
From a client perspective, you create a single OpenSearch client pointed at the regional endpoint. Every subsequent request then includes the routing header to target a specific collection.
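The following sketch shows one way this could look in Python. It assumes the opensearch-py and boto3 packages (the text does not name a client library), and it assumes that the SigV4 service name aoss, which opensearch-py uses for OpenSearch Serverless, also applies to the regional endpoint. The helper names are hypothetical.

```python
def make_client(host: str, region: str):
    """Create one client shared by every collection behind the endpoint.

    host is the endpoint hostname without the https:// scheme.
    """
    # Imported lazily so this module loads without the packages installed.
    import boto3
    from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

    credentials = boto3.Session().get_credentials()
    # Assumption: "aoss" is the right signing name for the regional endpoint.
    auth = AWSV4SignerAuth(credentials, region, "aoss")
    return OpenSearch(
        hosts=[{"host": host, "port": 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
    )


def count_documents(client, collection_name: str, index: str) -> int:
    """Route a single count request to one collection via a request header."""
    response = client.count(
        index=index,
        headers={"x-amz-aoss-collection-name": collection_name},
    )
    return response["count"]
```

Because the header is passed per request, the same client object can serve any number of collections without being rebuilt.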
This is a significant improvement over the classic architecture, where each collection had its own endpoint and you needed to manage separate connections for each.
Collection per tenant with query routing
The architecture is straightforward: one collection group holds all tenant collections, and the regional endpoint handles routing.
Create a collection group with scale-to-zero
When you set minIndexingCapacityInOCU and minSearchCapacityInOCU to 0, OpenSearch Serverless scales compute down to 0 OpenSearch Compute Units (OCUs) after it has been idle for 10 minutes. While compute is scaled to zero, you pay only for the storage for your indices. If you want to maintain compute and avoid cold starts, set minIndexingCapacityInOCU or minSearchCapacityInOCU to a value greater than 0.
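The two settings can be captured in a small helper, sketched below in Python. Only the two field names come from the text. The helper names are hypothetical, and the API call that would receive these values when creating the collection group is deliberately not shown here.

```python
from typing import Dict


def capacity_limits(min_indexing_ocu: int = 0,
                    min_search_ocu: int = 0) -> Dict[str, int]:
    """Build the minimum-capacity settings for a collection group."""
    if min_indexing_ocu < 0 or min_search_ocu < 0:
        raise ValueError("minimum capacity cannot be negative")
    return {
        "minIndexingCapacityInOCU": min_indexing_ocu,
        "minSearchCapacityInOCU": min_search_ocu,
    }


def scales_to_zero(limits: Dict[str, int]) -> bool:
    """True when both minimums are 0, the scale-to-zero case in the text."""
    return (limits["minIndexingCapacityInOCU"] == 0
            and limits["minSearchCapacityInOCU"] == 0)
```

For example, capacity_limits() describes a group that can scale to zero, while capacity_limits(min_search_ocu=1) keeps search compute provisioned to avoid cold starts on queries.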
Create one collection per tenant
Each product category maps to its own collection within the group.
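A minimal Python sketch of this step follows. It assumes a boto3 opensearchserverless client, whose create_collection call accepts name and type. How a collection is attached to its collection group is not covered in the text, so the sketch leaves that to a hypothetical extra_params pass-through instead of guessing a parameter name.

```python
from typing import Dict, Iterable, List, Optional


def create_tenant_collections(aoss_client,
                              names: Iterable[str],
                              extra_params: Optional[Dict] = None) -> List[str]:
    """Request one SEARCH collection per name and return the names requested.

    aoss_client is expected to behave like
    boto3.client("opensearchserverless").
    """
    requested = []
    for name in names:
        params = {"name": name, "type": "SEARCH"}
        # Hypothetical hook for settings the text does not spell out,
        # such as the collection group association.
        params.update(extra_params or {})
        aoss_client.create_collection(**params)
        requested.append(name)
    return requested
```

Collection creation is asynchronous, so an application would typically wait for each collection to become active before indexing into it.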
When choosing a collection name for your tenants, consider privacy, name length, and future ease of upgrading your application. You can use a hash function to map tenant identifiers to collection names.
Collection names are visible in API calls and logs. If your tenant ID contains personally identifiable information (PII), that information is also visible in logs. Hashing the tenant ID obfuscates the sensitive information.
OpenSearch Serverless has a 64-character limit on collection names. Your tenant ID can be longer than that. Hashing helps stay within this limit.
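The following Python sketch shows one way to derive a collection name from a tenant ID with a hash. The collection_name_for helper, the t- prefix, and the 16-character digest length are assumptions made for illustration. The prefix is there because the sketch assumes names should start with a letter, and the length limit is the one quoted above.

```python
import hashlib

MAX_NAME_LENGTH = 64  # collection name limit quoted in the text


def collection_name_for(tenant_id: str,
                        prefix: str = "t-",
                        digest_chars: int = 16) -> str:
    """Map a tenant ID of any length to a short, fixed-length name."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()
    name = f"{prefix}{digest[:digest_chars]}"
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("prefix and digest exceed the name length limit")
    return name
```

The same tenant ID always produces the same name, so no lookup table is needed. Truncating the digest makes collisions possible in principle, so a longer digest is the safer choice for very large tenant counts. If tenant IDs are easy to guess, a keyed hash such as HMAC hides them better than a plain hash.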
You might also want to add a prefix to collection names so that you can use wildcard patterns in access policies. For example, naming collections pqa-a1b2c3d4 lets you write a single data access policy matching collection/pqa-*. Including a version component in the name (such as pqa-v2-a1b2c3d4) makes it straightforward to create new collections during schema migrations without disrupting existing tenants.
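As an illustration of the prefix and version ideas, the Python sketch below builds a versioned name and a wildcard data access policy. It follows the data access policy JSON layout documented for OpenSearch Serverless and assumes that layout is unchanged in the next-generation architecture. The helper names, the chosen permissions, and the role ARN in the usage note are hypothetical.

```python
import json
from typing import List


def versioned_name(prefix: str, version: int, digest: str) -> str:
    """Build a name such as pqa-v2-a1b2c3d4."""
    return f"{prefix}-v{version}-{digest}"


def data_access_policy(prefix: str, principals: List[str]) -> str:
    """Return a policy document covering every collection with the prefix."""
    policy = [{
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": [f"collection/{prefix}-*"],
                "Permission": ["aoss:DescribeCollectionItems"],
            },
            {
                "ResourceType": "index",
                "Resource": [f"index/{prefix}-*/*"],
                "Permission": ["aoss:ReadDocument", "aoss:WriteDocument"],
            },
        ],
        "Principal": principals,
    }]
    return json.dumps(policy)
```

Calling data_access_policy("pqa", ["arn:aws:iam::123456789012:role/search-app"]) yields a single document whose collection/pqa-* pattern is also intended to cover versioned names such as pqa-v2-a1b2c3d4, because they share the prefix.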
Index data using the regional endpoint
A single OpenSearch client handles all collections. The x-amz-aoss-collection-name header routes each request to the correct collection.
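A minimal sketch of an indexing call follows, assuming an opensearch-py style client whose index method accepts a per-request headers argument. The index_document helper, the index name, and the document fields in the usage note are hypothetical.

```python
from typing import Any, Dict, Optional


def index_document(client,
                   collection_name: str,
                   index: str,
                   document: Dict[str, Any],
                   doc_id: Optional[str] = None):
    """Index one document into the named tenant collection."""
    kwargs = {
        "index": index,
        "body": document,
        # The header, not the hostname, selects the collection.
        "headers": {"x-amz-aoss-collection-name": collection_name},
    }
    if doc_id is not None:
        kwargs["id"] = doc_id
    return client.index(**kwargs)
```

For example, index_document(client, "pqa-a1b2c3d4", "products", {"title": "USB-C cable"}) writes to one tenant, and changing only the collection name sends the same call to another tenant.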
Query a specific tenant’s data
Searching works the same way. Set the header to target the tenant’s collection.
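The search side could look like the following Python sketch, again assuming an opensearch-py style client with a per-request headers argument. The search_tenant helper and the title field are hypothetical.

```python
from typing import Any, Dict, List


def search_tenant(client,
                  collection_name: str,
                  index: str,
                  text: str,
                  size: int = 10) -> List[Dict[str, Any]]:
    """Run a match query against one tenant's collection."""
    body = {"size": size, "query": {"match": {"title": text}}}
    response = client.search(
        index=index,
        body=body,
        headers={"x-amz-aoss-collection-name": collection_name},
    )
    # Return only the stored documents, dropping search metadata.
    return [hit["_source"] for hit in response["hits"]["hits"]]
```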
The application layer maps a tenant ID (in this case, a product category) to a collection name, and the regional endpoint handles the rest. No connection pool management, no endpoint lookups, no per-tenant client instances.
Limitations
There are practical constraints to consider when adopting this pattern.
Cold start latency. When a collection group has scaled to zero compute, the first request takes approximately 10 seconds while capacity provisions. For latency-sensitive tenants, you can send a lightweight warmup query (such as a match_all with size=1) before production traffic arrives.
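A warmup call might be sketched as follows in Python. It assumes an opensearch-py style client, where request_timeout is a per-request option. The warm_up helper and the 30-second default are our own choices, picked to leave headroom over the approximate 10-second cold start.

```python
import time


def warm_up(client,
            collection_name: str,
            index: str,
            timeout_seconds: int = 30) -> float:
    """Send a lightweight query and return how long it took in seconds."""
    started = time.monotonic()
    client.search(
        index=index,
        body={"size": 1, "query": {"match_all": {}}},
        headers={"x-amz-aoss-collection-name": collection_name},
        # Generous timeout so the cold start itself does not fail the call.
        request_timeout=timeout_seconds,
    )
    return time.monotonic() - started
```

Logging the returned duration gives a rough signal of whether the request hit warm or cold capacity.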
Collection group limits. There are account-level limits on the number of collections and collection groups. Check the Amazon OpenSearch Serverless quotas for current numbers if you are planning thousands of tenants.
Security policy size. Encryption, network, and data access policies list collection resource patterns. If you list each tenant's collection explicitly, these policy documents grow linearly with tenant count. Use wildcard patterns to stay within OpenSearch Serverless policy size limits.
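The growth is easy to see by measuring a policy document. The Python sketch below compares an explicit resource list with a single wildcard pattern. The policy_bytes helper, the generated collection names, and the role ARN are hypothetical, and no specific size limit is assumed.

```python
import json
from typing import List


def policy_bytes(resources: List[str]) -> int:
    """Return the size in bytes of a compact policy listing the resources."""
    policy = [{
        "Rules": [{
            "ResourceType": "collection",
            "Resource": resources,
            "Permission": ["aoss:DescribeCollectionItems"],
        }],
        "Principal": ["arn:aws:iam::123456789012:role/search-app"],
    }]
    return len(json.dumps(policy, separators=(",", ":")).encode("utf-8"))


# One explicit entry per tenant versus one pattern for all tenants.
explicit = [f"collection/pqa-{i:08x}" for i in range(1000)]
wildcard = ["collection/pqa-*"]
```

Comparing policy_bytes(explicit) with policy_bytes(wildcard) shows the explicit form adding a fixed number of bytes per tenant, while the wildcard form stays the same size no matter how many tenants exist.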
No cross-collection queries. Each search request targets exactly one collection. If you need to query across tenants for analytics or global search, you need an aggregation layer or a separate shared collection.
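A very small aggregation layer could fan the same query out to several collections and merge the hits, as in the Python sketch below. It assumes an opensearch-py style client with per-request headers, and the search_across helper is hypothetical. Relevance scores are computed per collection, so ordering the merged list by score is only a rough heuristic.

```python
from typing import Any, Dict, Iterable, List


def search_across(client,
                  collection_names: Iterable[str],
                  index: str,
                  body: Dict[str, Any],
                  size: int = 10) -> List[Dict[str, Any]]:
    """Query each collection in turn and return the top hits overall."""
    merged = []
    for name in collection_names:
        response = client.search(
            index=index,
            body=body,
            headers={"x-amz-aoss-collection-name": name},
        )
        for hit in response["hits"]["hits"]:
            # Tag each hit with its source collection for the caller.
            merged.append({"collection": name, **hit})
    merged.sort(key=lambda hit: hit.get("_score") or 0.0, reverse=True)
    return merged[:size]
```

Each collection costs one request, so this approach suits occasional analytics better than high-volume global search, where a separate shared collection is likely the better fit.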
Conclusion
In this post, we showed how the next-generation OpenSearch Serverless architecture makes the collection-per-tenant model practical for multi-tenant search. Scale-to-zero removes the compute cost of inactive tenants (storage charges still apply), so compute spend tracks actual tenant demand. The regional endpoint eliminates the operational complexity of managing per-tenant connections. You get full data isolation between tenants, independent scaling for each tenant’s workload, and a single endpoint to manage in your application code.
For more information, see the Amazon OpenSearch Serverless documentation.