Introduction
At Meesho, every user is unique. They come from diverse corners of the country, each with distinct tastes, preferences, and needs. This diversity is both a challenge and an opportunity. To make every user’s journey comfortable and relevant, we leverage cutting-edge technology to deliver hyper-personalized experiences that resonate on an individual level.
The foundation of this personalization lies in our advanced recommendation systems and precision-driven ranking algorithms. Powered by state-of-the-art machine learning and deep learning models, these systems do more than suggest products—they craft tailored experiences that align with each user’s unique preferences, all while offering exceptional value.
However, as Meesho continued to grow, so did the demands on our technology. We realized that our existing ranking and relevance architecture had limitations that could hinder scalability and innovation. To address this, we undertook a complete overhaul—rebuilding the system from the ground up. This transformation wasn’t just an upgrade; it was a reinvention. The new architecture brings unparalleled flexibility and performance, enabling us to develop, test, and deploy features with unprecedented speed and efficiency. More importantly, it empowers us to better serve our users by delivering personalized experiences at scale like never before.
In this article, we’ll take you behind the scenes of this architectural evolution—unpacking its technical intricacies, exploring its role in delivering hyper-personalized experiences at scale, and highlighting the meaningful impact it creates for both our users and business operations.
Basics
What are Candidate Generators?
As the name implies, candidate generators (CG) take input such as feed details, search queries, and/or user context, and produce a set of candidate results as output. In our context, these candidates are product IDs. In practice, multiple candidate generator algorithms run in parallel, collectively aiming to maximize the recall of relevant candidates. A typical candidate generator involves performing an approximate nearest neighbor search on a set of product embeddings using the embedding of the user or the query. These embeddings—for queries and products—are derived by training two-tower network models, which position queries close to relevant products within the embedding space.
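To make the idea concrete, here is a minimal sketch of embedding-based retrieval in Go. It does a brute-force inner-product search over a tiny set of product embeddings; a production CG would query an approximate nearest neighbor index instead, and all names and dimensions here are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// dot computes the inner product of two embedding vectors.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// topK returns the IDs of the k products whose embeddings score highest
// against the query embedding. This is brute force for clarity; a real
// CG would use an ANN index over the product embeddings.
func topK(query []float32, products map[string][]float32, k int) []string {
	type scored struct {
		id    string
		score float32
	}
	all := make([]scored, 0, len(products))
	for id, emb := range products {
		all = append(all, scored{id, dot(query, emb)})
	}
	sort.Slice(all, func(i, j int) bool { return all[i].score > all[j].score })
	if k > len(all) {
		k = len(all)
	}
	ids := make([]string, k)
	for i := 0; i < k; i++ {
		ids[i] = all[i].id
	}
	return ids
}

func main() {
	// Toy 2-dimensional embeddings; in practice these come from the
	// product tower of a trained two-tower model.
	products := map[string][]float32{
		"p1": {0.9, 0.1},
		"p2": {0.1, 0.9},
		"p3": {0.7, 0.7},
	}
	query := []float32{1, 0} // user/query embedding from the query tower
	fmt.Println(topK(query, products, 2))
}
```

Because the two towers are trained to place queries near relevant products, a plain inner product is enough at serving time; the hard work happens offline during training and indexing.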
What are Rankers?
Rankers determine the final sorting order of candidates produced by various CGs. Their implementations can range from simple mathematical functions to sophisticated deep learning models. In e-commerce, a typical ML-based ranking model estimates the probability of a user action (e.g., click, order) for a target product by leveraging hundreds of features related to the user, the browsing context (e.g., search query, product page), and the candidate product. The final feed is created by sorting the candidate products in descending order based on their ranking scores. Often, the ranking process involves multiple stages.
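The scoring-and-sorting step above can be sketched as follows. The linear model and feature names (`ctr_7d`, `price_affinity`) are stand-ins for illustration only; real rankers are typically deep models with hundreds of features, served behind an ML platform.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a product ID plus the features the ranking model consumes.
type Candidate struct {
	ProductID string
	Features  map[string]float64
}

// score estimates the probability of a user action with a toy linear
// model; missing features default to zero.
func score(c Candidate, weights map[string]float64) float64 {
	var s float64
	for name, w := range weights {
		s += w * c.Features[name]
	}
	return s
}

// rank sorts candidates in descending order of ranking score, producing
// the final feed order.
func rank(cands []Candidate, weights map[string]float64) []string {
	sort.SliceStable(cands, func(i, j int) bool {
		return score(cands[i], weights) > score(cands[j], weights)
	})
	out := make([]string, len(cands))
	for i, c := range cands {
		out[i] = c.ProductID
	}
	return out
}

func main() {
	weights := map[string]float64{"ctr_7d": 0.8, "price_affinity": 0.2}
	feed := rank([]Candidate{
		{"p1", map[string]float64{"ctr_7d": 0.1, "price_affinity": 0.9}},
		{"p2", map[string]float64{"ctr_7d": 0.6, "price_affinity": 0.2}},
	}, weights)
	fmt.Println(feed)
}
```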
When viewed holistically, candidate generators address the relevance problem, while ranking models solve the ranking problem. However, in practice, these two processes work in tandem to deliver the desired outcomes.
Initial Architecture
Requirements
- The system needs to support multiple CGs targeted at solving for both short-term intent and long-term interest.
- The system must support dynamic management of candidate generators - allowing us to add new ones, remove existing ones, change their execution sequence, and combine their outputs using various strategies.
Architecture
Microservices view

The figure above shows how these microservices interact:
- Orchestrator - The Orchestrator manages multiple CGs and rankers that need to be executed in different sequences based on the app real estate (e.g., For You, Recommendations) and tenant (e.g., Ads, Organic products). To handle these varying workflows, we implemented a custom DAG (Directed Acyclic Graph) executor where CGs, rankers, and merge logic are represented as nodes in a graph. This allows us to easily modify the execution flow through configuration. Since all operations work with candidates, we created a unified component structure that works seamlessly with the DAG executor. The Orchestrator microservice, built on top of this executor, exposes APIs and implements various nodes including CG connectors, ranker connectors, custom filters, and merge logic. The topology shown below illustrates one possible configuration among many that can be implemented based on business requirements.

- CG services - Represented in Green, these independent microservices handle distinct recommendation strategies: fetching products based on recent interactions (short-term intent) or long-term user preferences, or surfacing products in new categories that the user has not explored previously.
- Ranker - Represented in Blue, the Ranker microservice abstracts multiple ranking models and their associated features behind a single interface, handling scoring and ordering of all candidates.
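A core job of the Orchestrator is fanning out to multiple CGs in parallel and merging their outputs. Below is a minimal Go sketch of that fan-out/merge pattern; the `CG` function type and deduplicating merge are simplified stand-ins for the real DAG executor's CG-connector and merge nodes.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// CG is a simplified shape for a candidate generator node: given a user,
// it returns candidate product IDs.
type CG func(userID string) []string

// fanOut runs all CGs concurrently and merges their candidate sets,
// deduplicating product IDs.
func fanOut(userID string, cgs []CG) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen = map[string]bool{}
		out  []string
	)
	for _, cg := range cgs {
		wg.Add(1)
		go func(cg CG) {
			defer wg.Done()
			for _, id := range cg(userID) {
				mu.Lock()
				if !seen[id] {
					seen[id] = true
					out = append(out, id)
				}
				mu.Unlock()
			}
		}(cg)
	}
	wg.Wait()
	sort.Strings(out) // deterministic order for the example only
	return out
}

func main() {
	shortTerm := func(string) []string { return []string{"p1", "p2"} }
	longTerm := func(string) []string { return []string{"p2", "p3"} }
	fmt.Println(fanOut("u42", []CG{shortTerm, longTerm}))
}
```

In the actual system the merge strategy itself is a configurable node in the DAG, so deduplication could just as easily be interleaving or weighted blending.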
Lessons Learned
Ranking serves as a pivotal growth driver for any marketplace, making it a focal point for extensive experimentation and iterations. New algorithms are frequently introduced, while older models are retired. At any given time, nearly a hundred ranking-related experiments are actively running across various surfaces of the Meesho app.
Over a year of refining and evolving our ranking system, we encountered several challenges in effectively enabling rapid experimentation. Here, we share some of the key lessons we've learned along the way.
CG Representation as Microservices
- We initially designed each CG as a separate microservice, anticipating diverse implementation needs. However, we discovered that CGs and rankers are highly experimental - rather than just evolving algorithms, entire CGs would often be replaced. Creating and deprecating microservices for each new CG proved expensive and time-consuming. As a temporary fix, we implemented new CGs within existing services. In retrospect, starting with a single unified CG service would have been more practical.
Orchestrator Implementation Limitations
- Node duplication - Our DAG executor framework couldn’t handle multiple instances of the same node with different configurations (nodes were identified only by name). This forced developers to create duplicate nodes with different names, leading to unnecessary boilerplate code.
- Rigid Execution Paths - The system lacked support for conditional execution in the topology. We couldn't skip specific execution paths based on conditions, such as avoiding feed regeneration when the feed is already available in the cache.
- Poor Multi-tenancy Support - Weak multi-tenancy made it difficult to share nodes across different real estates and tenants. The resulting scattered implementations made maintenance challenging and complicated the removal of unused nodes.
Cross-team Development
- Difficulties in enabling multiple teams to work independently on different parts of the system.
Performance and Cost Issues
- Our Java SpringBoot-based orchestrators and CGs suffered from poor resource utilization, particularly for I/O-bound workloads, leading to suboptimal price-to-performance ratios due to blocking I/O operations.
- Our microservices communicated over HTTP/1.1 with JSON-serialized requests, neither of which is optimal for large data transfers by today's standards.
- Application in-memory caches required frequent tuning to maintain good GC performance under varying workloads.
Reimagined Architecture
We have addressed all of the above pain points in the reimagined architecture. In fact, looking at the evolution of ranking systems at Meesho, this is our third-generation architecture, a journey that deserves a detailed blog of its own. Let's get into this generation of the architecture in detail.
Architecture
Microservices view

The microservices architecture became significantly simpler with the creation of a unified CG service. The figure above provides a simplified depiction of server interactions. For a more comprehensive understanding of how the orchestrator and CG were developed, refer to the detailed framework view presented below.
Framework view
The architecture follows a hierarchical layered structure where each layer serves a specific purpose while building upon the capabilities of the layers below. This layered approach ensures separation of concerns, maintainability, and scalability of the system. Let’s dive deep into each layer:
- The Foundation Layer is the bedrock of the system, providing essential building blocks and utilities that any large-scale distributed system requires. This layer abstracts away the complexity of those components, allowing higher layers to focus on their specific responsibilities.
- The Platform Layer acts as a crucial middleware that provides the frameworks and tools necessary for orchestration. It bridges the gap between foundational layer and business logic by offering reusable patterns, abstractions, and utilities that make it easier to build and maintain complex orchestration flows.
- The Orchestration Layer represents the highest level of abstraction, where business logic and execution flows are implemented.
Orchestrator (Inference Orchestrator Pipeline)

Foundation Layer
- go-core - Our foundational Golang library, developed at Meesho, that provides essential components including logging, metrics, server frameworks, middlewares, and database clients.
Platform Layer
- dag-engine (redesigned in-house DAG executor) - A sophisticated DAG execution framework built in-house using Golang, featuring enhanced topology management capabilities. It enables optional execution paths and supports multiple execution instances of the same node through configuration. The following configuration illustrates our implementation of the CG and ranker topology.

Above is a sample of the upgraded topology config. A few nuances follow:
- cg-connector:1 - Represents our versioned component system where each instance is uniquely identified by name and version, eliminating the need for duplicate component definitions.
- eligibility_checker:1:on_success - Built on top of the optional-execution capability, the eligibility checker evaluates business conditions and returns either a success or failure state. Components cg1, cg2, and cg3 are configured to execute only when the eligibility checker returns a success state.
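To illustrate the two nuances above, here is a hypothetical topology config. The node names mirror the versioned-component and conditional-execution ideas described, but the keys and overall schema are illustrative, not the production format.

```yaml
# Illustrative topology config (schema is an assumption, not the real one)
topology:
  real_estate: for_you
  tenant: organic
  nodes:
    - id: eligibility_checker:1
      on_success: [cg-connector:1, cg-connector:2]  # run CGs only if eligible
      on_failure: [cache-reader:1]                  # e.g., serve the cached feed
    - id: cg-connector:1                # same component, two versioned instances
      config: {algo: short_term_intent}
    - id: cg-connector:2
      config: {algo: long_term_interest}
    - id: merger:1
      inputs: [cg-connector:1, cg-connector:2]
    - id: ranker-connector:1
      inputs: [merger:1]
```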
- orchestrator-component-starter - A core library for creating components (nodes in a DAG); it encapsulates the component registry, context, errors, metrics, and more.
- orchestrator-starter - Similarly, the orchestrator-starter serves as a core library for the orchestrator runner, handling essential functions like feed tracking, topology configuration management, experimentation and user segmentation for different real-estates.
Orchestration Layer
- orchestrator-component/<real-estate>-<tenant> - These modules house various node implementations that contain specific business logic, whether they're candidate generators or other processing units. Each node is registered in the component registry via the component starter. To promote code reuse across different real estates and tenants, we've developed shared modules (cross-RE and cross-tenant). Our experience showed that while the core runner logic remains relatively stable, these business logic nodes frequently evolve. This insight led us to architecturally separate these frequently changing components from the more stable runner infrastructure.
- orchestrator-runner - Built on orchestrator-starter lib, it provides API and server layers that manage topology configurations, resolve component implementations through the registry, and execute DAGs using the dag-engine.
- compile-time plugins - To maintain separation of concerns, we leverage Golang’s compile-time plugins for component registration. Components register themselves using init() functions that execute before main(). This automatic registration is triggered simply by adding side imports in main.go, functioning like package initialization.
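The init()-based registration described above can be sketched in a few lines of Go. In the real system each component lives in its own package and is pulled in via a side import in main.go; here a single file stands in for that, and the registry shape and key names are assumptions.

```go
package main

import "fmt"

// registry maps a versioned "name:version" key to a component constructor.
// In the real system this would live in orchestrator-component-starter.
var registry = map[string]func() string{}

// register is called from init() functions, which Go guarantees run
// before main().
func register(key string, ctor func() string) {
	registry[key] = ctor
}

// In production this init() would sit in a component package that main.go
// pulls in via a side import, e.g. _ "components/cgconnector".
func init() {
	register("cg-connector:1", func() string {
		return "candidates from cg-connector v1"
	})
}

func main() {
	// The runner resolves components by versioned key at topology load time.
	if ctor, ok := registry["cg-connector:1"]; ok {
		fmt.Println(ctor())
	}
}
```

Because registration happens at package-initialization time, adding or removing a component from a binary is just adding or removing one import line.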

CG

The CG framework mirrors the orchestrator architecture, with cg-algo replacing components as the primary building block. Unlike the component framework, it doesn’t require a DAG executor. The cg-algo directly utilizes various data source connectors provided by cg-algo-data-source to interface with our ML platform.
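The relationship between a cg-algo and its data-source connectors can be sketched with two small Go interfaces. The interface and type names here (`CGAlgo`, `DataSource`) are assumptions for illustration, not the framework's actual API.

```go
package main

import "fmt"

// DataSource abstracts a connector of the kind cg-algo-data-source
// provides, e.g. a feature store or an embedding index on the ML platform.
type DataSource interface {
	Fetch(userID string) []string
}

// CGAlgo is the primary building block of the CG framework: it turns
// data-source output into candidate product IDs.
type CGAlgo interface {
	Generate(userID string) []string
}

// recentInteractionsCG is a toy short-term-intent algo that simply
// forwards whatever its data source returns.
type recentInteractionsCG struct{ ds DataSource }

func (c recentInteractionsCG) Generate(userID string) []string {
	return c.ds.Fetch(userID)
}

// staticSource is an in-memory stand-in for a real ML-platform connector.
type staticSource map[string][]string

func (s staticSource) Fetch(userID string) []string { return s[userID] }

func main() {
	algo := recentInteractionsCG{ds: staticSource{"u42": {"p7", "p9"}}}
	fmt.Println(algo.Generate("u42"))
}
```

Since each algo depends only on the DataSource interface, connectors can be swapped or mocked in tests without touching the algo itself, which is what lets the unified CG service host many algos side by side.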
Operating principle
After careful consideration, we adopted the following operating principles:
- Immutable component/CG-algo - All components and CG algorithms follow an immutability principle: once deployed, they remain unchanged except for bug fixes. When implementing new features or optimizations, we create new components rather than modifying existing ones. This decision was driven by our previous challenges with maintaining multiple code paths within single components and the complexity of removing deprecated logic. This approach has simplified our version management and improved code clarity.
- Controlled code duplication is acceptable - As a consequence of our immutability principle, some code duplication between components or CGs is expected and acceptable. However, we maintain efficiency by sharing common business logic through our cross-tenant and cross-RE modules.
Impact
We migrated all traffic from the older system to the new one, readying it for battle testing during Meesho's Maha Diwali Sale (MDS).
Performance and Scaling Improvements
- Successfully handled peak loads of 215K QPS for the orchestrator and 190K QPS for CG services
- Significantly reduced latency spikes through Go's superior concurrency and memory management
- Consistently meeting SLA requirements across all traffic patterns
- Adopting freecache, a zero-GC-overhead caching library, significantly improved application in-memory caching with better performance and lower overhead
Cost Efficient Infrastructure - Achieved 70% reduction in infrastructure costs through:
- Migrating services from Java SpringBoot + HTTP/1.1 to a Golang + gRPC stack. This transformation leveraged Golang's concurrency model: lightweight goroutines (roughly 2KB each vs Java's ~1MB threads) enable efficient handling of millions of concurrent operations through CSP-based channel communication. The system also benefits from efficient memory management, with sub-millisecond GC pause times and a smaller memory footprint thanks to escape analysis and stack allocation. Performance is further enhanced by Golang's M:N scheduler for goroutine-to-thread mapping, its work-stealing algorithm for balanced load distribution, and its integrated network poller, collectively yielding improved throughput and significant infrastructure savings.
- Consolidation of multiple CG services into a unified service
Developer Velocity - Dramatically reduced implementation time for new features:
- New CG implementation now takes 2-3 days, compared to the previous two-week sprint cycle
- Simplified component addition, removal, and modification
- Streamlined testing and deployment processes
- Enhanced developer experience through modular architecture
Distributed Development Capabilities - The new architecture has significantly improved cross-team development efficiency. The modular component structure and clear separation of concerns enable multiple teams to work independently. Teams can develop and test their components in isolation, while the shared registry and standardized interfaces ensure seamless integration. The main benefits:
- Teams outside the core group can develop new CGs or components without deep knowledge of the core system
- Multiple teams can work on different components simultaneously without dependencies
- Teams can easily share and reuse components across different real estates through the cross-RE and cross-tenant modules
- The immutable component principle and versioning strategy helps teams maintain clear ownership and reduce conflicts
Conclusion
This initiative has been an incredible learning experience for our team, and it’s just the beginning. Stay tuned for future updates as we continue to push boundaries and share our journey.