At Meesho, we are constantly striving to understand our users better and enhance their experience. Every day, users generate billions of events on our platform, from browsing products to completing purchases, and we aim to translate these signals into insights instantly. From monitoring critical business & tech metrics to understanding the root cause of issues, real-time analytics is the heartbeat of our product analysis and incident management.

In this post, we’ll take you through:

  • The problem we set out to solve
  • The design and challenges of our existing near real-time (NRT) analytics platform
  • The re-architecture journey for raw data analysis
  • And finally, the wins that are helping us scale analytics at Meesho

What is the Problem? Unlocking Instant Business Insights

For Meesho, real-time analytics isn’t a luxury—it’s a necessity. We needed a system that could:

  • Monitor metrics in real-time – Instantly detect drops in key metrics like conversions or engagement, not hours later.
  • Enable rapid root cause analysis – Quickly answer the “why” behind changes by slicing data across app versions, regions, devices, and more — along with funnel analysis and user journey exploration to pinpoint drop-offs and behavioural shifts.
  • Trigger near real-time alerts – Catch anomalies early — like order failures or post-release engagement drops—to reduce the impact.

All of this had to be powered by a dependable and cost-efficient system.

From Aggregates to Atoms: Rebuilding for Raw Data at Scale

Every robust analytics platform begins its journey somewhere. For Meesho, our initial foray into Near Real-Time (NRT) analytics was powered by Apache Druid. It was the natural choice then, supporting minute-level dimensional aggregates generated by Apache Flink.

At its inception, Druid aligned well with our needs:

  • SQL Join Support — Early support for SQL joins gave us flexible querying over aggregated data.
  • Strong Caching for Aggregates — Its per-segment and whole-query caching worked well for our pre-aggregated data, to keep results fresh for real-time use.
  • Active Community — A large, active community provided guidance and stability.

At the time, our event volume was modest due to the pre-aggregated design, and Druid served us well as a solid starting point.

When Ambition Outgrew the Architecture

However, as Meesho grew, so did our analytical needs. Our requirements shifted dramatically. We no longer wanted just minute-level aggregates; we needed granularity across all event attributes. We needed to drill down into raw event data, understand individual user actions and journeys, and uncover insights that pre-aggregated views simply couldn't provide.

This shift exposed the limitations of our existing system for these specific deep-dive requirements. What were once strengths began to show strain, surfacing challenges like Join Bottlenecks, Diminishing Caching Advantages, Limited Indexing, Higher Scan Costs, Lack of Data Co-location, and Limited Upsert Capabilities.

To truly unlock the power of real-time analytics on raw data, it became clear that a re-design was essential. We embarked on a re-architecture journey, recognising that while Druid remains a valuable tool for our specialised, high-performance dashboarding, we needed an alternate system to power ad-hoc, no-code analytics. This shift allows us to support a wider variety of sophisticated features — such as identity merging, deep insights, funnel analysis, and complex user journeys — that require more flexibility than a pre-aggregated model can provide.

The Search for a New Foundation: Contenders and Our Choice

Our existing analytics system was hitting its ceiling. To power Meesho’s next generation of real-time insights, we needed a new data store that could handle a massive volume of events per day and deliver P90 latency under 2 seconds across minute-, hour-, and day-level granularities, over windows from a day to several months. Beyond speed, we required resilience, cost efficiency, and low operational overhead at large scale.

We evaluated a few strong contenders:

  • StarRocks [Open Source]
  • ClickHouse [Open Source]
  • Apache Pinot
  • Other proprietary solutions

After benchmarking them all against our workloads — focusing on performance, resilience, maintenance needs, and cost — Apache Pinot emerged as the clear front-runner and became the foundation of our new real-time analytics platform.

Unpacking Apache Pinot

Apache Pinot stood out for us as a powerful solution, designed to meet the demanding requirements of our real-time analytical workloads. Its architecture and features provide ample advantages for teams seeking to derive immediate value from their data.

Key Capabilities of Apache Pinot

  • Advanced Indexing for Speed: Pinot offered a rich set of indexes that accelerate queries and minimize data scans, drastically improving performance.
  • Intelligent Data Co-location: For certain use cases, such as experiment analysis, cohort analysis, and user identity merging, we had to join our main events table (the OBT described below) to other tables. Pinot supported data co-location, a powerful feature that improves join performance: by strategically placing related data together, it reduces the network traffic and computation needed for joined queries, leading to faster results.
  • Flexible Data Modification (Upsert Support): Pinot supported robust upserts, enabling partial updates to records—essential for keeping user profiles and ever-changing dimensions accurate in real time.

These capabilities made Apache Pinot a strong fit for our use case requiring high-performance, scalable real-time analytics.
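
To make these capabilities concrete, below is a minimal, illustrative sketch of the kind of real-time table configuration in which Pinot's indexing and partial-upsert features are expressed. The table and column names (user_events, user_id, event_ts, user_segment) are hypothetical and the config is abbreviated; the authoritative field reference is the Apache Pinot documentation, not this sketch.

```python
import json

# Illustrative (hypothetical) real-time table config sketch for Apache Pinot.
# Column and table names are made up; only the general shape matters here.
table_config = {
    "tableName": "user_events",            # OBT-style clickstream table (assumed)
    "tableType": "REALTIME",
    "segmentsConfig": {
        "timeColumnName": "event_ts",      # event timestamp column (assumed)
        "replication": "2",
    },
    "tableIndexConfig": {
        # Inverted indexes speed up equality filters on commonly sliced columns.
        "invertedIndexColumns": ["event_name", "app_version", "region"],
        # A range index helps time-window predicates avoid full scans.
        "rangeIndexColumns": ["event_ts"],
        # Sorting segments by user_id keeps one user's events physically close.
        "sortedColumn": ["user_id"],
        "loadMode": "MMAP",
    },
    # Partial upserts patch individual columns of an existing record,
    # e.g. keeping a slowly changing user attribute current in real time.
    "upsertConfig": {
        "mode": "PARTIAL",
        "partialUpsertStrategies": {"user_segment": "OVERWRITE"},
    },
}

print(json.dumps(table_config, indent=2))
```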

A Unified Vision: Re-architecture for Deeper Insights

With Pinot selected as our core, the re-architecture for raw data analysis coalesced around four major, interconnected components:

  1. Ingestion - The robust pipeline for receiving data from diverse sources and efficiently sending it to our storage layer.
  2. Storage - Our chosen solution for efficient, distributed, replicated, and lightning-fast data storage.
  3. Query - The engine designed to accept complex customer queries and swiftly return aggregated, ready-to-visualise data.
  4. UI - The intuitive interface that allows users to interact seamlessly with our various data stores and extract insights for use cases like data visualisation, alerting & reporting.

Design Layers

Storage – The Foundation

Building a real-time analytics platform requires a strong foundation—starting with data storage. We evaluated two strategies: separate tables per event vs. a One Big Table (OBT).

While separate tables offered clean separation, they introduced major downsides: complex joins, fragmented schemas, ingestion overhead, and poor query performance—especially when reconstructing a user journey across events.

OBT, on the other hand, proved far more effective. By storing all event types in a single denormalised table, we saw:

  • Faster queries – No joins, all data in one place
  • Simpler design – No need to manage co-location
  • High compression – Repetitive fields shrank well in columnar storage
  • Unified schema – Easier to evolve and query
  • Lower maintenance – One table is easier to manage than many

Given the performance gains and simplicity, we chose OBT—a foundational decision that enabled the speed, scale, and agility we needed.
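
To illustrate what the OBT buys us, consider a funnel question that spans several event types. With one denormalised table it becomes a single filtered aggregation; with per-event tables it would need multiple joins on the user key. The table and column names below are hypothetical, and the SQL is a generic sketch of the kind of query we run, not a verbatim production query.

```python
# Hypothetical funnel query over a single denormalised events table (OBT).
# Each funnel stage is just a conditional distinct count; no joins needed.
FUNNEL_QUERY = """
SELECT
  COUNT(DISTINCT CASE WHEN event_name = 'product_view' THEN user_id END) AS viewed,
  COUNT(DISTINCT CASE WHEN event_name = 'add_to_cart'  THEN user_id END) AS carted,
  COUNT(DISTINCT CASE WHEN event_name = 'order_placed' THEN user_id END) AS ordered
FROM user_events
WHERE event_ts BETWEEN 1700000000000 AND 1700086400000  -- a one-day window (ms)
  AND app_version = '5.2.1'                              -- slice by any attribute
"""

print(FUNNEL_QUERY)
```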

Ingestion: Building a Unified Data Ingestion Pipeline

With storage in place, the next step was building a reliable ingestion pipeline to handle the constant flow of user clickstream data. Much like laying down fuel lines to power an engine, we needed a system that could deliver only the most relevant events to our real-time analytics layer.

Since raw client-side data isn’t always analytics-ready, we prioritised selective ingestion and transformation. Our internal streaming platform helped us move quickly here. By enabling SQL-based transformations and integrating easily with Kafka and other connectors, it let us filter, shape, and route data with minimal effort—no custom code required. This accelerated onboarding for new events and ensured only meaningful data reached our real-time store, Pinot.
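
As a rough illustration of what selective ingestion looks like, here is a Flink-SQL-style transformation of the kind our streaming platform lets us express declaratively. The topic, table, and column names are hypothetical, and the actual syntax of our internal platform differs; this is only a sketch of the filter, shape, and route idea.

```python
# Illustrative only: an SQL-style selective-ingestion job expressed as a string.
# Source/sink names and columns are hypothetical, not our platform's real syntax.
SELECTIVE_INGEST = """
INSERT INTO pinot_user_events              -- sink: real-time events table in Pinot
SELECT
  user_id,
  event_name,
  app_version,
  region,
  CAST(event_time AS BIGINT) AS event_ts
FROM kafka_clickstream                     -- source: raw clickstream topic
WHERE event_name IN ('product_view', 'add_to_cart', 'order_placed')
  AND user_id IS NOT NULL                  -- drop malformed events before they reach Pinot
"""

print(SELECTIVE_INGEST)
```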

As for the streaming platform itself, that’s a topic for another day.

Beyond Silos: Orchestrating Data Access

Prism Federator - Query Layer unifying diverse data sources

Imagine needing crucial information split between two places: a fast, real-time newsroom and a deep, archival library. Getting the full picture meant querying both separately. At Meesho, this was our reality — recent data lived in our real-time store, while historical context sat in the data lake.

Enter the Prism Query Federator: the intelligence layer we built to abstract these systems and offer a unified, seamless way to access insights.

Key Features:

  • Smart Source Selection: It automatically picks the right datastore — e.g., routes older data requests to the data lake if real-time stores don’t have it.
  • Attribute-Based Routing: Queries are directed based on available attributes—real-time store for fast access, or Trino for broader coverage.
  • Multi-Source Support: The Federator connects to various data sources, whether open source or proprietary, and can merge results across them for a unified view.
  • Caching & Load Control:
  • Rule-Based Caching: Speeds up frequent queries without hitting the backend.
  • Single-Flight Control: Prevents duplicate downstream queries during high load by collapsing identical concurrent requests.
  • One Interface for All Tools: It powers everything from dashboards to anomaly detection and GenAI, streamlining development and integration.

This federated setup delivers the best of both worlds — real-time speed where needed, and scalable access to historical data when depth matters.
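
For readers curious how the routing and single-flight ideas fit together, here is a small Python sketch, not the actual Prism Federator code. It assumes a fixed retention window for the real-time store and a known set of attributes it serves; all names (pick_source, SingleFlight, RETENTION_MS) are hypothetical.

```python
import asyncio
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    sql: str
    start_ms: int
    end_ms: int
    attributes: tuple   # columns the query needs

RETENTION_MS = 30 * 24 * 3600 * 1000            # assumed real-time store retention
REALTIME_ATTRS = {"user_id", "event_name", "app_version", "region"}

def pick_source(q: Query, now_ms: int) -> str:
    """Smart source selection: use the real-time store when the time window
    and requested attributes fit, otherwise route to the data lake via Trino."""
    within_retention = q.start_ms >= now_ms - RETENTION_MS
    attrs_available = set(q.attributes) <= REALTIME_ATTRS
    return "pinot" if within_retention and attrs_available else "trino"

class SingleFlight:
    """Collapse identical concurrent queries into one downstream call."""
    def __init__(self) -> None:
        self._inflight: dict = {}

    async def run(self, key, fn):
        if key in self._inflight:
            return await self._inflight[key]     # piggyback on the in-flight call
        task = asyncio.ensure_future(fn())
        self._inflight[key] = task
        try:
            return await task
        finally:
            self._inflight.pop(key, None)
```

In this sketch, the federator would compute a cache key from the query, consult the rule-based cache first, and only then call pick_source and dispatch through SingleFlight, so a burst of identical dashboard refreshes results in a single downstream query.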

Wins: A Shift in Real-time Analytics at Meesho

The re-architecture of our real-time analytics platform brings the following wins for Meesho:

  • Immediate Product Understanding: Instant access to insights allows teams to analyze why metrics change, identify friction points, and validate the impact of new features.
  • Faster Root Cause Analysis: Unified data and efficient querying enable engineers and product managers to quickly drill down into user behavior, leading to faster identification and resolution of issues.
  • Proactive Incident Management: Near real-time alerting on insights allows us to detect anomalies early and minimize the impact of incidents.
  • Enhanced User Experience: By understanding user behavior at a granular level, we aim to continuously improve our product, leading to a more seamless and engaging experience for Meesho users.
  • Scalability and Cost Efficiency: The chosen architecture, particularly with Pinot's capabilities, provides a scalable and cost-efficient solution for handling the ever-growing volume of event data.
  • Designed for Adaptability: The Prism Federator ensures our analytics platform can evolve with future data store innovations—avoiding vendor lock-in and enabling us to adopt best-of-breed technologies as they emerge.

This journey to a more robust, real-time analytics platform reflects Meesho's commitment to data-driven decision-making, ensuring we continue to build and deliver seamless experiences for our users.

Shoutouts

Special thanks to Ankit Kalra, Manav Kodnani, Prakhar Pande, Soham Yadav, Akshay Raichur, Md Saquib Shakeel for working closely on the project and Sundaresan A Udayasankar, Ramiz Mehran and Alok Sharma for their continuous guidance.

Big thanks to the StarTree team for their continuous support in helping us manage and scale Apache Pinot effectively.